2024-02-21 19:50:50

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 00/36] Memory allocation profiling

Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for
debug kernels, overhead low enough to be deployed in production.

Example output:
root@moria-kvm:~# sort -rn /proc/allocinfo
127664128 31168 mm/page_ext.c:270 func:alloc_page_ext
56373248 4737 mm/slub.c:2259 func:alloc_slab_page
14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded
14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash
13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio
9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node
4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
3940352 962 mm/memory.c:4214 func:alloc_anon_folio
2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
...

Since v3:
- Dropped patch changing string_get_size() [2] as not needed
- Dropped patch modifying xfs allocators [3] as non needed,
per Dave Chinner
- Added Reviewed-by, per Kees Cook
- Moved prepare_slab_obj_exts_hook() and alloc_slab_obj_exts() where they
are used, per Vlastimil Babka
- Fixed SLAB_NO_OBJ_EXT definition to use unused bit, per Vlastimil Babka
- Refactored patch [4] into other patches, per Vlastimil Babka
- Replaced snprintf() with seq_buf_printf(), per Kees Cook
- Changed output to report bytes, per Andrew Morton and Pasha Tatashin
- Changed output to report [module] only for loadable modules,
per Vlastimil Babka
- Moved mem_alloc_profiling_enabled() check earlier, per Vlastimil Babka
- Changed the code to handle page splitting to be more understandable,
per Vlastimil Babka
- Moved alloc_tagging_slab_free_hook(), mark_objexts_empty(),
mark_failed_objexts_alloc() and handle_failed_objexts_alloc(),
per Vlastimil Babka
- Fixed loss of __alloc_size(1, 2) in kvmalloc functions,
per Vlastimil Babka
- Refactored the code in show_mem() to avoid memory allocations,
per Michal Hocko
- Changed to trylock in show_mem() to avoid blocking in atomic context,
per Tetsuo Handa
- Added mm mailing list into MAINTAINERS, per Kees Cook
- Added base commit SHA, per Andy Shevchenko
- Added a patch with documentation, per Jani Nikula
- Fixed 0day bugs
- Added benchmark results [5], per Steven Rostedt
- Rebased over Linux 6.8-rc5

Items not yet addressed:
- An early_boot option to prevent pageext overhead. We are looking into
ways for using the same sysctr instead of adding additional early boot
parameter.

Usage:
kconfig options:
- CONFIG_MEM_ALLOC_PROFILING
- CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
- CONFIG_MEM_ALLOC_PROFILING_DEBUG
adds warnings for allocations that weren't accounted because of a
missing annotation

sysctl:
/proc/sys/vm/mem_profiling

Runtime info:
/proc/allocinfo

Notes:

[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
(6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y
(7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y

Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
56 core Intel Xeon:

kmalloc pgalloc
(1 baseline) 6.764s 16.902s
(2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%)
(3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%)
(4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%)
(5 memcg) 13.388s (+97.94%) 48.460s (+186.71%)
(6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%)
(7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%)

Memory overhead:
Kernel size:

text data bss dec diff
(1) 26515311 18890222 17018880 62424413
(2) 26524728 19423818 16740352 62688898 264485
(3) 26524724 19423818 16740352 62688894 264481
(4) 26524728 19423818 16740352 62688898 264485
(5) 26541782 18964374 16957440 62463596 39183

Memory consumption on a 56 core Intel CPU with 125GB of memory:
Code tags: 192 kB
PageExts: 262144 kB (256MB)
SlabExts: 9876 kB (9.6MB)
PcpuExts: 512 kB (0.5MB)

Total overhead is 0.2% of total memory.

[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/
[5] Benchmarks:

Hackbench tests run 100 times:
hackbench -s 512 -l 200 -g 15 -f 25 -P
baseline disabled profiling enabled profiling
avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023)
stdev 0.0137 0.0188 0.0077


hackbench -l 10000
baseline disabled profiling enabled profiling
avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859)
stdev 0.0933 0.0286 0.0489

stress-ng tests:
stress-ng --class memory --seq 4 -t 60
stress-ng --class cpu --seq 4 -t 60
Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/

Kent Overstreet (13):
fix missing vmalloc.h includes
asm-generic/io.h: Kill vmalloc.h dependency
mm/slub: Mark slab_free_freelist_hook() __always_inline
scripts/kallysms: Always include __start and __stop symbols
fs: Convert alloc_inode_sb() to a macro
rust: Add a rust helper for krealloc()
mempool: Hook up to memory allocation profiling
mm: percpu: Introduce pcpuobj_ext
mm: percpu: Add codetag reference into pcpuobj_ext
mm: vmalloc: Enable memory allocation profiling
rhashtable: Plumb through alloc tag
MAINTAINERS: Add entries for code tagging and memory allocation
profiling
memprofiling: Documentation

Suren Baghdasaryan (23):
mm: enumerate all gfp flags
mm: introduce slabobj_ext to support slab object extensions
mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext
creation
mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
slab: objext: introduce objext_flags as extension to
page_memcg_data_flags
lib: code tagging framework
lib: code tagging module support
lib: prevent module unloading if memory is not freed
lib: add allocation tagging support for memory allocation profiling
lib: introduce support for page allocation tagging
mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation
tags
change alloc_pages name in dma_map_ops to avoid name conflicts
mm: enable page allocation tagging
mm: create new codetag references during page splitting
mm/page_ext: enable early_page_ext when
CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
lib: add codetag reference into slabobj_ext
mm/slab: add allocation accounting into slab allocation and free paths
mm/slab: enable slab allocation tagging for kmalloc and friends
mm: percpu: enable per-cpu allocation tagging
lib: add memory allocations report in show_mem()
codetag: debug: skip objext checking when it's for objext itself
codetag: debug: mark codetags for reserved pages as empty
codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext
allocations

Documentation/admin-guide/sysctl/vm.rst | 16 +
Documentation/filesystems/proc.rst | 29 ++
Documentation/mm/allocation-profiling.rst | 86 ++++++
MAINTAINERS | 17 ++
arch/alpha/kernel/pci_iommu.c | 2 +-
arch/alpha/lib/checksum.c | 1 +
arch/alpha/lib/fpreg.c | 1 +
arch/alpha/lib/memcpy.c | 1 +
arch/arm/kernel/irq.c | 1 +
arch/arm/kernel/traps.c | 1 +
arch/arm64/kernel/efi.c | 1 +
arch/loongarch/include/asm/kfence.h | 1 +
arch/mips/jazz/jazzdma.c | 2 +-
arch/powerpc/kernel/dma-iommu.c | 2 +-
arch/powerpc/kernel/iommu.c | 1 +
arch/powerpc/mm/mem.c | 1 +
arch/powerpc/platforms/ps3/system-bus.c | 4 +-
arch/powerpc/platforms/pseries/vio.c | 2 +-
arch/riscv/kernel/elf_kexec.c | 1 +
arch/riscv/kernel/probes/kprobes.c | 1 +
arch/s390/kernel/cert_store.c | 1 +
arch/s390/kernel/ipl.c | 1 +
arch/x86/include/asm/io.h | 1 +
arch/x86/kernel/amd_gart_64.c | 2 +-
arch/x86/kernel/cpu/sgx/main.c | 1 +
arch/x86/kernel/irq_64.c | 1 +
arch/x86/mm/fault.c | 1 +
drivers/accel/ivpu/ivpu_mmu_context.c | 1 +
drivers/gpu/drm/gma500/mmu.c | 1 +
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 1 +
.../gpu/drm/i915/gem/selftests/mock_dmabuf.c | 1 +
drivers/gpu/drm/i915/gt/shmem_utils.c | 1 +
drivers/gpu/drm/i915/gvt/firmware.c | 1 +
drivers/gpu/drm/i915/gvt/gtt.c | 1 +
drivers/gpu/drm/i915/gvt/handlers.c | 1 +
drivers/gpu/drm/i915/gvt/mmio.c | 1 +
drivers/gpu/drm/i915/gvt/vgpu.c | 1 +
drivers/gpu/drm/i915/intel_gvt.c | 1 +
drivers/gpu/drm/imagination/pvr_vm_mips.c | 1 +
drivers/gpu/drm/mediatek/mtk_drm_gem.c | 1 +
drivers/gpu/drm/omapdrm/omap_gem.c | 1 +
drivers/gpu/drm/v3d/v3d_bo.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_binding.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 1 +
drivers/gpu/drm/xen/xen_drm_front_gem.c | 1 +
drivers/hwtracing/coresight/coresight-trbe.c | 1 +
drivers/iommu/dma-iommu.c | 2 +-
.../marvell/octeon_ep/octep_pfvf_mbox.c | 1 +
.../net/ethernet/microsoft/mana/hw_channel.c | 1 +
drivers/parisc/ccio-dma.c | 2 +-
drivers/parisc/sba_iommu.c | 2 +-
drivers/platform/x86/uv_sysfs.c | 1 +
drivers/scsi/mpi3mr/mpi3mr_transport.c | 2 +
drivers/staging/media/atomisp/pci/hmm/hmm.c | 2 +-
drivers/vfio/pci/pds/dirty.c | 1 +
drivers/virt/acrn/mm.c | 1 +
drivers/virtio/virtio_mem.c | 1 +
drivers/xen/grant-dma-ops.c | 2 +-
drivers/xen/swiotlb-xen.c | 2 +-
include/asm-generic/codetag.lds.h | 14 +
include/asm-generic/io.h | 1 -
include/asm-generic/vmlinux.lds.h | 3 +
include/linux/alloc_tag.h | 195 ++++++++++++
include/linux/codetag.h | 81 +++++
include/linux/dma-map-ops.h | 2 +-
include/linux/fortify-string.h | 5 +-
include/linux/fs.h | 6 +-
include/linux/gfp.h | 126 +++++---
include/linux/gfp_types.h | 101 +++++--
include/linux/memcontrol.h | 56 +++-
include/linux/mempool.h | 73 +++--
include/linux/mm.h | 9 +
include/linux/mm_types.h | 4 +-
include/linux/page_ext.h | 1 -
include/linux/pagemap.h | 9 +-
include/linux/pds/pds_common.h | 2 +
include/linux/percpu.h | 27 +-
include/linux/pgalloc_tag.h | 110 +++++++
include/linux/rhashtable-types.h | 11 +-
include/linux/sched.h | 24 ++
include/linux/slab.h | 175 +++++------
include/linux/string.h | 4 +-
include/linux/vmalloc.h | 60 +++-
include/rdma/rdmavt_qp.h | 1 +
init/Kconfig | 4 +
kernel/dma/mapping.c | 4 +-
kernel/kallsyms_selftest.c | 2 +-
kernel/module/main.c | 25 +-
lib/Kconfig.debug | 31 ++
lib/Makefile | 3 +
lib/alloc_tag.c | 204 +++++++++++++
lib/codetag.c | 283 ++++++++++++++++++
lib/rhashtable.c | 28 +-
mm/compaction.c | 7 +-
mm/debug_vm_pgtable.c | 1 +
mm/filemap.c | 6 +-
mm/huge_memory.c | 2 +
mm/kfence/core.c | 14 +-
mm/kfence/kfence.h | 4 +-
mm/memcontrol.c | 56 +---
mm/mempolicy.c | 52 ++--
mm/mempool.c | 36 +--
mm/mm_init.c | 13 +-
mm/nommu.c | 64 ++--
mm/page_alloc.c | 66 ++--
mm/page_ext.c | 13 +
mm/page_owner.c | 2 +-
mm/percpu-internal.h | 26 +-
mm/percpu.c | 120 +++-----
mm/show_mem.c | 26 ++
mm/slab.h | 126 ++++++--
mm/slab_common.c | 6 +-
mm/slub.c | 244 +++++++++++----
mm/util.c | 44 +--
mm/vmalloc.c | 88 +++---
rust/helpers.c | 8 +
scripts/kallsyms.c | 13 +
scripts/module.lds.S | 7 +
sound/pci/hda/cs35l41_hda.c | 1 +
123 files changed, 2269 insertions(+), 682 deletions(-)
create mode 100644 Documentation/mm/allocation-profiling.rst
create mode 100644 include/asm-generic/codetag.lds.h
create mode 100644 include/linux/alloc_tag.h
create mode 100644 include/linux/codetag.h
create mode 100644 include/linux/pgalloc_tag.h
create mode 100644 lib/alloc_tag.c
create mode 100644 lib/codetag.c


base-commit: 39133352cbed6626956d38ed72012f49b0421e7b
--
2.44.0.rc0.258.g7320e95886-goog



2024-02-21 19:51:38

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 02/36] asm-generic/io.h: Kill vmalloc.h dependency

From: Kent Overstreet <[email protected]>

Needed to avoid a new circular dependency with the memory allocation
profiling series.

Naturally, a whole bunch of files needed to include vmalloc.h that were
previously getting it implicitly.

Signed-off-by: Kent Overstreet <[email protected]>
---
include/asm-generic/io.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index bac63e874c7b..c27313414a82 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -991,7 +991,6 @@ static inline void iowrite64_rep(volatile void __iomem *addr,

#ifdef __KERNEL__

-#include <linux/vmalloc.h>
#define __io_virt(x) ((void __force *)(x))

/*
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:51:38

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 23/36] mm/slab: enable slab allocation tagging for kmalloc and friends

Redefine kmalloc, krealloc, kzalloc, kcalloc, etc. to record allocations
and deallocations done by these functions.

Signed-off-by: Suren Baghdasaryan <[email protected]>
Co-developed-by: Kent Overstreet <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
include/linux/fortify-string.h | 5 +-
include/linux/slab.h | 169 +++++++++++++++++----------------
include/linux/string.h | 4 +-
mm/slab_common.c | 6 +-
mm/slub.c | 52 +++++-----
mm/util.c | 20 ++--
6 files changed, 130 insertions(+), 126 deletions(-)

diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h
index 89a6888f2f9e..55f66bd8a366 100644
--- a/include/linux/fortify-string.h
+++ b/include/linux/fortify-string.h
@@ -697,9 +697,9 @@ __FORTIFY_INLINE void *memchr_inv(const void * const POS0 p, int c, size_t size)
return __real_memchr_inv(p, c, size);
}

-extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup)
+extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup_noprof)
__realloc_size(2);
-__FORTIFY_INLINE void *kmemdup(const void * const POS0 p, size_t size, gfp_t gfp)
+__FORTIFY_INLINE void *kmemdup_noprof(const void * const POS0 p, size_t size, gfp_t gfp)
{
const size_t p_size = __struct_size(p);

@@ -709,6 +709,7 @@ __FORTIFY_INLINE void *kmemdup(const void * const POS0 p, size_t size, gfp_t gfp
fortify_panic(__func__);
return __real_kmemdup(p, size, gfp);
}
+#define kmemdup(...) alloc_hooks(kmemdup_noprof(__VA_ARGS__))

/**
* strcpy - Copy a string into another string buffer
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 58794043ab5b..61e2a486d529 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -229,7 +229,10 @@ int kmem_cache_shrink(struct kmem_cache *s);
/*
* Common kmalloc functions provided by all allocators
*/
-void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2);
+void * __must_check krealloc_noprof(const void *objp, size_t new_size,
+ gfp_t flags) __realloc_size(2);
+#define krealloc(...) alloc_hooks(krealloc_noprof(__VA_ARGS__))
+
void kfree(const void *objp);
void kfree_sensitive(const void *objp);
size_t __ksize(const void *objp);
@@ -481,7 +484,10 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
static_assert(PAGE_SHIFT <= 20);
#define kmalloc_index(s) __kmalloc_index(s, true)

-void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
+#include <linux/alloc_tag.h>
+
+void *__kmalloc_noprof(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
+#define __kmalloc(...) alloc_hooks(__kmalloc_noprof(__VA_ARGS__))

/**
* kmem_cache_alloc - Allocate an object
@@ -493,9 +499,14 @@ void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_siz
*
* Return: pointer to the new object or %NULL in case of error
*/
-void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags) __assume_slab_alignment __malloc;
-void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
- gfp_t gfpflags) __assume_slab_alignment __malloc;
+void *kmem_cache_alloc_noprof(struct kmem_cache *cachep,
+ gfp_t flags) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc(...) alloc_hooks(kmem_cache_alloc_noprof(__VA_ARGS__))
+
+void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc_lru(...) alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
+
void kmem_cache_free(struct kmem_cache *s, void *objp);

/*
@@ -506,29 +517,40 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
* Note that interrupts must be enabled when calling these functions.
*/
void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
+
+int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
+#define kmem_cache_alloc_bulk(...) alloc_hooks(kmem_cache_alloc_bulk_noprof(__VA_ARGS__))

static __always_inline void kfree_bulk(size_t size, void **p)
{
kmem_cache_free_bulk(NULL, size, p);
}

-void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
+void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
__alloc_size(1);
-void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
- __malloc;
+#define __kmalloc_node(...) alloc_hooks(__kmalloc_node_noprof(__VA_ARGS__))
+
+void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
+ int node) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc_node(...) alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))

-void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
+void *kmalloc_trace_noprof(struct kmem_cache *s, gfp_t flags, size_t size)
__assume_kmalloc_alignment __alloc_size(3);

-void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
- int node, size_t size) __assume_kmalloc_alignment
+void *kmalloc_node_trace_noprof(struct kmem_cache *s, gfp_t gfpflags,
+ int node, size_t size) __assume_kmalloc_alignment
__alloc_size(4);
-void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
+#define kmalloc_trace(...) alloc_hooks(kmalloc_trace_noprof(__VA_ARGS__))
+
+#define kmalloc_node_trace(...) alloc_hooks(kmalloc_node_trace_noprof(__VA_ARGS__))
+
+void *kmalloc_large_noprof(size_t size, gfp_t flags) __assume_page_alignment
__alloc_size(1);
+#define kmalloc_large(...) alloc_hooks(kmalloc_large_noprof(__VA_ARGS__))

-void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_alignment
+void *kmalloc_large_node_noprof(size_t size, gfp_t flags, int node) __assume_page_alignment
__alloc_size(1);
+#define kmalloc_large_node(...) alloc_hooks(kmalloc_large_node_noprof(__VA_ARGS__))

/**
* kmalloc - allocate kernel memory
@@ -584,37 +606,39 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_align
* Try really hard to succeed the allocation but fail
* eventually.
*/
-static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
+static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t flags)
{
if (__builtin_constant_p(size) && size) {
unsigned int index;

if (size > KMALLOC_MAX_CACHE_SIZE)
- return kmalloc_large(size, flags);
+ return kmalloc_large_noprof(size, flags);

index = kmalloc_index(size);
- return kmalloc_trace(
+ return kmalloc_trace_noprof(
kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
flags, size);
}
- return __kmalloc(size, flags);
+ return __kmalloc_noprof(size, flags);
}
+#define kmalloc(...) alloc_hooks(kmalloc_noprof(__VA_ARGS__))

-static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
+static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gfp_t flags, int node)
{
if (__builtin_constant_p(size) && size) {
unsigned int index;

if (size > KMALLOC_MAX_CACHE_SIZE)
- return kmalloc_large_node(size, flags, node);
+ return kmalloc_large_node_noprof(size, flags, node);

index = kmalloc_index(size);
- return kmalloc_node_trace(
+ return kmalloc_node_trace_noprof(
kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
flags, node, size);
}
- return __kmalloc_node(size, flags, node);
+ return __kmalloc_node_noprof(size, flags, node);
}
+#define kmalloc_node(...) alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))

/**
* kmalloc_array - allocate memory for an array.
@@ -622,16 +646,17 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
-static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_t flags)
+static inline __alloc_size(1, 2) void *kmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
{
size_t bytes;

if (unlikely(check_mul_overflow(n, size, &bytes)))
return NULL;
if (__builtin_constant_p(n) && __builtin_constant_p(size))
- return kmalloc(bytes, flags);
- return __kmalloc(bytes, flags);
+ return kmalloc_noprof(bytes, flags);
+ return kmalloc_noprof(bytes, flags);
}
+#define kmalloc_array(...) alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))

/**
* krealloc_array - reallocate memory for an array.
@@ -640,18 +665,19 @@ static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_
* @new_size: new size of a single member of the array
* @flags: the type of memory to allocate (see kmalloc)
*/
-static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
- size_t new_n,
- size_t new_size,
- gfp_t flags)
+static inline __realloc_size(2, 3) void * __must_check krealloc_array_noprof(void *p,
+ size_t new_n,
+ size_t new_size,
+ gfp_t flags)
{
size_t bytes;

if (unlikely(check_mul_overflow(new_n, new_size, &bytes)))
return NULL;

- return krealloc(p, bytes, flags);
+ return krealloc_noprof(p, bytes, flags);
}
+#define krealloc_array(...) alloc_hooks(krealloc_array_noprof(__VA_ARGS__))

/**
* kcalloc - allocate memory for an array. The memory is set to zero.
@@ -659,16 +685,12 @@ static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
-static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flags)
-{
- return kmalloc_array(n, size, flags | __GFP_ZERO);
-}
+#define kcalloc(_n, _size, _flags) kmalloc_array(_n, _size, (_flags) | __GFP_ZERO)

-void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
+void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags, int node,
unsigned long caller) __alloc_size(1);
-#define kmalloc_node_track_caller(size, flags, node) \
- __kmalloc_node_track_caller(size, flags, node, \
- _RET_IP_)
+#define kmalloc_node_track_caller(...) \
+ alloc_hooks(kmalloc_node_track_caller_noprof(__VA_ARGS__, _RET_IP_))

/*
* kmalloc_track_caller is a special version of kmalloc that records the
@@ -678,11 +700,9 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
* allocator where we care about the real place the memory allocation
* request comes from.
*/
-#define kmalloc_track_caller(size, flags) \
- __kmalloc_node_track_caller(size, flags, \
- NUMA_NO_NODE, _RET_IP_)
+#define kmalloc_track_caller(...) kmalloc_node_track_caller(__VA_ARGS__, NUMA_NO_NODE)

-static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size, gfp_t flags,
+static inline __alloc_size(1, 2) void *kmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags,
int node)
{
size_t bytes;
@@ -690,75 +710,56 @@ static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size,
if (unlikely(check_mul_overflow(n, size, &bytes)))
return NULL;
if (__builtin_constant_p(n) && __builtin_constant_p(size))
- return kmalloc_node(bytes, flags, node);
- return __kmalloc_node(bytes, flags, node);
+ return kmalloc_node_noprof(bytes, flags, node);
+ return __kmalloc_node_noprof(bytes, flags, node);
}
+#define kmalloc_array_node(...) alloc_hooks(kmalloc_array_node_noprof(__VA_ARGS__))

-static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t flags, int node)
-{
- return kmalloc_array_node(n, size, flags | __GFP_ZERO, node);
-}
+#define kcalloc_node(_n, _size, _flags, _node) \
+ kmalloc_array_node(_n, _size, (_flags) | __GFP_ZERO, _node)

/*
* Shortcuts
*/
-static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
-{
- return kmem_cache_alloc(k, flags | __GFP_ZERO);
-}
+#define kmem_cache_zalloc(_k, _flags) kmem_cache_alloc(_k, (_flags)|__GFP_ZERO)

/**
* kzalloc - allocate memory. The memory is set to zero.
* @size: how many bytes of memory are required.
* @flags: the type of memory to allocate (see kmalloc).
*/
-static inline __alloc_size(1) void *kzalloc(size_t size, gfp_t flags)
+static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags)
{
- return kmalloc(size, flags | __GFP_ZERO);
+ return kmalloc_noprof(size, flags | __GFP_ZERO);
}
+#define kzalloc(...) alloc_hooks(kzalloc_noprof(__VA_ARGS__))
+#define kzalloc_node(_size, _flags, _node) kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)

-/**
- * kzalloc_node - allocate zeroed memory from a particular memory node.
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate (see kmalloc).
- * @node: memory node from which to allocate
- */
-static inline __alloc_size(1) void *kzalloc_node(size_t size, gfp_t flags, int node)
-{
- return kmalloc_node(size, flags | __GFP_ZERO, node);
-}
+extern void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node) __alloc_size(1);
+#define kvmalloc_node(...) alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__))

-extern void *kvmalloc_node(size_t size, gfp_t flags, int node) __alloc_size(1);
-static inline __alloc_size(1) void *kvmalloc(size_t size, gfp_t flags)
-{
- return kvmalloc_node(size, flags, NUMA_NO_NODE);
-}
-static inline __alloc_size(1) void *kvzalloc_node(size_t size, gfp_t flags, int node)
-{
- return kvmalloc_node(size, flags | __GFP_ZERO, node);
-}
-static inline __alloc_size(1) void *kvzalloc(size_t size, gfp_t flags)
-{
- return kvmalloc(size, flags | __GFP_ZERO);
-}
+#define kvmalloc(_size, _flags) kvmalloc_node(_size, _flags, NUMA_NO_NODE)
+#define kvzalloc(_size, _flags) kvmalloc(_size, _flags|__GFP_ZERO)
+
+#define kvzalloc_node(_size, _flags, _node) kvmalloc_node(_size, _flags|__GFP_ZERO, _node)

-static inline __alloc_size(1, 2) void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
+static inline __alloc_size(1, 2) void *kvmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
{
size_t bytes;

if (unlikely(check_mul_overflow(n, size, &bytes)))
return NULL;

- return kvmalloc(bytes, flags);
+ return kvmalloc_node_noprof(bytes, flags, NUMA_NO_NODE);
}

-static inline __alloc_size(1, 2) void *kvcalloc(size_t n, size_t size, gfp_t flags)
-{
- return kvmalloc_array(n, size, flags | __GFP_ZERO);
-}
+#define kvmalloc_array(...) alloc_hooks(kvmalloc_array_noprof(__VA_ARGS__))
+#define kvcalloc(_n, _size, _flags) kvmalloc_array(_n, _size, _flags|__GFP_ZERO)

-extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
+extern void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
__realloc_size(3);
+#define kvrealloc(...) alloc_hooks(kvrealloc_noprof(__VA_ARGS__))
+
extern void kvfree(const void *addr);
DEFINE_FREE(kvfree, void *, if (_T) kvfree(_T))

diff --git a/include/linux/string.h b/include/linux/string.h
index ab148d8dbfc1..14e4fb4340f4 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -214,7 +214,9 @@ extern void kfree_const(const void *x);
extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
extern const char *kstrdup_const(const char *s, gfp_t gfp);
extern char *kstrndup(const char *s, size_t len, gfp_t gfp);
-extern void *kmemdup(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
+extern void *kmemdup_noprof(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
+#define kmemdup(...) alloc_hooks(kmemdup_noprof(__VA_ARGS__))
+
extern void *kvmemdup(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
extern char *kmemdup_nul(const char *s, size_t len, gfp_t gfp);

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 238293b1dbe1..5f9e25626dc7 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1184,7 +1184,7 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
return (void *)p;
}

- ret = kmalloc_track_caller(new_size, flags);
+ ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
if (ret && p) {
/* Disable KASAN checks as the object's redzone is accessed. */
kasan_disable_current();
@@ -1208,7 +1208,7 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
*
* Return: pointer to the allocated memory or %NULL in case of error
*/
-void *krealloc(const void *p, size_t new_size, gfp_t flags)
+void *krealloc_noprof(const void *p, size_t new_size, gfp_t flags)
{
void *ret;

@@ -1223,7 +1223,7 @@ void *krealloc(const void *p, size_t new_size, gfp_t flags)

return ret;
}
-EXPORT_SYMBOL(krealloc);
+EXPORT_SYMBOL(krealloc_noprof);

/**
* kfree_sensitive - Clear sensitive information in memory before freeing
diff --git a/mm/slub.c b/mm/slub.c
index a69b6b4c8df6..920b24b4140e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3920,7 +3920,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
return object;
}

-void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+void *kmem_cache_alloc_noprof(struct kmem_cache *s, gfp_t gfpflags)
{
void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE, _RET_IP_,
s->object_size);
@@ -3929,9 +3929,9 @@ void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)

return ret;
}
-EXPORT_SYMBOL(kmem_cache_alloc);
+EXPORT_SYMBOL(kmem_cache_alloc_noprof);

-void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
void *ret = slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, _RET_IP_,
@@ -3941,10 +3941,10 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,

return ret;
}
-EXPORT_SYMBOL(kmem_cache_alloc_lru);
+EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);

/**
- * kmem_cache_alloc_node - Allocate an object on the specified node
+ * kmem_cache_alloc_node_noprof - Allocate an object on the specified node
* @s: The cache to allocate from.
* @gfpflags: See kmalloc().
* @node: node number of the target node.
@@ -3956,7 +3956,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_lru);
*
* Return: pointer to the new object or %NULL in case of error
*/
-void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
+void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int node)
{
void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);

@@ -3964,7 +3964,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)

return ret;
}
-EXPORT_SYMBOL(kmem_cache_alloc_node);
+EXPORT_SYMBOL(kmem_cache_alloc_node_noprof);

/*
* To avoid unnecessary overhead, we pass through large allocation requests
@@ -3981,7 +3981,7 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
flags = kmalloc_fix_flags(flags);

flags |= __GFP_COMP;
- folio = (struct folio *)alloc_pages_node(node, flags, order);
+ folio = (struct folio *)alloc_pages_node_noprof(node, flags, order);
if (folio) {
ptr = folio_address(folio);
lruvec_stat_mod_folio(folio, NR_SLAB_UNRECLAIMABLE_B,
@@ -3996,7 +3996,7 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
return ptr;
}

-void *kmalloc_large(size_t size, gfp_t flags)
+void *kmalloc_large_noprof(size_t size, gfp_t flags)
{
void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);

@@ -4004,9 +4004,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
flags, NUMA_NO_NODE);
return ret;
}
-EXPORT_SYMBOL(kmalloc_large);
+EXPORT_SYMBOL(kmalloc_large_noprof);

-void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+void *kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
{
void *ret = __kmalloc_large_node(size, flags, node);

@@ -4014,7 +4014,7 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
flags, node);
return ret;
}
-EXPORT_SYMBOL(kmalloc_large_node);
+EXPORT_SYMBOL(kmalloc_large_node_noprof);

static __always_inline
void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
@@ -4041,26 +4041,26 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
return ret;
}

-void *__kmalloc_node(size_t size, gfp_t flags, int node)
+void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node)
{
return __do_kmalloc_node(size, flags, node, _RET_IP_);
}
-EXPORT_SYMBOL(__kmalloc_node);
+EXPORT_SYMBOL(__kmalloc_node_noprof);

-void *__kmalloc(size_t size, gfp_t flags)
+void *__kmalloc_noprof(size_t size, gfp_t flags)
{
return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
}
-EXPORT_SYMBOL(__kmalloc);
+EXPORT_SYMBOL(__kmalloc_noprof);

-void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
- int node, unsigned long caller)
+void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags,
+ int node, unsigned long caller)
{
return __do_kmalloc_node(size, flags, node, caller);
}
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
+EXPORT_SYMBOL(kmalloc_node_track_caller_noprof);

-void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
+void *kmalloc_trace_noprof(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
_RET_IP_, size);
@@ -4070,9 +4070,9 @@ void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
-EXPORT_SYMBOL(kmalloc_trace);
+EXPORT_SYMBOL(kmalloc_trace_noprof);

-void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+void *kmalloc_node_trace_noprof(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t size)
{
void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
@@ -4082,7 +4082,7 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
-EXPORT_SYMBOL(kmalloc_node_trace);
+EXPORT_SYMBOL(kmalloc_node_trace_noprof);

static noinline void free_to_partial_list(
struct kmem_cache *s, struct slab *slab,
@@ -4691,8 +4691,8 @@ static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
#endif /* CONFIG_SLUB_TINY */

/* Note that interrupts must be enabled when calling this function. */
-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
- void **p)
+int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size,
+ void **p)
{
int i;
struct obj_cgroup *objcg = NULL;
@@ -4720,7 +4720,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,

return i;
}
-EXPORT_SYMBOL(kmem_cache_alloc_bulk);
+EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);


/*
diff --git a/mm/util.c b/mm/util.c
index 5a6a9802583b..291f7945190f 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -115,7 +115,7 @@ char *kstrndup(const char *s, size_t max, gfp_t gfp)
EXPORT_SYMBOL(kstrndup);

/**
- * kmemdup - duplicate region of memory
+ * kmemdup_noprof - duplicate region of memory
*
* @src: memory region to duplicate
* @len: memory region length
@@ -124,16 +124,16 @@ EXPORT_SYMBOL(kstrndup);
* Return: newly allocated copy of @src or %NULL in case of error,
* result is physically contiguous. Use kfree() to free.
*/
-void *kmemdup(const void *src, size_t len, gfp_t gfp)
+void *kmemdup_noprof(const void *src, size_t len, gfp_t gfp)
{
void *p;

- p = kmalloc_track_caller(len, gfp);
+ p = kmalloc_node_track_caller_noprof(len, gfp, NUMA_NO_NODE, _RET_IP_);
if (p)
memcpy(p, src, len);
return p;
}
-EXPORT_SYMBOL(kmemdup);
+EXPORT_SYMBOL(kmemdup_noprof);

/**
* kvmemdup - duplicate region of memory
@@ -577,7 +577,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
EXPORT_SYMBOL(vm_mmap);

/**
- * kvmalloc_node - attempt to allocate physically contiguous memory, but upon
+ * kvmalloc_node_noprof - attempt to allocate physically contiguous memory, but upon
* failure, fall back to non-contiguous (vmalloc) allocation.
* @size: size of the request.
* @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
@@ -592,7 +592,7 @@ EXPORT_SYMBOL(vm_mmap);
*
* Return: pointer to the allocated memory of %NULL in case of failure
*/
-void *kvmalloc_node(size_t size, gfp_t flags, int node)
+void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
{
gfp_t kmalloc_flags = flags;
void *ret;
@@ -614,7 +614,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
kmalloc_flags &= ~__GFP_NOFAIL;
}

- ret = kmalloc_node(size, kmalloc_flags, node);
+ ret = kmalloc_node_noprof(size, kmalloc_flags, node);

/*
* It doesn't really make sense to fallback to vmalloc for sub page
@@ -643,7 +643,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
node, __builtin_return_address(0));
}
-EXPORT_SYMBOL(kvmalloc_node);
+EXPORT_SYMBOL(kvmalloc_node_noprof);

/**
* kvfree() - Free memory.
@@ -682,7 +682,7 @@ void kvfree_sensitive(const void *addr, size_t len)
}
EXPORT_SYMBOL(kvfree_sensitive);

-void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
+void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
{
void *newp;

@@ -695,7 +695,7 @@ void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
kvfree(p);
return newp;
}
-EXPORT_SYMBOL(kvrealloc);
+EXPORT_SYMBOL(kvrealloc_noprof);

/**
* __vmalloc_array - allocate memory for a virtually contiguous array.
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:52:00

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 01/36] fix missing vmalloc.h includes

From: Kent Overstreet <[email protected]>

The next patch drops vmalloc.h from a system header in order to fix
a circular dependency; this adds it to all the files that were pulling
it in implicitly.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
arch/alpha/lib/checksum.c | 1 +
arch/alpha/lib/fpreg.c | 1 +
arch/alpha/lib/memcpy.c | 1 +
arch/arm/kernel/irq.c | 1 +
arch/arm/kernel/traps.c | 1 +
arch/arm64/kernel/efi.c | 1 +
arch/loongarch/include/asm/kfence.h | 1 +
arch/powerpc/kernel/iommu.c | 1 +
arch/powerpc/mm/mem.c | 1 +
arch/riscv/kernel/elf_kexec.c | 1 +
arch/riscv/kernel/probes/kprobes.c | 1 +
arch/s390/kernel/cert_store.c | 1 +
arch/s390/kernel/ipl.c | 1 +
arch/x86/include/asm/io.h | 1 +
arch/x86/kernel/cpu/sgx/main.c | 1 +
arch/x86/kernel/irq_64.c | 1 +
arch/x86/mm/fault.c | 1 +
drivers/accel/ivpu/ivpu_mmu_context.c | 1 +
drivers/gpu/drm/gma500/mmu.c | 1 +
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 1 +
drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c | 1 +
drivers/gpu/drm/i915/gt/shmem_utils.c | 1 +
drivers/gpu/drm/i915/gvt/firmware.c | 1 +
drivers/gpu/drm/i915/gvt/gtt.c | 1 +
drivers/gpu/drm/i915/gvt/handlers.c | 1 +
drivers/gpu/drm/i915/gvt/mmio.c | 1 +
drivers/gpu/drm/i915/gvt/vgpu.c | 1 +
drivers/gpu/drm/i915/intel_gvt.c | 1 +
drivers/gpu/drm/imagination/pvr_vm_mips.c | 1 +
drivers/gpu/drm/mediatek/mtk_drm_gem.c | 1 +
drivers/gpu/drm/omapdrm/omap_gem.c | 1 +
drivers/gpu/drm/v3d/v3d_bo.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_binding.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 1 +
drivers/gpu/drm/xen/xen_drm_front_gem.c | 1 +
drivers/hwtracing/coresight/coresight-trbe.c | 1 +
drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c | 1 +
drivers/net/ethernet/microsoft/mana/hw_channel.c | 1 +
drivers/platform/x86/uv_sysfs.c | 1 +
drivers/scsi/mpi3mr/mpi3mr_transport.c | 2 ++
drivers/vfio/pci/pds/dirty.c | 1 +
drivers/virt/acrn/mm.c | 1 +
drivers/virtio/virtio_mem.c | 1 +
include/linux/pds/pds_common.h | 2 ++
include/rdma/rdmavt_qp.h | 1 +
mm/debug_vm_pgtable.c | 1 +
sound/pci/hda/cs35l41_hda.c | 1 +
51 files changed, 53 insertions(+)

diff --git a/arch/alpha/lib/checksum.c b/arch/alpha/lib/checksum.c
index 3f35c3ed6948..c29b98ef9c82 100644
--- a/arch/alpha/lib/checksum.c
+++ b/arch/alpha/lib/checksum.c
@@ -14,6 +14,7 @@
#include <linux/string.h>

#include <asm/byteorder.h>
+#include <asm/checksum.h>

static inline unsigned short from64to16(unsigned long x)
{
diff --git a/arch/alpha/lib/fpreg.c b/arch/alpha/lib/fpreg.c
index 7c08b225261c..3d32165043f8 100644
--- a/arch/alpha/lib/fpreg.c
+++ b/arch/alpha/lib/fpreg.c
@@ -8,6 +8,7 @@
#include <linux/compiler.h>
#include <linux/export.h>
#include <linux/preempt.h>
+#include <asm/fpu.h>
#include <asm/thread_info.h>

#if defined(CONFIG_ALPHA_EV6) || defined(CONFIG_ALPHA_EV67)
diff --git a/arch/alpha/lib/memcpy.c b/arch/alpha/lib/memcpy.c
index cbac3dc6d963..0e536a1a39ff 100644
--- a/arch/alpha/lib/memcpy.c
+++ b/arch/alpha/lib/memcpy.c
@@ -18,6 +18,7 @@

#include <linux/types.h>
#include <linux/export.h>
+#include <linux/string.h>

/*
* This should be done in one go with ldq_u*2/mask/stq_u. Do it
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index fe28fc1f759d..dab42d066d06 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -32,6 +32,7 @@
#include <linux/kallsyms.h>
#include <linux/proc_fs.h>
#include <linux/export.h>
+#include <linux/vmalloc.h>

#include <asm/hardware/cache-l2x0.h>
#include <asm/hardware/cache-uniphier.h>
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 3bad79db5d6e..27addbf0f98c 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -26,6 +26,7 @@
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
#include <linux/irq.h>
+#include <linux/vmalloc.h>

#include <linux/atomic.h>
#include <asm/cacheflush.h>
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 0228001347be..a0dc6b88b11b 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -10,6 +10,7 @@
#include <linux/efi.h>
#include <linux/init.h>
#include <linux/screen_info.h>
+#include <linux/vmalloc.h>

#include <asm/efi.h>
#include <asm/stacktrace.h>
diff --git a/arch/loongarch/include/asm/kfence.h b/arch/loongarch/include/asm/kfence.h
index 6c82aea1c993..54062656dc7b 100644
--- a/arch/loongarch/include/asm/kfence.h
+++ b/arch/loongarch/include/asm/kfence.h
@@ -10,6 +10,7 @@
#define _ASM_LOONGARCH_KFENCE_H

#include <linux/kfence.h>
+#include <linux/vmalloc.h>
#include <asm/pgtable.h>
#include <asm/tlb.h>

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a9bebfd56b3b..25782d361884 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -26,6 +26,7 @@
#include <linux/iommu.h>
#include <linux/sched.h>
#include <linux/debugfs.h>
+#include <linux/vmalloc.h>
#include <asm/io.h>
#include <asm/iommu.h>
#include <asm/pci-bridge.h>
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 3a440004b97d..a197d4c2244b 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -16,6 +16,7 @@
#include <linux/highmem.h>
#include <linux/suspend.h>
#include <linux/dma-direct.h>
+#include <linux/vmalloc.h>

#include <asm/swiotlb.h>
#include <asm/machdep.h>
diff --git a/arch/riscv/kernel/elf_kexec.c b/arch/riscv/kernel/elf_kexec.c
index 5bd1ec3341fe..92b1e16f99c4 100644
--- a/arch/riscv/kernel/elf_kexec.c
+++ b/arch/riscv/kernel/elf_kexec.c
@@ -19,6 +19,7 @@
#include <linux/libfdt.h>
#include <linux/types.h>
#include <linux/memblock.h>
+#include <linux/vmalloc.h>
#include <asm/setup.h>

int arch_kimage_file_post_load_cleanup(struct kimage *image)
diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index 2f08c14a933d..71a8b8945b26 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -6,6 +6,7 @@
#include <linux/extable.h>
#include <linux/slab.h>
#include <linux/stop_machine.h>
+#include <linux/vmalloc.h>
#include <asm/ptrace.h>
#include <linux/uaccess.h>
#include <asm/sections.h>
diff --git a/arch/s390/kernel/cert_store.c b/arch/s390/kernel/cert_store.c
index 554447768bdd..bf983513dd33 100644
--- a/arch/s390/kernel/cert_store.c
+++ b/arch/s390/kernel/cert_store.c
@@ -21,6 +21,7 @@
#include <linux/seq_file.h>
#include <linux/slab.h>
#include <linux/sysfs.h>
+#include <linux/vmalloc.h>
#include <crypto/sha2.h>
#include <keys/user-type.h>
#include <asm/debug.h>
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index ba75f6bee774..0854a8450a6e 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -20,6 +20,7 @@
#include <linux/gfp.h>
#include <linux/crash_dump.h>
#include <linux/debug_locks.h>
+#include <linux/vmalloc.h>
#include <asm/asm-extable.h>
#include <asm/diag.h>
#include <asm/ipl.h>
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 3814a9263d64..c6b799d28126 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -42,6 +42,7 @@
#include <asm/early_ioremap.h>
#include <asm/pgtable_types.h>
#include <asm/shared/io.h>
+#include <asm/special_insns.h>

#define build_mmio_read(name, size, type, reg, barrier) \
static inline type name(const volatile void __iomem *addr) \
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 166692f2d501..27892e57c4ef 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -13,6 +13,7 @@
#include <linux/sched/signal.h>
#include <linux/slab.h>
#include <linux/sysfs.h>
+#include <linux/vmalloc.h>
#include <asm/sgx.h>
#include "driver.h"
#include "encl.h"
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index fe0c859873d1..ade0043ce56e 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -18,6 +18,7 @@
#include <linux/uaccess.h>
#include <linux/smp.h>
#include <linux/sched/task_stack.h>
+#include <linux/vmalloc.h>

#include <asm/cpu_entry_area.h>
#include <asm/softirq_stack.h>
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 679b09cfe241..af223e57aa63 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -20,6 +20,7 @@
#include <linux/efi.h> /* efi_crash_gracefully_on_page_fault()*/
#include <linux/mm_types.h>
#include <linux/mm.h> /* find_and_lock_vma() */
+#include <linux/vmalloc.h>

#include <asm/cpufeature.h> /* boot_cpu_has, ... */
#include <asm/traps.h> /* dotraplinkage, ... */
diff --git a/drivers/accel/ivpu/ivpu_mmu_context.c b/drivers/accel/ivpu/ivpu_mmu_context.c
index fe6161299236..128aef8e5a19 100644
--- a/drivers/accel/ivpu/ivpu_mmu_context.c
+++ b/drivers/accel/ivpu/ivpu_mmu_context.c
@@ -6,6 +6,7 @@
#include <linux/bitfield.h>
#include <linux/highmem.h>
#include <linux/set_memory.h>
+#include <linux/vmalloc.h>

#include <drm/drm_cache.h>

diff --git a/drivers/gpu/drm/gma500/mmu.c b/drivers/gpu/drm/gma500/mmu.c
index a70b01ccdf70..4d78b33eaa82 100644
--- a/drivers/gpu/drm/gma500/mmu.c
+++ b/drivers/gpu/drm/gma500/mmu.c
@@ -5,6 +5,7 @@
**************************************************************************/

#include <linux/highmem.h>
+#include <linux/vmalloc.h>

#include "mmu.h"
#include "psb_drv.h"
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 0ba955611dfb..8780aa243105 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -5,6 +5,7 @@
*/

#include <drm/drm_cache.h>
+#include <linux/vmalloc.h>

#include "gt/intel_gt.h"
#include "gt/intel_tlb.h"
diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
index b2a5882b8f81..075657018739 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
@@ -4,6 +4,7 @@
* Copyright © 2016 Intel Corporation
*/

+#include <linux/vmalloc.h>
#include "mock_dmabuf.h"

static struct sg_table *mock_map_dma_buf(struct dma_buf_attachment *attachment,
diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
index bccc3a1200bc..1fb6ff77fd89 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -7,6 +7,7 @@
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/shmem_fs.h>
+#include <linux/vmalloc.h>

#include "i915_drv.h"
#include "gem/i915_gem_object.h"
diff --git a/drivers/gpu/drm/i915/gvt/firmware.c b/drivers/gpu/drm/i915/gvt/firmware.c
index 4dd52ac2043e..d800d267f0e9 100644
--- a/drivers/gpu/drm/i915/gvt/firmware.c
+++ b/drivers/gpu/drm/i915/gvt/firmware.c
@@ -30,6 +30,7 @@

#include <linux/firmware.h>
#include <linux/crc32.h>
+#include <linux/vmalloc.h>

#include "i915_drv.h"
#include "gvt.h"
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 094fca9b0e73..58cca4906f41 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -39,6 +39,7 @@
#include "trace.h"

#include "gt/intel_gt_regs.h"
+#include <linux/vmalloc.h>

#if defined(VERBOSE_DEBUG)
#define gvt_vdbg_mm(fmt, args...) gvt_dbg_mm(fmt, ##args)
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
index efcb00472be2..ea9c30092767 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -52,6 +52,7 @@
#include "display/skl_watermark_regs.h"
#include "display/vlv_dsi_pll_regs.h"
#include "gt/intel_gt_regs.h"
+#include <linux/vmalloc.h>

/* XXX FIXME i915 has changed PP_XXX definition */
#define PCH_PP_STATUS _MMIO(0xc7200)
diff --git a/drivers/gpu/drm/i915/gvt/mmio.c b/drivers/gpu/drm/i915/gvt/mmio.c
index 5b5def6ddef7..780762f28aa4 100644
--- a/drivers/gpu/drm/i915/gvt/mmio.c
+++ b/drivers/gpu/drm/i915/gvt/mmio.c
@@ -33,6 +33,7 @@
*
*/

+#include <linux/vmalloc.h>
#include "i915_drv.h"
#include "i915_reg.h"
#include "gvt.h"
diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index 08ad1bd651f1..63c751ca4119 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -34,6 +34,7 @@
#include "i915_drv.h"
#include "gvt.h"
#include "i915_pvinfo.h"
+#include <linux/vmalloc.h>

void populate_pvinfo_page(struct intel_vgpu *vgpu)
{
diff --git a/drivers/gpu/drm/i915/intel_gvt.c b/drivers/gpu/drm/i915/intel_gvt.c
index 9b6d87c8b583..5a01d60e5186 100644
--- a/drivers/gpu/drm/i915/intel_gvt.c
+++ b/drivers/gpu/drm/i915/intel_gvt.c
@@ -28,6 +28,7 @@
#include "gt/intel_context.h"
#include "gt/intel_ring.h"
#include "gt/shmem_utils.h"
+#include <linux/vmalloc.h>

/**
* DOC: Intel GVT-g host support
diff --git a/drivers/gpu/drm/imagination/pvr_vm_mips.c b/drivers/gpu/drm/imagination/pvr_vm_mips.c
index b7fef3c797e6..6563dcde109c 100644
--- a/drivers/gpu/drm/imagination/pvr_vm_mips.c
+++ b/drivers/gpu/drm/imagination/pvr_vm_mips.c
@@ -14,6 +14,7 @@
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/types.h>
+#include <linux/vmalloc.h>

/**
* pvr_vm_mips_init() - Initialise MIPS FW pagetable
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
index 4f2e3feabc0f..3e519869b632 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
@@ -4,6 +4,7 @@
*/

#include <linux/dma-buf.h>
+#include <linux/vmalloc.h>

#include <drm/drm.h>
#include <drm/drm_device.h>
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index 3421e8389222..9ea0c64c26b5 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -9,6 +9,7 @@
#include <linux/shmem_fs.h>
#include <linux/spinlock.h>
#include <linux/pfn_t.h>
+#include <linux/vmalloc.h>

#include <drm/drm_prime.h>
#include <drm/drm_vma_manager.h>
diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 1bdfac8beafd..bd078852cd60 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -21,6 +21,7 @@

#include <linux/dma-buf.h>
#include <linux/pfn_t.h>
+#include <linux/vmalloc.h>

#include "v3d_drv.h"
#include "uapi/drm/v3d_drm.h"
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_binding.c b/drivers/gpu/drm/vmwgfx/vmwgfx_binding.c
index ae2de914eb89..2731f6ded1c2 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_binding.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_binding.c
@@ -54,6 +54,7 @@
#include "vmwgfx_drv.h"
#include "vmwgfx_binding.h"
#include "device_include/svga3d_reg.h"
+#include <linux/vmalloc.h>

#define VMW_BINDING_RT_BIT 0
#define VMW_BINDING_PS_BIT 1
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c b/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c
index 195ff8792e5a..dd4ca6a9c690 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c
@@ -31,6 +31,7 @@
#include <drm/ttm/ttm_placement.h>

#include <linux/sched/signal.h>
+#include <linux/vmalloc.h>

bool vmw_supports_3d(struct vmw_private *dev_priv)
{
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c b/drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c
index 829df395c2ed..6e6beff9e262 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c
@@ -25,6 +25,7 @@
*
**************************************************************************/

+#include <linux/vmalloc.h>
#include "vmwgfx_devcaps.h"

#include "vmwgfx_drv.h"
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index d3e308fdfd5b..7a451410ad77 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -53,6 +53,7 @@
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/version.h>
+#include <linux/vmalloc.h>

#define VMWGFX_DRIVER_DESC "Linux drm driver for VMware graphics devices"

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 36987ef3fc30..4ce22843015e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -35,6 +35,7 @@

#include <linux/sync_file.h>
#include <linux/hashtable.h>
+#include <linux/vmalloc.h>

/*
* Helper macro to get dx_ctx_node if available otherwise print an error
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
index a1da5678c731..835d1eed8dd9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
@@ -31,6 +31,7 @@

#include <drm/vmwgfx_drm.h>
#include <linux/pci.h>
+#include <linux/vmalloc.h>

int vmw_getparam_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index 3ad2b4cfd1f0..63112ed975c4 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -11,6 +11,7 @@
#include <linux/dma-buf.h>
#include <linux/scatterlist.h>
#include <linux/shmem_fs.h>
+#include <linux/vmalloc.h>

#include <drm/drm_gem.h>
#include <drm/drm_prime.h>
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 6136776482e6..96a32b213669 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -17,6 +17,7 @@

#include <asm/barrier.h>
#include <asm/cpufeature.h>
+#include <linux/vmalloc.h>

#include "coresight-self-hosted-trace.h"
#include "coresight-trbe.h"
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c b/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c
index 2e2c3be8a0b4..e6eb98d70f3c 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c
@@ -15,6 +15,7 @@
#include <linux/io.h>
#include <linux/pci.h>
#include <linux/etherdevice.h>
+#include <linux/vmalloc.h>

#include "octep_config.h"
#include "octep_main.h"
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 2729a2c5acf9..11021c34e47e 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -3,6 +3,7 @@

#include <net/mana/gdma.h>
#include <net/mana/hw_channel.h>
+#include <linux/vmalloc.h>

static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
{
diff --git a/drivers/platform/x86/uv_sysfs.c b/drivers/platform/x86/uv_sysfs.c
index 38d1b692d3c0..40e010877189 100644
--- a/drivers/platform/x86/uv_sysfs.c
+++ b/drivers/platform/x86/uv_sysfs.c
@@ -11,6 +11,7 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/kobject.h>
+#include <linux/vmalloc.h>
#include <asm/uv/bios.h>
#include <asm/uv/uv.h>
#include <asm/uv/uv_hub.h>
diff --git a/drivers/scsi/mpi3mr/mpi3mr_transport.c b/drivers/scsi/mpi3mr/mpi3mr_transport.c
index c0c8ab586957..408a4023406b 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_transport.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_transport.c
@@ -7,6 +7,8 @@
*
*/

+#include <linux/vmalloc.h>
+
#include "mpi3mr.h"

/**
diff --git a/drivers/vfio/pci/pds/dirty.c b/drivers/vfio/pci/pds/dirty.c
index 8ddf4346fcd5..0a161becd646 100644
--- a/drivers/vfio/pci/pds/dirty.c
+++ b/drivers/vfio/pci/pds/dirty.c
@@ -3,6 +3,7 @@

#include <linux/interval_tree.h>
#include <linux/vfio.h>
+#include <linux/vmalloc.h>

#include <linux/pds/pds_common.h>
#include <linux/pds/pds_core_if.h>
diff --git a/drivers/virt/acrn/mm.c b/drivers/virt/acrn/mm.c
index fa5d9ca6be57..c088ee1f1180 100644
--- a/drivers/virt/acrn/mm.c
+++ b/drivers/virt/acrn/mm.c
@@ -12,6 +12,7 @@
#include <linux/io.h>
#include <linux/mm.h>
#include <linux/slab.h>
+#include <linux/vmalloc.h>

#include "acrn_drv.h"

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 8e3223294442..e8355f55a8f7 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -21,6 +21,7 @@
#include <linux/bitmap.h>
#include <linux/lockdep.h>
#include <linux/log2.h>
+#include <linux/vmalloc.h>

#include <acpi/acpi_numa.h>

diff --git a/include/linux/pds/pds_common.h b/include/linux/pds/pds_common.h
index 30581e2e04cc..5802e1deef24 100644
--- a/include/linux/pds/pds_common.h
+++ b/include/linux/pds/pds_common.h
@@ -4,6 +4,8 @@
#ifndef _PDS_COMMON_H_
#define _PDS_COMMON_H_

+#include <linux/notifier.h>
+
#define PDS_CORE_DRV_NAME "pds_core"

/* the device's internal addressing uses up to 52 bits */
diff --git a/include/rdma/rdmavt_qp.h b/include/rdma/rdmavt_qp.h
index 2e58d5e6ac0e..d67892944193 100644
--- a/include/rdma/rdmavt_qp.h
+++ b/include/rdma/rdmavt_qp.h
@@ -11,6 +11,7 @@
#include <rdma/ib_verbs.h>
#include <rdma/rdmavt_cq.h>
#include <rdma/rvt-abi.h>
+#include <linux/vmalloc.h>
/*
* Atomic bit definitions for r_aflags.
*/
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 5662e29fe253..d711246929aa 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -30,6 +30,7 @@
#include <linux/start_kernel.h>
#include <linux/sched/mm.h>
#include <linux/io.h>
+#include <linux/vmalloc.h>

#include <asm/cacheflush.h>
#include <asm/pgalloc.h>
diff --git a/sound/pci/hda/cs35l41_hda.c b/sound/pci/hda/cs35l41_hda.c
index d3fa6e136744..990b5bd717a1 100644
--- a/sound/pci/hda/cs35l41_hda.c
+++ b/sound/pci/hda/cs35l41_hda.c
@@ -13,6 +13,7 @@
#include <sound/soc.h>
#include <linux/pm_runtime.h>
#include <linux/spi/spi.h>
+#include <linux/vmalloc.h>
#include "hda_local.h"
#include "hda_auto_parser.h"
#include "hda_jack.h"
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:52:10

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 24/36] rust: Add a rust helper for krealloc()

From: Kent Overstreet <[email protected]>

Memory allocation profiling is turning krealloc() into a nontrivial
macro - so for now, we need a helper for it.

Until we have proper support on the rust side for memory allocation
profiling this does mean that all Rust allocations will be accounted to
the helper.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: "Björn Roy Baron" <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: [email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
rust/helpers.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/rust/helpers.c b/rust/helpers.c
index 70e59efd92bc..ad62eaf604b3 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -28,6 +28,7 @@
#include <linux/mutex.h>
#include <linux/refcount.h>
#include <linux/sched/signal.h>
+#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/workqueue.h>
@@ -157,6 +158,13 @@ void rust_helper_init_work_with_key(struct work_struct *work, work_func_t func,
}
EXPORT_SYMBOL_GPL(rust_helper_init_work_with_key);

+void * __must_check rust_helper_krealloc(const void *objp, size_t new_size,
+ gfp_t flags) __realloc_size(2)
+{
+ return krealloc(objp, new_size, flags);
+}
+EXPORT_SYMBOL_GPL(rust_helper_krealloc);
+
/*
* `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
* use it in contexts where Rust expects a `usize` like slice (array) indices.
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:52:22

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 25/36] mempool: Hook up to memory allocation profiling

From: Kent Overstreet <[email protected]>

This adds hooks to mempools for correctly annotating mempool-backed
allocations at the correct source line, so they show up correctly in
/sys/kernel/debug/allocations.

Various inline functions are converted to wrappers so that we can invoke
alloc_hooks() in fewer places.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/mempool.h | 73 ++++++++++++++++++++---------------------
mm/mempool.c | 36 ++++++++------------
2 files changed, 49 insertions(+), 60 deletions(-)

diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 7be1e32e6d42..69e65ca515ee 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -5,6 +5,8 @@
#ifndef _LINUX_MEMPOOL_H
#define _LINUX_MEMPOOL_H

+#include <linux/sched.h>
+#include <linux/alloc_tag.h>
#include <linux/wait.h>
#include <linux/compiler.h>

@@ -39,18 +41,32 @@ void mempool_exit(mempool_t *pool);
int mempool_init_node(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data,
gfp_t gfp_mask, int node_id);
-int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+
+int mempool_init_noprof(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data);
+#define mempool_init(...) \
+ alloc_hooks(mempool_init_noprof(__VA_ARGS__))

extern mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data);
-extern mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
+
+extern mempool_t *mempool_create_node_noprof(int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data,
gfp_t gfp_mask, int nid);
+#define mempool_create_node(...) \
+ alloc_hooks(mempool_create_node_noprof(__VA_ARGS__))
+
+#define mempool_create(_min_nr, _alloc_fn, _free_fn, _pool_data) \
+ mempool_create_node(_min_nr, _alloc_fn, _free_fn, _pool_data, \
+ GFP_KERNEL, NUMA_NO_NODE)

extern int mempool_resize(mempool_t *pool, int new_min_nr);
extern void mempool_destroy(mempool_t *pool);
-extern void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask) __malloc;
+
+extern void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask) __malloc;
+#define mempool_alloc(...) \
+ alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))
+
extern void *mempool_alloc_preallocated(mempool_t *pool) __malloc;
extern void mempool_free(void *element, mempool_t *pool);

@@ -62,19 +78,10 @@ extern void mempool_free(void *element, mempool_t *pool);
void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data);
void mempool_free_slab(void *element, void *pool_data);

-static inline int
-mempool_init_slab_pool(mempool_t *pool, int min_nr, struct kmem_cache *kc)
-{
- return mempool_init(pool, min_nr, mempool_alloc_slab,
- mempool_free_slab, (void *) kc);
-}
-
-static inline mempool_t *
-mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)
-{
- return mempool_create(min_nr, mempool_alloc_slab, mempool_free_slab,
- (void *) kc);
-}
+#define mempool_init_slab_pool(_pool, _min_nr, _kc) \
+ mempool_init(_pool, (_min_nr), mempool_alloc_slab, mempool_free_slab, (void *)(_kc))
+#define mempool_create_slab_pool(_min_nr, _kc) \
+ mempool_create((_min_nr), mempool_alloc_slab, mempool_free_slab, (void *)(_kc))

/*
* a mempool_alloc_t and a mempool_free_t to kmalloc and kfree the
@@ -83,17 +90,12 @@ mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)
void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data);
void mempool_kfree(void *element, void *pool_data);

-static inline int mempool_init_kmalloc_pool(mempool_t *pool, int min_nr, size_t size)
-{
- return mempool_init(pool, min_nr, mempool_kmalloc,
- mempool_kfree, (void *) size);
-}
-
-static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)
-{
- return mempool_create(min_nr, mempool_kmalloc, mempool_kfree,
- (void *) size);
-}
+#define mempool_init_kmalloc_pool(_pool, _min_nr, _size) \
+ mempool_init(_pool, (_min_nr), mempool_kmalloc, mempool_kfree, \
+ (void *)(unsigned long)(_size))
+#define mempool_create_kmalloc_pool(_min_nr, _size) \
+ mempool_create((_min_nr), mempool_kmalloc, mempool_kfree, \
+ (void *)(unsigned long)(_size))

/*
* A mempool_alloc_t and mempool_free_t for a simple page allocator that
@@ -102,16 +104,11 @@ static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)
void *mempool_alloc_pages(gfp_t gfp_mask, void *pool_data);
void mempool_free_pages(void *element, void *pool_data);

-static inline int mempool_init_page_pool(mempool_t *pool, int min_nr, int order)
-{
- return mempool_init(pool, min_nr, mempool_alloc_pages,
- mempool_free_pages, (void *)(long)order);
-}
-
-static inline mempool_t *mempool_create_page_pool(int min_nr, int order)
-{
- return mempool_create(min_nr, mempool_alloc_pages, mempool_free_pages,
- (void *)(long)order);
-}
+#define mempool_init_page_pool(_pool, _min_nr, _order) \
+ mempool_init(_pool, (_min_nr), mempool_alloc_pages, \
+ mempool_free_pages, (void *)(long)(_order))
+#define mempool_create_page_pool(_min_nr, _order) \
+ mempool_create((_min_nr), mempool_alloc_pages, \
+ mempool_free_pages, (void *)(long)(_order))

#endif /* _LINUX_MEMPOOL_H */
diff --git a/mm/mempool.c b/mm/mempool.c
index dbbf0e9fb424..c47ff883cf36 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -240,17 +240,17 @@ EXPORT_SYMBOL(mempool_init_node);
*
* Return: %0 on success, negative error code otherwise.
*/
-int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data)
+int mempool_init_noprof(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+ mempool_free_t *free_fn, void *pool_data)
{
return mempool_init_node(pool, min_nr, alloc_fn, free_fn,
pool_data, GFP_KERNEL, NUMA_NO_NODE);

}
-EXPORT_SYMBOL(mempool_init);
+EXPORT_SYMBOL(mempool_init_noprof);

/**
- * mempool_create - create a memory pool
+ * mempool_create_node - create a memory pool
* @min_nr: the minimum number of elements guaranteed to be
* allocated for this pool.
* @alloc_fn: user-defined element-allocation function.
@@ -265,17 +265,9 @@ EXPORT_SYMBOL(mempool_init);
*
* Return: pointer to the created memory pool object or %NULL on error.
*/
-mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data)
-{
- return mempool_create_node(min_nr, alloc_fn, free_fn, pool_data,
- GFP_KERNEL, NUMA_NO_NODE);
-}
-EXPORT_SYMBOL(mempool_create);
-
-mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data,
- gfp_t gfp_mask, int node_id)
+mempool_t *mempool_create_node_noprof(int min_nr, mempool_alloc_t *alloc_fn,
+ mempool_free_t *free_fn, void *pool_data,
+ gfp_t gfp_mask, int node_id)
{
mempool_t *pool;

@@ -291,7 +283,7 @@ mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,

return pool;
}
-EXPORT_SYMBOL(mempool_create_node);
+EXPORT_SYMBOL(mempool_create_node_noprof);

/**
* mempool_resize - resize an existing memory pool
@@ -374,7 +366,7 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
EXPORT_SYMBOL(mempool_resize);

/**
- * mempool_alloc - allocate an element from a specific memory pool
+ * mempool_alloc_noprof - allocate an element from a specific memory pool
* @pool: pointer to the memory pool which was allocated via
* mempool_create().
* @gfp_mask: the usual allocation bitmask.
@@ -387,7 +379,7 @@ EXPORT_SYMBOL(mempool_resize);
*
* Return: pointer to the allocated element or %NULL on error.
*/
-void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
+void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
{
void *element;
unsigned long flags;
@@ -454,7 +446,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
finish_wait(&pool->wait, &wait);
goto repeat_alloc;
}
-EXPORT_SYMBOL(mempool_alloc);
+EXPORT_SYMBOL(mempool_alloc_noprof);

/**
* mempool_alloc_preallocated - allocate an element from preallocated elements
@@ -562,7 +554,7 @@ void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data)
{
struct kmem_cache *mem = pool_data;
VM_BUG_ON(mem->ctor);
- return kmem_cache_alloc(mem, gfp_mask);
+ return kmem_cache_alloc_noprof(mem, gfp_mask);
}
EXPORT_SYMBOL(mempool_alloc_slab);

@@ -580,7 +572,7 @@ EXPORT_SYMBOL(mempool_free_slab);
void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data)
{
size_t size = (size_t)pool_data;
- return kmalloc(size, gfp_mask);
+ return kmalloc_noprof(size, gfp_mask);
}
EXPORT_SYMBOL(mempool_kmalloc);

@@ -597,7 +589,7 @@ EXPORT_SYMBOL(mempool_kfree);
void *mempool_alloc_pages(gfp_t gfp_mask, void *pool_data)
{
int order = (int)(long)pool_data;
- return alloc_pages(gfp_mask, order);
+ return alloc_pages_noprof(gfp_mask, order);
}
EXPORT_SYMBOL(mempool_alloc_pages);

--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:52:29

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 26/36] mm: percpu: Introduce pcpuobj_ext

From: Kent Overstreet <[email protected]>

Upcoming alloc tagging patches require a place to stash per-allocation
metadata.

We already do this when memcg is enabled, so this patch generalizes the
obj_cgroup * vector in struct pcpu_chunk by creating a pcpu_obj_ext
type, which we will be adding to in an upcoming patch - similarly to the
previous slabobj_ext patch.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: [email protected]
---
mm/percpu-internal.h | 19 +++++++++++++++++--
mm/percpu.c | 30 +++++++++++++++---------------
2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index cdd0aa597a81..e62d582f4bf3 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -32,6 +32,16 @@ struct pcpu_block_md {
int nr_bits; /* total bits responsible for */
};

+struct pcpuobj_ext {
+#ifdef CONFIG_MEMCG_KMEM
+ struct obj_cgroup *cgroup;
+#endif
+};
+
+#ifdef CONFIG_MEMCG_KMEM
+#define NEED_PCPUOBJ_EXT
+#endif
+
struct pcpu_chunk {
#ifdef CONFIG_PERCPU_STATS
int nr_alloc; /* # of allocations */
@@ -64,8 +74,8 @@ struct pcpu_chunk {
int end_offset; /* additional area required to
have the region end page
aligned */
-#ifdef CONFIG_MEMCG_KMEM
- struct obj_cgroup **obj_cgroups; /* vector of object cgroups */
+#ifdef NEED_PCPUOBJ_EXT
+ struct pcpuobj_ext *obj_exts; /* vector of object cgroups */
#endif

int nr_pages; /* # of pages served by this chunk */
@@ -74,6 +84,11 @@ struct pcpu_chunk {
unsigned long populated[]; /* populated bitmap */
};

+static inline bool need_pcpuobj_ext(void)
+{
+ return !mem_cgroup_kmem_disabled();
+}
+
extern spinlock_t pcpu_lock;

extern struct list_head *pcpu_chunk_lists;
diff --git a/mm/percpu.c b/mm/percpu.c
index 4e11fc1e6def..2e5edaad9cc3 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1392,9 +1392,9 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr,
panic("%s: Failed to allocate %zu bytes\n", __func__,
alloc_size);

-#ifdef CONFIG_MEMCG_KMEM
+#ifdef NEED_PCPUOBJ_EXT
/* first chunk is free to use */
- chunk->obj_cgroups = NULL;
+ chunk->obj_exts = NULL;
#endif
pcpu_init_md_blocks(chunk);

@@ -1463,12 +1463,12 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp)
if (!chunk->md_blocks)
goto md_blocks_fail;

-#ifdef CONFIG_MEMCG_KMEM
- if (!mem_cgroup_kmem_disabled()) {
- chunk->obj_cgroups =
+#ifdef NEED_PCPUOBJ_EXT
+ if (need_pcpuobj_ext()) {
+ chunk->obj_exts =
pcpu_mem_zalloc(pcpu_chunk_map_bits(chunk) *
- sizeof(struct obj_cgroup *), gfp);
- if (!chunk->obj_cgroups)
+ sizeof(struct pcpuobj_ext), gfp);
+ if (!chunk->obj_exts)
goto objcg_fail;
}
#endif
@@ -1480,7 +1480,7 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp)

return chunk;

-#ifdef CONFIG_MEMCG_KMEM
+#ifdef NEED_PCPUOBJ_EXT
objcg_fail:
pcpu_mem_free(chunk->md_blocks);
#endif
@@ -1498,8 +1498,8 @@ static void pcpu_free_chunk(struct pcpu_chunk *chunk)
{
if (!chunk)
return;
-#ifdef CONFIG_MEMCG_KMEM
- pcpu_mem_free(chunk->obj_cgroups);
+#ifdef NEED_PCPUOBJ_EXT
+ pcpu_mem_free(chunk->obj_exts);
#endif
pcpu_mem_free(chunk->md_blocks);
pcpu_mem_free(chunk->bound_map);
@@ -1646,9 +1646,9 @@ static void pcpu_memcg_post_alloc_hook(struct obj_cgroup *objcg,
if (!objcg)
return;

- if (likely(chunk && chunk->obj_cgroups)) {
+ if (likely(chunk && chunk->obj_exts)) {
obj_cgroup_get(objcg);
- chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = objcg;
+ chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup = objcg;

rcu_read_lock();
mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_PERCPU_B,
@@ -1663,13 +1663,13 @@ static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
{
struct obj_cgroup *objcg;

- if (unlikely(!chunk->obj_cgroups))
+ if (unlikely(!chunk->obj_exts))
return;

- objcg = chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT];
+ objcg = chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup;
if (!objcg)
return;
- chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = NULL;
+ chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup = NULL;

obj_cgroup_uncharge(objcg, pcpu_obj_full_size(size));

--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:52:44

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 27/36] mm: percpu: Add codetag reference into pcpuobj_ext

From: Kent Overstreet <[email protected]>

To store codetag for every per-cpu allocation, a codetag reference is
embedded into pcpuobj_ext when CONFIG_MEM_ALLOC_PROFILING=y. Hooks to
use the newly introduced codetag are added.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
mm/percpu-internal.h | 11 +++++++++--
mm/percpu.c | 26 ++++++++++++++++++++++++++
2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index e62d582f4bf3..7e42f0ca3b7b 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -36,9 +36,12 @@ struct pcpuobj_ext {
#ifdef CONFIG_MEMCG_KMEM
struct obj_cgroup *cgroup;
#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ union codetag_ref tag;
+#endif
};

-#ifdef CONFIG_MEMCG_KMEM
+#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEM_ALLOC_PROFILING)
#define NEED_PCPUOBJ_EXT
#endif

@@ -86,7 +89,11 @@ struct pcpu_chunk {

static inline bool need_pcpuobj_ext(void)
{
- return !mem_cgroup_kmem_disabled();
+ if (IS_ENABLED(CONFIG_MEM_ALLOC_PROFILING))
+ return true;
+ if (!mem_cgroup_kmem_disabled())
+ return true;
+ return false;
}

extern spinlock_t pcpu_lock;
diff --git a/mm/percpu.c b/mm/percpu.c
index 2e5edaad9cc3..578531ea1f43 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1699,6 +1699,32 @@ static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
}
#endif /* CONFIG_MEMCG_KMEM */

+#ifdef CONFIG_MEM_ALLOC_PROFILING
+static void pcpu_alloc_tag_alloc_hook(struct pcpu_chunk *chunk, int off,
+ size_t size)
+{
+ if (mem_alloc_profiling_enabled() && likely(chunk->obj_exts)) {
+ alloc_tag_add(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag,
+ current->alloc_tag, size);
+ }
+}
+
+static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
+{
+ if (mem_alloc_profiling_enabled() && likely(chunk->obj_exts))
+ alloc_tag_sub_noalloc(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag, size);
+}
+#else
+static void pcpu_alloc_tag_alloc_hook(struct pcpu_chunk *chunk, int off,
+ size_t size)
+{
+}
+
+static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
+{
+}
+#endif
+
/**
* pcpu_alloc - the percpu allocator
* @size: size of area to allocate in bytes
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:52:44

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline

From: Kent Overstreet <[email protected]>

It seems we need to be more forceful with the compiler on this one.
This is done for performance reasons only.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 2ef88bbf56a3..d31b03a8d9d5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void *x, bool init)
return !kasan_slab_free(s, x, init);
}

-static inline bool slab_free_freelist_hook(struct kmem_cache *s,
+static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,
void **head, void **tail,
int *cnt)
{
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:53:57

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 29/36] mm: vmalloc: Enable memory allocation profiling

From: Kent Overstreet <[email protected]>

This wrapps all external vmalloc allocation functions with the
alloc_hooks() wrapper, and switches internal allocations to _noprof
variants where appropriate, for the new memory allocation profiling
feature.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
drivers/staging/media/atomisp/pci/hmm/hmm.c | 2 +-
include/linux/vmalloc.h | 60 ++++++++++----
kernel/kallsyms_selftest.c | 2 +-
mm/nommu.c | 64 +++++++--------
mm/util.c | 24 +++---
mm/vmalloc.c | 88 ++++++++++-----------
6 files changed, 135 insertions(+), 105 deletions(-)

diff --git a/drivers/staging/media/atomisp/pci/hmm/hmm.c b/drivers/staging/media/atomisp/pci/hmm/hmm.c
index bb12644fd033..3e2899ad8517 100644
--- a/drivers/staging/media/atomisp/pci/hmm/hmm.c
+++ b/drivers/staging/media/atomisp/pci/hmm/hmm.c
@@ -205,7 +205,7 @@ static ia_css_ptr __hmm_alloc(size_t bytes, enum hmm_bo_type type,
}

dev_dbg(atomisp_dev, "pages: 0x%08x (%zu bytes), type: %d, vmalloc %p\n",
- bo->start, bytes, type, vmalloc);
+ bo->start, bytes, type, vmalloc_noprof);

return bo->start;

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..106d78e75606 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -2,6 +2,8 @@
#ifndef _LINUX_VMALLOC_H
#define _LINUX_VMALLOC_H

+#include <linux/alloc_tag.h>
+#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/init.h>
#include <linux/list.h>
@@ -137,26 +139,54 @@ extern unsigned long vmalloc_nr_pages(void);
static inline unsigned long vmalloc_nr_pages(void) { return 0; }
#endif

-extern void *vmalloc(unsigned long size) __alloc_size(1);
-extern void *vzalloc(unsigned long size) __alloc_size(1);
-extern void *vmalloc_user(unsigned long size) __alloc_size(1);
-extern void *vmalloc_node(unsigned long size, int node) __alloc_size(1);
-extern void *vzalloc_node(unsigned long size, int node) __alloc_size(1);
-extern void *vmalloc_32(unsigned long size) __alloc_size(1);
-extern void *vmalloc_32_user(unsigned long size) __alloc_size(1);
-extern void *__vmalloc(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
-extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
+extern void *vmalloc_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc(...) alloc_hooks(vmalloc_noprof(__VA_ARGS__))
+
+extern void *vzalloc_noprof(unsigned long size) __alloc_size(1);
+#define vzalloc(...) alloc_hooks(vzalloc_noprof(__VA_ARGS__))
+
+extern void *vmalloc_user_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc_user(...) alloc_hooks(vmalloc_user_noprof(__VA_ARGS__))
+
+extern void *vmalloc_node_noprof(unsigned long size, int node) __alloc_size(1);
+#define vmalloc_node(...) alloc_hooks(vmalloc_node_noprof(__VA_ARGS__))
+
+extern void *vzalloc_node_noprof(unsigned long size, int node) __alloc_size(1);
+#define vzalloc_node(...) alloc_hooks(vzalloc_node_noprof(__VA_ARGS__))
+
+extern void *vmalloc_32_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc_32(...) alloc_hooks(vmalloc_32_noprof(__VA_ARGS__))
+
+extern void *vmalloc_32_user_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc_32_user(...) alloc_hooks(vmalloc_32_user_noprof(__VA_ARGS__))
+
+extern void *__vmalloc_noprof(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+#define __vmalloc(...) alloc_hooks(__vmalloc_noprof(__VA_ARGS__))
+
+extern void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
unsigned long start, unsigned long end, gfp_t gfp_mask,
pgprot_t prot, unsigned long vm_flags, int node,
const void *caller) __alloc_size(1);
-void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
+#define __vmalloc_node_range(...) alloc_hooks(__vmalloc_node_range_noprof(__VA_ARGS__))
+
+void *__vmalloc_node_noprof(unsigned long size, unsigned long align, gfp_t gfp_mask,
int node, const void *caller) __alloc_size(1);
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+#define __vmalloc_node(...) alloc_hooks(__vmalloc_node_noprof(__VA_ARGS__))
+
+void *vmalloc_huge_noprof(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+#define vmalloc_huge(...) alloc_hooks(vmalloc_huge_noprof(__VA_ARGS__))
+
+extern void *__vmalloc_array_noprof(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
+#define __vmalloc_array(...) alloc_hooks(__vmalloc_array_noprof(__VA_ARGS__))
+
+extern void *vmalloc_array_noprof(size_t n, size_t size) __alloc_size(1, 2);
+#define vmalloc_array(...) alloc_hooks(vmalloc_array_noprof(__VA_ARGS__))
+
+extern void *__vcalloc_noprof(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
+#define __vcalloc(...) alloc_hooks(__vcalloc_noprof(__VA_ARGS__))

-extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
-extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
-extern void *__vcalloc(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
-extern void *vcalloc(size_t n, size_t size) __alloc_size(1, 2);
+extern void *vcalloc_noprof(size_t n, size_t size) __alloc_size(1, 2);
+#define vcalloc(...) alloc_hooks(vcalloc_noprof(__VA_ARGS__))

extern void vfree(const void *addr);
extern void vfree_atomic(const void *addr);
diff --git a/kernel/kallsyms_selftest.c b/kernel/kallsyms_selftest.c
index b4cac76ea5e9..3ea9be364e32 100644
--- a/kernel/kallsyms_selftest.c
+++ b/kernel/kallsyms_selftest.c
@@ -82,7 +82,7 @@ static struct test_item test_items[] = {
ITEM_FUNC(kallsyms_test_func_static),
ITEM_FUNC(kallsyms_test_func),
ITEM_FUNC(kallsyms_test_func_weak),
- ITEM_FUNC(vmalloc),
+ ITEM_FUNC(vmalloc_noprof),
ITEM_FUNC(vfree),
#ifdef CONFIG_KALLSYMS_ALL
ITEM_DATA(kallsyms_test_var_bss_static),
diff --git a/mm/nommu.c b/mm/nommu.c
index b6dc558d3144..face0938e9e3 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -139,28 +139,28 @@ void vfree(const void *addr)
}
EXPORT_SYMBOL(vfree);

-void *__vmalloc(unsigned long size, gfp_t gfp_mask)
+void *__vmalloc_noprof(unsigned long size, gfp_t gfp_mask)
{
/*
* You can't specify __GFP_HIGHMEM with kmalloc() since kmalloc()
* returns only a logical address.
*/
- return kmalloc(size, (gfp_mask | __GFP_COMP) & ~__GFP_HIGHMEM);
+ return kmalloc_noprof(size, (gfp_mask | __GFP_COMP) & ~__GFP_HIGHMEM);
}
-EXPORT_SYMBOL(__vmalloc);
+EXPORT_SYMBOL(__vmalloc_noprof);

-void *__vmalloc_node_range(unsigned long size, unsigned long align,
+void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
unsigned long start, unsigned long end, gfp_t gfp_mask,
pgprot_t prot, unsigned long vm_flags, int node,
const void *caller)
{
- return __vmalloc(size, gfp_mask);
+ return __vmalloc_noprof(size, gfp_mask);
}

-void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
+void *__vmalloc_node_noprof(unsigned long size, unsigned long align, gfp_t gfp_mask,
int node, const void *caller)
{
- return __vmalloc(size, gfp_mask);
+ return __vmalloc_noprof(size, gfp_mask);
}

static void *__vmalloc_user_flags(unsigned long size, gfp_t flags)
@@ -181,11 +181,11 @@ static void *__vmalloc_user_flags(unsigned long size, gfp_t flags)
return ret;
}

-void *vmalloc_user(unsigned long size)
+void *vmalloc_user_noprof(unsigned long size)
{
return __vmalloc_user_flags(size, GFP_KERNEL | __GFP_ZERO);
}
-EXPORT_SYMBOL(vmalloc_user);
+EXPORT_SYMBOL(vmalloc_user_noprof);

struct page *vmalloc_to_page(const void *addr)
{
@@ -219,13 +219,13 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
* For tight control over page level allocator and protection flags
* use __vmalloc() instead.
*/
-void *vmalloc(unsigned long size)
+void *vmalloc_noprof(unsigned long size)
{
- return __vmalloc(size, GFP_KERNEL);
+ return __vmalloc_noprof(size, GFP_KERNEL);
}
-EXPORT_SYMBOL(vmalloc);
+EXPORT_SYMBOL(vmalloc_noprof);

-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __weak __alias(__vmalloc);
+void *vmalloc_huge_noprof(unsigned long size, gfp_t gfp_mask) __weak __alias(__vmalloc_noprof);

/*
* vzalloc - allocate virtually contiguous memory with zero fill
@@ -239,14 +239,14 @@ void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __weak __alias(__vmalloc)
* For tight control over page level allocator and protection flags
* use __vmalloc() instead.
*/
-void *vzalloc(unsigned long size)
+void *vzalloc_noprof(unsigned long size)
{
- return __vmalloc(size, GFP_KERNEL | __GFP_ZERO);
+ return __vmalloc_noprof(size, GFP_KERNEL | __GFP_ZERO);
}
-EXPORT_SYMBOL(vzalloc);
+EXPORT_SYMBOL(vzalloc_noprof);

/**
- * vmalloc_node - allocate memory on a specific node
+ * vmalloc_node_noprof - allocate memory on a specific node
* @size: allocation size
* @node: numa node
*
@@ -256,14 +256,14 @@ EXPORT_SYMBOL(vzalloc);
* For tight control over page level allocator and protection flags
* use __vmalloc() instead.
*/
-void *vmalloc_node(unsigned long size, int node)
+void *vmalloc_node_noprof(unsigned long size, int node)
{
- return vmalloc(size);
+ return vmalloc_noprof(size);
}
-EXPORT_SYMBOL(vmalloc_node);
+EXPORT_SYMBOL(vmalloc_node_noprof);

/**
- * vzalloc_node - allocate memory on a specific node with zero fill
+ * vzalloc_node_noprof - allocate memory on a specific node with zero fill
* @size: allocation size
* @node: numa node
*
@@ -274,27 +274,27 @@ EXPORT_SYMBOL(vmalloc_node);
* For tight control over page level allocator and protection flags
* use __vmalloc() instead.
*/
-void *vzalloc_node(unsigned long size, int node)
+void *vzalloc_node_noprof(unsigned long size, int node)
{
- return vzalloc(size);
+ return vzalloc_noprof(size);
}
-EXPORT_SYMBOL(vzalloc_node);
+EXPORT_SYMBOL(vzalloc_node_noprof);

/**
- * vmalloc_32 - allocate virtually contiguous memory (32bit addressable)
+ * vmalloc_32_noprof - allocate virtually contiguous memory (32bit addressable)
* @size: allocation size
*
* Allocate enough 32bit PA addressable pages to cover @size from the
* page level allocator and map them into contiguous kernel virtual space.
*/
-void *vmalloc_32(unsigned long size)
+void *vmalloc_32_noprof(unsigned long size)
{
- return __vmalloc(size, GFP_KERNEL);
+ return __vmalloc_noprof(size, GFP_KERNEL);
}
-EXPORT_SYMBOL(vmalloc_32);
+EXPORT_SYMBOL(vmalloc_32_noprof);

/**
- * vmalloc_32_user - allocate zeroed virtually contiguous 32bit memory
+ * vmalloc_32_user_noprof - allocate zeroed virtually contiguous 32bit memory
* @size: allocation size
*
* The resulting memory area is 32bit addressable and zeroed so it can be
@@ -303,15 +303,15 @@ EXPORT_SYMBOL(vmalloc_32);
* VM_USERMAP is set on the corresponding VMA so that subsequent calls to
* remap_vmalloc_range() are permissible.
*/
-void *vmalloc_32_user(unsigned long size)
+void *vmalloc_32_user_noprof(unsigned long size)
{
/*
* We'll have to sort out the ZONE_DMA bits for 64-bit,
* but for now this can simply use vmalloc_user() directly.
*/
- return vmalloc_user(size);
+ return vmalloc_user_noprof(size);
}
-EXPORT_SYMBOL(vmalloc_32_user);
+EXPORT_SYMBOL(vmalloc_32_user_noprof);

void *vmap(struct page **pages, unsigned int count, unsigned long flags, pgprot_t prot)
{
diff --git a/mm/util.c b/mm/util.c
index 291f7945190f..19c90036d3cc 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -639,7 +639,7 @@ void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
* about the resulting pointer, and cannot play
* protection games.
*/
- return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+ return __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
node, __builtin_return_address(0));
}
@@ -698,12 +698,12 @@ void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flag
EXPORT_SYMBOL(kvrealloc_noprof);

/**
- * __vmalloc_array - allocate memory for a virtually contiguous array.
+ * __vmalloc_array_noprof - allocate memory for a virtually contiguous array.
* @n: number of elements.
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
-void *__vmalloc_array(size_t n, size_t size, gfp_t flags)
+void *__vmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
{
size_t bytes;

@@ -711,18 +711,18 @@ void *__vmalloc_array(size_t n, size_t size, gfp_t flags)
return NULL;
return __vmalloc(bytes, flags);
}
-EXPORT_SYMBOL(__vmalloc_array);
+EXPORT_SYMBOL(__vmalloc_array_noprof);

/**
- * vmalloc_array - allocate memory for a virtually contiguous array.
+ * vmalloc_array_noprof - allocate memory for a virtually contiguous array.
* @n: number of elements.
* @size: element size.
*/
-void *vmalloc_array(size_t n, size_t size)
+void *vmalloc_array_noprof(size_t n, size_t size)
{
return __vmalloc_array(n, size, GFP_KERNEL);
}
-EXPORT_SYMBOL(vmalloc_array);
+EXPORT_SYMBOL(vmalloc_array_noprof);

/**
* __vcalloc - allocate and zero memory for a virtually contiguous array.
@@ -730,22 +730,22 @@ EXPORT_SYMBOL(vmalloc_array);
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
-void *__vcalloc(size_t n, size_t size, gfp_t flags)
+void *__vcalloc_noprof(size_t n, size_t size, gfp_t flags)
{
return __vmalloc_array(n, size, flags | __GFP_ZERO);
}
-EXPORT_SYMBOL(__vcalloc);
+EXPORT_SYMBOL(__vcalloc_noprof);

/**
- * vcalloc - allocate and zero memory for a virtually contiguous array.
+ * vcalloc_noprof - allocate and zero memory for a virtually contiguous array.
* @n: number of elements.
* @size: element size.
*/
-void *vcalloc(size_t n, size_t size)
+void *vcalloc_noprof(size_t n, size_t size)
{
return __vmalloc_array(n, size, GFP_KERNEL | __GFP_ZERO);
}
-EXPORT_SYMBOL(vcalloc);
+EXPORT_SYMBOL(vcalloc_noprof);

struct anon_vma *folio_anon_vma(struct folio *folio)
{
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d12a17fc0c17..5239f2c9ecae 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3025,12 +3025,12 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
* but mempolicy wants to alloc memory by interleaving.
*/
if (IS_ENABLED(CONFIG_NUMA) && nid == NUMA_NO_NODE)
- nr = alloc_pages_bulk_array_mempolicy(bulk_gfp,
+ nr = alloc_pages_bulk_array_mempolicy_noprof(bulk_gfp,
nr_pages_request,
pages + nr_allocated);

else
- nr = alloc_pages_bulk_array_node(bulk_gfp, nid,
+ nr = alloc_pages_bulk_array_node_noprof(bulk_gfp, nid,
nr_pages_request,
pages + nr_allocated);

@@ -3060,9 +3060,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
break;

if (nid == NUMA_NO_NODE)
- page = alloc_pages(alloc_gfp, order);
+ page = alloc_pages_noprof(alloc_gfp, order);
else
- page = alloc_pages_node(nid, alloc_gfp, order);
+ page = alloc_pages_node_noprof(nid, alloc_gfp, order);
if (unlikely(!page)) {
if (!nofail)
break;
@@ -3119,10 +3119,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,

/* Please note that the recursion is strictly bounded. */
if (array_size > PAGE_SIZE) {
- area->pages = __vmalloc_node(array_size, 1, nested_gfp, node,
+ area->pages = __vmalloc_node_noprof(array_size, 1, nested_gfp, node,
area->caller);
} else {
- area->pages = kmalloc_node(array_size, nested_gfp, node);
+ area->pages = kmalloc_node_noprof(array_size, nested_gfp, node);
}

if (!area->pages) {
@@ -3205,7 +3205,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
}

/**
- * __vmalloc_node_range - allocate virtually contiguous memory
+ * __vmalloc_node_range_noprof - allocate virtually contiguous memory
* @size: allocation size
* @align: desired alignment
* @start: vm area range start
@@ -3232,7 +3232,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
*
* Return: the address of the area or %NULL on failure
*/
-void *__vmalloc_node_range(unsigned long size, unsigned long align,
+void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
unsigned long start, unsigned long end, gfp_t gfp_mask,
pgprot_t prot, unsigned long vm_flags, int node,
const void *caller)
@@ -3361,7 +3361,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
}

/**
- * __vmalloc_node - allocate virtually contiguous memory
+ * __vmalloc_node_noprof - allocate virtually contiguous memory
* @size: allocation size
* @align: desired alignment
* @gfp_mask: flags for the page level allocator
@@ -3379,10 +3379,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *__vmalloc_node(unsigned long size, unsigned long align,
+void *__vmalloc_node_noprof(unsigned long size, unsigned long align,
gfp_t gfp_mask, int node, const void *caller)
{
- return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
+ return __vmalloc_node_range_noprof(size, align, VMALLOC_START, VMALLOC_END,
gfp_mask, PAGE_KERNEL, 0, node, caller);
}
/*
@@ -3391,15 +3391,15 @@ void *__vmalloc_node(unsigned long size, unsigned long align,
* than that.
*/
#ifdef CONFIG_TEST_VMALLOC_MODULE
-EXPORT_SYMBOL_GPL(__vmalloc_node);
+EXPORT_SYMBOL_GPL(__vmalloc_node_noprof);
#endif

-void *__vmalloc(unsigned long size, gfp_t gfp_mask)
+void *__vmalloc_noprof(unsigned long size, gfp_t gfp_mask)
{
- return __vmalloc_node(size, 1, gfp_mask, NUMA_NO_NODE,
+ return __vmalloc_node_noprof(size, 1, gfp_mask, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(__vmalloc);
+EXPORT_SYMBOL(__vmalloc_noprof);

/**
* vmalloc - allocate virtually contiguous memory
@@ -3413,12 +3413,12 @@ EXPORT_SYMBOL(__vmalloc);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc(unsigned long size)
+void *vmalloc_noprof(unsigned long size)
{
- return __vmalloc_node(size, 1, GFP_KERNEL, NUMA_NO_NODE,
+ return __vmalloc_node_noprof(size, 1, GFP_KERNEL, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc);
+EXPORT_SYMBOL(vmalloc_noprof);

/**
* vmalloc_huge - allocate virtually contiguous memory, allow huge pages
@@ -3432,16 +3432,16 @@ EXPORT_SYMBOL(vmalloc);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask)
+void *vmalloc_huge_noprof(unsigned long size, gfp_t gfp_mask)
{
- return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+ return __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
NUMA_NO_NODE, __builtin_return_address(0));
}
-EXPORT_SYMBOL_GPL(vmalloc_huge);
+EXPORT_SYMBOL_GPL(vmalloc_huge_noprof);

/**
- * vzalloc - allocate virtually contiguous memory with zero fill
+ * vzalloc_noprof - allocate virtually contiguous memory with zero fill
* @size: allocation size
*
* Allocate enough pages to cover @size from the page level
@@ -3453,12 +3453,12 @@ EXPORT_SYMBOL_GPL(vmalloc_huge);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vzalloc(unsigned long size)
+void *vzalloc_noprof(unsigned long size)
{
- return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE,
+ return __vmalloc_node_noprof(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vzalloc);
+EXPORT_SYMBOL(vzalloc_noprof);

/**
* vmalloc_user - allocate zeroed virtually contiguous memory for userspace
@@ -3469,17 +3469,17 @@ EXPORT_SYMBOL(vzalloc);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_user(unsigned long size)
+void *vmalloc_user_noprof(unsigned long size)
{
- return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END,
+ return __vmalloc_node_range_noprof(size, SHMLBA, VMALLOC_START, VMALLOC_END,
GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
VM_USERMAP, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc_user);
+EXPORT_SYMBOL(vmalloc_user_noprof);

/**
- * vmalloc_node - allocate memory on a specific node
+ * vmalloc_node_noprof - allocate memory on a specific node
* @size: allocation size
* @node: numa node
*
@@ -3491,15 +3491,15 @@ EXPORT_SYMBOL(vmalloc_user);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_node(unsigned long size, int node)
+void *vmalloc_node_noprof(unsigned long size, int node)
{
- return __vmalloc_node(size, 1, GFP_KERNEL, node,
+ return __vmalloc_node_noprof(size, 1, GFP_KERNEL, node,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc_node);
+EXPORT_SYMBOL(vmalloc_node_noprof);

/**
- * vzalloc_node - allocate memory on a specific node with zero fill
+ * vzalloc_node_noprof - allocate memory on a specific node with zero fill
* @size: allocation size
* @node: numa node
*
@@ -3509,12 +3509,12 @@ EXPORT_SYMBOL(vmalloc_node);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vzalloc_node(unsigned long size, int node)
+void *vzalloc_node_noprof(unsigned long size, int node)
{
- return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, node,
+ return __vmalloc_node_noprof(size, 1, GFP_KERNEL | __GFP_ZERO, node,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vzalloc_node);
+EXPORT_SYMBOL(vzalloc_node_noprof);

#if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
#define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL)
@@ -3529,7 +3529,7 @@ EXPORT_SYMBOL(vzalloc_node);
#endif

/**
- * vmalloc_32 - allocate virtually contiguous memory (32bit addressable)
+ * vmalloc_32_noprof - allocate virtually contiguous memory (32bit addressable)
* @size: allocation size
*
* Allocate enough 32bit PA addressable pages to cover @size from the
@@ -3537,15 +3537,15 @@ EXPORT_SYMBOL(vzalloc_node);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_32(unsigned long size)
+void *vmalloc_32_noprof(unsigned long size)
{
- return __vmalloc_node(size, 1, GFP_VMALLOC32, NUMA_NO_NODE,
+ return __vmalloc_node_noprof(size, 1, GFP_VMALLOC32, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc_32);
+EXPORT_SYMBOL(vmalloc_32_noprof);

/**
- * vmalloc_32_user - allocate zeroed virtually contiguous 32bit memory
+ * vmalloc_32_user_noprof - allocate zeroed virtually contiguous 32bit memory
* @size: allocation size
*
* The resulting memory area is 32bit addressable and zeroed so it can be
@@ -3553,14 +3553,14 @@ EXPORT_SYMBOL(vmalloc_32);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_32_user(unsigned long size)
+void *vmalloc_32_user_noprof(unsigned long size)
{
- return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END,
+ return __vmalloc_node_range_noprof(size, SHMLBA, VMALLOC_START, VMALLOC_END,
GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
VM_USERMAP, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc_32_user);
+EXPORT_SYMBOL(vmalloc_32_user_noprof);

/*
* Atomically zero bytes in the iterator.
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:53:57

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 30/36] rhashtable: Plumb through alloc tag

From: Kent Overstreet <[email protected]>

This gives better memory allocation profiling results; rhashtable
allocations will be accounted to the code that initialized the
rhashtable.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/alloc_tag.h | 3 +++
include/linux/rhashtable-types.h | 11 +++++++++--
lib/rhashtable.c | 28 +++++++++++++++++-----------
3 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 86ed5d24a030..29636719b276 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -130,6 +130,8 @@ static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
this_cpu_add(tag->counters->bytes, bytes);
}

+#define alloc_tag_record(p) ((p) = current->alloc_tag)
+
#else /* CONFIG_MEM_ALLOC_PROFILING */

#define DEFINE_ALLOC_TAG(_alloc_tag)
@@ -138,6 +140,7 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes) {}
static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
size_t bytes) {}
+#define alloc_tag_record(p) do {} while (0)

#endif /* CONFIG_MEM_ALLOC_PROFILING */

diff --git a/include/linux/rhashtable-types.h b/include/linux/rhashtable-types.h
index b6f3797277ff..015c8298bebc 100644
--- a/include/linux/rhashtable-types.h
+++ b/include/linux/rhashtable-types.h
@@ -9,6 +9,7 @@
#ifndef _LINUX_RHASHTABLE_TYPES_H
#define _LINUX_RHASHTABLE_TYPES_H

+#include <linux/alloc_tag.h>
#include <linux/atomic.h>
#include <linux/compiler.h>
#include <linux/mutex.h>
@@ -88,6 +89,9 @@ struct rhashtable {
struct mutex mutex;
spinlock_t lock;
atomic_t nelems;
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ struct alloc_tag *alloc_tag;
+#endif
};

/**
@@ -127,9 +131,12 @@ struct rhashtable_iter {
bool end_of_table;
};

-int rhashtable_init(struct rhashtable *ht,
+int rhashtable_init_noprof(struct rhashtable *ht,
const struct rhashtable_params *params);
-int rhltable_init(struct rhltable *hlt,
+#define rhashtable_init(...) alloc_hooks(rhashtable_init_noprof(__VA_ARGS__))
+
+int rhltable_init_noprof(struct rhltable *hlt,
const struct rhashtable_params *params);
+#define rhltable_init(...) alloc_hooks(rhltable_init_noprof(__VA_ARGS__))

#endif /* _LINUX_RHASHTABLE_TYPES_H */
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6ae2ba8e06a2..35d841cf2b43 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -130,7 +130,8 @@ static union nested_table *nested_table_alloc(struct rhashtable *ht,
if (ntbl)
return ntbl;

- ntbl = kzalloc(PAGE_SIZE, GFP_ATOMIC);
+ ntbl = alloc_hooks_tag(ht->alloc_tag,
+ kmalloc_noprof(PAGE_SIZE, GFP_ATOMIC|__GFP_ZERO));

if (ntbl && leaf) {
for (i = 0; i < PAGE_SIZE / sizeof(ntbl[0]); i++)
@@ -157,7 +158,8 @@ static struct bucket_table *nested_bucket_table_alloc(struct rhashtable *ht,

size = sizeof(*tbl) + sizeof(tbl->buckets[0]);

- tbl = kzalloc(size, gfp);
+ tbl = alloc_hooks_tag(ht->alloc_tag,
+ kmalloc_noprof(size, gfp|__GFP_ZERO));
if (!tbl)
return NULL;

@@ -181,7 +183,9 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
int i;
static struct lock_class_key __key;

- tbl = kvzalloc(struct_size(tbl, buckets, nbuckets), gfp);
+ tbl = alloc_hooks_tag(ht->alloc_tag,
+ kvmalloc_node_noprof(struct_size(tbl, buckets, nbuckets),
+ gfp|__GFP_ZERO, NUMA_NO_NODE));

size = nbuckets;

@@ -975,7 +979,7 @@ static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
}

/**
- * rhashtable_init - initialize a new hash table
+ * rhashtable_init_noprof - initialize a new hash table
* @ht: hash table to be initialized
* @params: configuration parameters
*
@@ -1016,7 +1020,7 @@ static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
* .obj_hashfn = my_hash_fn,
* };
*/
-int rhashtable_init(struct rhashtable *ht,
+int rhashtable_init_noprof(struct rhashtable *ht,
const struct rhashtable_params *params)
{
struct bucket_table *tbl;
@@ -1031,6 +1035,8 @@ int rhashtable_init(struct rhashtable *ht,
spin_lock_init(&ht->lock);
memcpy(&ht->p, params, sizeof(*params));

+ alloc_tag_record(ht->alloc_tag);
+
if (params->min_size)
ht->p.min_size = roundup_pow_of_two(params->min_size);

@@ -1076,26 +1082,26 @@ int rhashtable_init(struct rhashtable *ht,

return 0;
}
-EXPORT_SYMBOL_GPL(rhashtable_init);
+EXPORT_SYMBOL_GPL(rhashtable_init_noprof);

/**
- * rhltable_init - initialize a new hash list table
+ * rhltable_init_noprof - initialize a new hash list table
* @hlt: hash list table to be initialized
* @params: configuration parameters
*
* Initializes a new hash list table.
*
- * See documentation for rhashtable_init.
+ * See documentation for rhashtable_init_noprof.
*/
-int rhltable_init(struct rhltable *hlt, const struct rhashtable_params *params)
+int rhltable_init_noprof(struct rhltable *hlt, const struct rhashtable_params *params)
{
int err;

- err = rhashtable_init(&hlt->ht, params);
+ err = rhashtable_init_noprof(&hlt->ht, params);
hlt->ht.rhlist = true;
return err;
}
-EXPORT_SYMBOL_GPL(rhltable_init);
+EXPORT_SYMBOL_GPL(rhltable_init_noprof);

static void rhashtable_free_one(struct rhashtable *ht, struct rhash_head *obj,
void (*free_fn)(void *ptr, void *arg),
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:55:11

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 33/36] codetag: debug: mark codetags for reserved pages as empty

To avoid debug warnings while freeing reserved pages which were not
allocated with usual allocators, mark their codetags as empty before
freeing.

Signed-off-by: Suren Baghdasaryan <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
include/linux/alloc_tag.h | 1 +
include/linux/mm.h | 9 +++++++++
include/linux/pgalloc_tag.h | 2 ++
mm/mm_init.c | 12 +++++++++++-
4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 4a3fc865d878..64aa9557341e 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -46,6 +46,7 @@ static inline void set_codetag_empty(union codetag_ref *ref)
#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */

static inline bool is_codetag_empty(union codetag_ref *ref) { return false; }
+static inline void set_codetag_empty(union codetag_ref *ref) {}

#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..b9a4e2cb3ac1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -5,6 +5,7 @@
#include <linux/errno.h>
#include <linux/mmdebug.h>
#include <linux/gfp.h>
+#include <linux/pgalloc_tag.h>
#include <linux/bug.h>
#include <linux/list.h>
#include <linux/mmzone.h>
@@ -3112,6 +3113,14 @@ extern void reserve_bootmem_region(phys_addr_t start,
/* Free the reserved page into the buddy system, so it gets managed. */
static inline void free_reserved_page(struct page *page)
{
+ if (mem_alloc_profiling_enabled()) {
+ union codetag_ref *ref = get_page_tag_ref(page);
+
+ if (ref) {
+ set_codetag_empty(ref);
+ put_page_tag_ref(ref);
+ }
+ }
ClearPageReserved(page);
init_page_count(page);
__free_page(page);
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 9e6ad8e0e4aa..7a41ed612423 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -98,6 +98,8 @@ static inline void pgalloc_tag_split(struct page *page, unsigned int nr)

#else /* CONFIG_MEM_ALLOC_PROFILING */

+static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
+static inline void put_page_tag_ref(union codetag_ref *ref) {}
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int order) {}
static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index e9ea2919d02d..6b5410a5112c 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2566,7 +2566,6 @@ void __init set_dma_reserve(unsigned long new_dma_reserve)
void __init memblock_free_pages(struct page *page, unsigned long pfn,
unsigned int order)
{
-
if (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT)) {
int nid = early_pfn_to_nid(pfn);

@@ -2578,6 +2577,17 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
/* KMSAN will take care of these pages. */
return;
}
+
+ /* pages were reserved and not allocated */
+ if (mem_alloc_profiling_enabled()) {
+ union codetag_ref *ref = get_page_tag_ref(page);
+
+ if (ref) {
+ set_codetag_empty(ref);
+ put_page_tag_ref(ref);
+ }
+ }
+
__free_pages_core(page, order);
}

--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:56:20

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 34/36] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations

If slabobj_ext vector allocation for a slab object fails and later on it
succeeds for another object in the same slab, the slabobj_ext for the
original object will be NULL and will be flagged in case when
CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
Mark failed slabobj_ext vector allocations using a new objext_flags flag
stored in the lower bits of slab->obj_exts. When new allocation succeeds
it marks all tag references in the same slabobj_ext vector as empty to
avoid warnings implemented by CONFIG_MEM_ALLOC_PROFILING_DEBUG checks.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/memcontrol.h | 4 +++-
mm/slub.c | 46 ++++++++++++++++++++++++++++++++------
2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 2b010316016c..f95241ca9052 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -365,8 +365,10 @@ enum page_memcg_data_flags {
#endif /* CONFIG_MEMCG */

enum objext_flags {
+ /* slabobj_ext vector failed to allocate */
+ OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
/* the next bit after the last actual flag */
- __NR_OBJEXTS_FLAGS = __FIRST_OBJEXT_FLAG,
+ __NR_OBJEXTS_FLAGS = (__FIRST_OBJEXT_FLAG << 1),
};

#define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
diff --git a/mm/slub.c b/mm/slub.c
index 3e41d45f9fa4..43d63747cad2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1901,9 +1901,33 @@ static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
}
}

+static inline void mark_failed_objexts_alloc(struct slab *slab)
+{
+ slab->obj_exts = OBJEXTS_ALLOC_FAIL;
+}
+
+static inline void handle_failed_objexts_alloc(unsigned long obj_exts,
+ struct slabobj_ext *vec, unsigned int objects)
+{
+ /*
+ * If vector previously failed to allocate then we have live
+ * objects with no tag reference. Mark all references in this
+ * vector as empty to avoid warnings later on.
+ */
+ if (obj_exts & OBJEXTS_ALLOC_FAIL) {
+ unsigned int i;
+
+ for (i = 0; i < objects; i++)
+ set_codetag_empty(&vec[i].ref);
+ }
+}
+
#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */

static inline void mark_objexts_empty(struct slabobj_ext *obj_exts) {}
+static inline void mark_failed_objexts_alloc(struct slab *slab) {}
+static inline void handle_failed_objexts_alloc(unsigned long obj_exts,
+ struct slabobj_ext *vec, unsigned int objects) {}

#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */

@@ -1919,29 +1943,37 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
gfp_t gfp, bool new_slab)
{
unsigned int objects = objs_per_slab(s, slab);
- unsigned long obj_exts;
- void *vec;
+ unsigned long new_exts;
+ unsigned long old_exts;
+ struct slabobj_ext *vec;

gfp &= ~OBJCGS_CLEAR_MASK;
/* Prevent recursive extension vector allocation */
gfp |= __GFP_NO_OBJ_EXT;
vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
slab_nid(slab));
- if (!vec)
+ if (!vec) {
+ /* Mark vectors which failed to allocate */
+ if (new_slab)
+ mark_failed_objexts_alloc(slab);
+
return -ENOMEM;
+ }

- obj_exts = (unsigned long)vec;
+ new_exts = (unsigned long)vec;
#ifdef CONFIG_MEMCG
- obj_exts |= MEMCG_DATA_OBJEXTS;
+ new_exts |= MEMCG_DATA_OBJEXTS;
#endif
+ old_exts = slab->obj_exts;
+ handle_failed_objexts_alloc(old_exts, vec, objects);
if (new_slab) {
/*
* If the slab is brand new and nobody can yet access its
* obj_exts, no synchronization is required and obj_exts can
* be simply assigned.
*/
- slab->obj_exts = obj_exts;
- } else if (cmpxchg(&slab->obj_exts, 0, obj_exts)) {
+ slab->obj_exts = new_exts;
+ } else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) {
/*
* If the slab is already in use, somebody can allocate and
* assign slabobj_exts in parallel. In this case the existing
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 19:56:25

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 35/36] MAINTAINERS: Add entries for code tagging and memory allocation profiling

From: Kent Overstreet <[email protected]>

The new code & libraries added are being maintained - mark them as such.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
MAINTAINERS | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9ed4d3868539..4f131872da27 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5210,6 +5210,13 @@ S: Supported
F: Documentation/process/code-of-conduct-interpretation.rst
F: Documentation/process/code-of-conduct.rst

+CODE TAGGING
+M: Suren Baghdasaryan <[email protected]>
+M: Kent Overstreet <[email protected]>
+S: Maintained
+F: include/linux/codetag.h
+F: lib/codetag.c
+
COMEDI DRIVERS
M: Ian Abbott <[email protected]>
M: H Hartley Sweeten <[email protected]>
@@ -14061,6 +14068,16 @@ F: mm/memblock.c
F: mm/mm_init.c
F: tools/testing/memblock/

+MEMORY ALLOCATION PROFILING
+M: Suren Baghdasaryan <[email protected]>
+M: Kent Overstreet <[email protected]>
+L: [email protected]
+S: Maintained
+F: include/linux/alloc_tag.h
+F: include/linux/codetag_ctx.h
+F: lib/alloc_tag.c
+F: lib/pgalloc_tag.c
+
MEMORY CONTROLLER DRIVERS
M: Krzysztof Kozlowski <[email protected]>
L: [email protected]
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 20:00:44

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags

Introduce objext_flags to store additional objext flags unrelated to memcg.

Signed-off-by: Suren Baghdasaryan <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
include/linux/memcontrol.h | 29 ++++++++++++++++++++++-------
mm/slab.h | 4 +---
2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index eb1dc181e412..f3584e98b640 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -356,7 +356,22 @@ enum page_memcg_data_flags {
__NR_MEMCG_DATA_FLAGS = (1UL << 2),
};

-#define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
+#define __FIRST_OBJEXT_FLAG __NR_MEMCG_DATA_FLAGS
+
+#else /* CONFIG_MEMCG */
+
+#define __FIRST_OBJEXT_FLAG (1UL << 0)
+
+#endif /* CONFIG_MEMCG */
+
+enum objext_flags {
+ /* the next bit after the last actual flag */
+ __NR_OBJEXTS_FLAGS = __FIRST_OBJEXT_FLAG,
+};
+
+#define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
+
+#ifdef CONFIG_MEMCG

static inline bool folio_memcg_kmem(struct folio *folio);

@@ -390,7 +405,7 @@ static inline struct mem_cgroup *__folio_memcg(struct folio *folio)
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio);

- return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}

/*
@@ -411,7 +426,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio);

- return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct obj_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}

/*
@@ -468,11 +483,11 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
if (memcg_data & MEMCG_DATA_KMEM) {
struct obj_cgroup *objcg;

- objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
return obj_cgroup_memcg(objcg);
}

- return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}

/*
@@ -511,11 +526,11 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
if (memcg_data & MEMCG_DATA_KMEM) {
struct obj_cgroup *objcg;

- objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
return obj_cgroup_memcg(objcg);
}

- return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}

static inline struct mem_cgroup *page_memcg_check(struct page *page)
diff --git a/mm/slab.h b/mm/slab.h
index 7f19b0a2acd8..13b6ba2abd74 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -560,10 +560,8 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
slab_page(slab));
VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));

- return (struct slabobj_ext *)(obj_exts & ~MEMCG_DATA_FLAGS_MASK);
-#else
- return (struct slabobj_ext *)obj_exts;
#endif
+ return (struct slabobj_ext *)(obj_exts & ~OBJEXTS_FLAGS_MASK);
}

int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 20:02:41

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed

Skip freeing module's data section if there are non-zero allocation tags
because otherwise, once these allocations are freed, the access to their
code tag would cause UAF.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/codetag.h | 6 +++---
kernel/module/main.c | 23 +++++++++++++++--------
lib/codetag.c | 11 ++++++++---
3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index c44f5b83f24d..bfd0ba5c4185 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -35,7 +35,7 @@ struct codetag_type_desc {
size_t tag_size;
void (*module_load)(struct codetag_type *cttype,
struct codetag_module *cmod);
- void (*module_unload)(struct codetag_type *cttype,
+ bool (*module_unload)(struct codetag_type *cttype,
struct codetag_module *cmod);
};

@@ -71,10 +71,10 @@ codetag_register_type(const struct codetag_type_desc *desc);

#if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
void codetag_load_module(struct module *mod);
-void codetag_unload_module(struct module *mod);
+bool codetag_unload_module(struct module *mod);
#else
static inline void codetag_load_module(struct module *mod) {}
-static inline void codetag_unload_module(struct module *mod) {}
+static inline bool codetag_unload_module(struct module *mod) { return true; }
#endif

#endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index f400ba076cc7..658b631e76ad 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1211,15 +1211,19 @@ static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
return module_alloc(size);
}

-static void module_memory_free(void *ptr, enum mod_mem_type type)
+static void module_memory_free(void *ptr, enum mod_mem_type type,
+ bool unload_codetags)
{
+ if (!unload_codetags && mod_mem_type_is_core_data(type))
+ return;
+
if (mod_mem_use_vmalloc(type))
vfree(ptr);
else
module_memfree(ptr);
}

-static void free_mod_mem(struct module *mod)
+static void free_mod_mem(struct module *mod, bool unload_codetags)
{
for_each_mod_mem_type(type) {
struct module_memory *mod_mem = &mod->mem[type];
@@ -1230,20 +1234,23 @@ static void free_mod_mem(struct module *mod)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
- module_memory_free(mod_mem->base, type);
+ module_memory_free(mod_mem->base, type,
+ unload_codetags);
}

/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
- module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
+ module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA, unload_codetags);
}

/* Free a module, remove from lists, etc. */
static void free_module(struct module *mod)
{
+ bool unload_codetags;
+
trace_module_free(mod);

- codetag_unload_module(mod);
+ unload_codetags = codetag_unload_module(mod);
mod_sysfs_teardown(mod);

/*
@@ -1285,7 +1292,7 @@ static void free_module(struct module *mod)
kfree(mod->args);
percpu_modfree(mod);

- free_mod_mem(mod);
+ free_mod_mem(mod, unload_codetags);
}

void *__symbol_get(const char *symbol)
@@ -2298,7 +2305,7 @@ static int move_module(struct module *mod, struct load_info *info)
return 0;
out_enomem:
for (t--; t >= 0; t--)
- module_memory_free(mod->mem[t].base, t);
+ module_memory_free(mod->mem[t].base, t, true);
return ret;
}

@@ -2428,7 +2435,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
percpu_modfree(mod);
module_arch_freeing_init(mod);

- free_mod_mem(mod);
+ free_mod_mem(mod, true);
}

int __weak module_finalize(const Elf_Ehdr *hdr,
diff --git a/lib/codetag.c b/lib/codetag.c
index 9af22648dbfa..b13412ca57cc 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -5,6 +5,7 @@
#include <linux/module.h>
#include <linux/seq_buf.h>
#include <linux/slab.h>
+#include <linux/vmalloc.h>

struct codetag_type {
struct list_head link;
@@ -239,12 +240,13 @@ void codetag_load_module(struct module *mod)
mutex_unlock(&codetag_lock);
}

-void codetag_unload_module(struct module *mod)
+bool codetag_unload_module(struct module *mod)
{
struct codetag_type *cttype;
+ bool unload_ok = true;

if (!mod)
- return;
+ return true;

mutex_lock(&codetag_lock);
list_for_each_entry(cttype, &codetag_types, link) {
@@ -261,7 +263,8 @@ void codetag_unload_module(struct module *mod)
}
if (found) {
if (cttype->desc.module_unload)
- cttype->desc.module_unload(cttype, cmod);
+ if (!cttype->desc.module_unload(cttype, cmod))
+ unload_ok = false;

cttype->count -= range_size(cttype, &cmod->range);
idr_remove(&cttype->mod_idr, mod_id);
@@ -270,4 +273,6 @@ void codetag_unload_module(struct module *mod)
up_write(&cttype->mod_lock);
}
mutex_unlock(&codetag_lock);
+
+ return unload_ok;
}
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 20:05:41

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 19/36] mm: create new codetag references during page splitting

When a high-order page is split into smaller ones, each newly split
page should get its codetag. The original codetag is reused for these
pages but it's recorded as 0-byte allocation because original codetag
already accounts for the original high-order allocated page.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/pgalloc_tag.h | 30 ++++++++++++++++++++++++++++++
mm/huge_memory.c | 2 ++
mm/page_alloc.c | 2 ++
3 files changed, 34 insertions(+)

diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index b49ab955300f..9e6ad8e0e4aa 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -67,11 +67,41 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int order)
}
}

+static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
+{
+ int i;
+ struct page_ext *page_ext;
+ union codetag_ref *ref;
+ struct alloc_tag *tag;
+
+ if (!mem_alloc_profiling_enabled())
+ return;
+
+ page_ext = page_ext_get(page);
+ if (unlikely(!page_ext))
+ return;
+
+ ref = codetag_ref_from_page_ext(page_ext);
+ if (!ref->ct)
+ goto out;
+
+ tag = ct_to_alloc_tag(ref->ct);
+ page_ext = page_ext_next(page_ext);
+ for (i = 1; i < nr; i++) {
+ /* Set new reference to point to the original tag */
+ alloc_tag_ref_set(codetag_ref_from_page_ext(page_ext), tag);
+ page_ext = page_ext_next(page_ext);
+ }
+out:
+ page_ext_put(page_ext);
+}
+
#else /* CONFIG_MEM_ALLOC_PROFILING */

static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int order) {}
static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
+static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}

#endif /* CONFIG_MEM_ALLOC_PROFILING */

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 94c958f7ebb5..86daae671319 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -38,6 +38,7 @@
#include <linux/sched/sysctl.h>
#include <linux/memory-tiers.h>
#include <linux/compat.h>
+#include <linux/pgalloc_tag.h>

#include <asm/tlb.h>
#include <asm/pgalloc.h>
@@ -2899,6 +2900,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
/* Caller disabled irqs, so they are still disabled here */

split_page_owner(head, nr);
+ pgalloc_tag_split(head, nr);

/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 58c0e8b948a4..4bc5b4720fee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2621,6 +2621,7 @@ void split_page(struct page *page, unsigned int order)
for (i = 1; i < (1 << order); i++)
set_page_refcounted(page + i);
split_page_owner(page, 1 << order);
+ pgalloc_tag_split(page, 1 << order);
split_page_memcg(page, 1 << order);
}
EXPORT_SYMBOL_GPL(split_page);
@@ -4806,6 +4807,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
struct page *last = page + nr;

split_page_owner(page, 1 << order);
+ pgalloc_tag_split(page, 1 << order);
split_page_memcg(page, 1 << order);
while (page < --last)
set_page_refcounted(last);
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 20:08:53

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 21/36] lib: add codetag reference into slabobj_ext

To store code tag for every slab object, a codetag reference is embedded
into slabobj_ext when CONFIG_MEM_ALLOC_PROFILING=y.

Signed-off-by: Suren Baghdasaryan <[email protected]>
Co-developed-by: Kent Overstreet <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/memcontrol.h | 5 +++++
lib/Kconfig.debug | 1 +
2 files changed, 6 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f3584e98b640..2b010316016c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1653,7 +1653,12 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
* if MEMCG_DATA_OBJEXTS is set.
*/
struct slabobj_ext {
+#ifdef CONFIG_MEMCG_KMEM
struct obj_cgroup *objcg;
+#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ union codetag_ref ref;
+#endif
} __aligned(8);

static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7bbdb0ddb011..9ecfcdb54417 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -979,6 +979,7 @@ config MEM_ALLOC_PROFILING
depends on !DEBUG_FORCE_WEAK_PER_CPU
select CODE_TAGGING
select PAGE_EXTENSION
+ select SLAB_OBJ_EXT
help
Track allocation source code and record total allocation size
initiated at that code location. The mechanism can be used to track
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 20:13:02

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 28/36] mm: percpu: enable per-cpu allocation tagging

Redefine __alloc_percpu, __alloc_percpu_gfp and __alloc_reserved_percpu
to record allocations and deallocations done by these functions.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/percpu.h | 23 ++++++++++-----
mm/percpu.c | 64 +++++-------------------------------------
2 files changed, 23 insertions(+), 64 deletions(-)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 62b5eb45bd89..e54921c79c9a 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -2,6 +2,7 @@
#ifndef __LINUX_PERCPU_H
#define __LINUX_PERCPU_H

+#include <linux/alloc_tag.h>
#include <linux/mmdebug.h>
#include <linux/preempt.h>
#include <linux/smp.h>
@@ -9,6 +10,7 @@
#include <linux/pfn.h>
#include <linux/init.h>
#include <linux/cleanup.h>
+#include <linux/sched.h>

#include <asm/percpu.h>

@@ -125,7 +127,6 @@ extern int __init pcpu_page_first_chunk(size_t reserved_size,
pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
#endif

-extern void __percpu *__alloc_reserved_percpu(size_t size, size_t align) __alloc_size(1);
extern bool __is_kernel_percpu_address(unsigned long addr, unsigned long *can_addr);
extern bool is_kernel_percpu_address(unsigned long addr);

@@ -133,14 +134,16 @@ extern bool is_kernel_percpu_address(unsigned long addr);
extern void __init setup_per_cpu_areas(void);
#endif

-extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp) __alloc_size(1);
-extern void __percpu *__alloc_percpu(size_t size, size_t align) __alloc_size(1);
-extern void free_percpu(void __percpu *__pdata);
+extern void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
+ gfp_t gfp) __alloc_size(1);
extern size_t pcpu_alloc_size(void __percpu *__pdata);

-DEFINE_FREE(free_percpu, void __percpu *, free_percpu(_T))
-
-extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
+#define __alloc_percpu_gfp(_size, _align, _gfp) \
+ alloc_hooks(pcpu_alloc_noprof(_size, _align, false, _gfp))
+#define __alloc_percpu(_size, _align) \
+ alloc_hooks(pcpu_alloc_noprof(_size, _align, false, GFP_KERNEL))
+#define __alloc_reserved_percpu(_size, _align) \
+ alloc_hooks(pcpu_alloc_noprof(_size, _align, true, GFP_KERNEL))

#define alloc_percpu_gfp(type, gfp) \
(typeof(type) __percpu *)__alloc_percpu_gfp(sizeof(type), \
@@ -149,6 +152,12 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
(typeof(type) __percpu *)__alloc_percpu(sizeof(type), \
__alignof__(type))

+extern void free_percpu(void __percpu *__pdata);
+
+DEFINE_FREE(free_percpu, void __percpu *, free_percpu(_T))
+
+extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
+
extern unsigned long pcpu_nr_pages(void);

#endif /* __LINUX_PERCPU_H */
diff --git a/mm/percpu.c b/mm/percpu.c
index 578531ea1f43..2badcc5e0e71 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1726,7 +1726,7 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
#endif

/**
- * pcpu_alloc - the percpu allocator
+ * pcpu_alloc_noprof - the percpu allocator
* @size: size of area to allocate in bytes
* @align: alignment of area (max PAGE_SIZE)
* @reserved: allocate from the reserved chunk if available
@@ -1740,7 +1740,7 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
* RETURNS:
* Percpu pointer to the allocated area on success, NULL on failure.
*/
-static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
+void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
gfp_t gfp)
{
gfp_t pcpu_gfp;
@@ -1907,6 +1907,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,

pcpu_memcg_post_alloc_hook(objcg, chunk, off, size);

+ pcpu_alloc_tag_alloc_hook(chunk, off, size);
+
return ptr;

fail_unlock:
@@ -1935,61 +1937,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,

return NULL;
}
-
-/**
- * __alloc_percpu_gfp - allocate dynamic percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- * @gfp: allocation flags
- *
- * Allocate zero-filled percpu area of @size bytes aligned at @align. If
- * @gfp doesn't contain %GFP_KERNEL, the allocation doesn't block and can
- * be called from any context but is a lot more likely to fail. If @gfp
- * has __GFP_NOWARN then no warning will be triggered on invalid or failed
- * allocation requests.
- *
- * RETURNS:
- * Percpu pointer to the allocated area on success, NULL on failure.
- */
-void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp)
-{
- return pcpu_alloc(size, align, false, gfp);
-}
-EXPORT_SYMBOL_GPL(__alloc_percpu_gfp);
-
-/**
- * __alloc_percpu - allocate dynamic percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- *
- * Equivalent to __alloc_percpu_gfp(size, align, %GFP_KERNEL).
- */
-void __percpu *__alloc_percpu(size_t size, size_t align)
-{
- return pcpu_alloc(size, align, false, GFP_KERNEL);
-}
-EXPORT_SYMBOL_GPL(__alloc_percpu);
-
-/**
- * __alloc_reserved_percpu - allocate reserved percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- *
- * Allocate zero-filled percpu area of @size bytes aligned at @align
- * from reserved percpu area if arch has set it up; otherwise,
- * allocation is served from the same dynamic area. Might sleep.
- * Might trigger writeouts.
- *
- * CONTEXT:
- * Does GFP_KERNEL allocation.
- *
- * RETURNS:
- * Percpu pointer to the allocated area on success, NULL on failure.
- */
-void __percpu *__alloc_reserved_percpu(size_t size, size_t align)
-{
- return pcpu_alloc(size, align, true, GFP_KERNEL);
-}
+EXPORT_SYMBOL_GPL(pcpu_alloc_noprof);

/**
* pcpu_balance_free - manage the amount of free chunks
@@ -2328,6 +2276,8 @@ void free_percpu(void __percpu *ptr)
spin_lock_irqsave(&pcpu_lock, flags);
size = pcpu_free_area(chunk, off);

+ pcpu_alloc_tag_free_hook(chunk, off, size);
+
pcpu_memcg_free_hook(chunk, off, size);

/*
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 20:14:34

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 31/36] lib: add memory allocations report in show_mem()

Include allocations in show_mem reports.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/alloc_tag.h | 7 +++++++
include/linux/codetag.h | 1 +
lib/alloc_tag.c | 38 ++++++++++++++++++++++++++++++++++++++
lib/codetag.c | 5 +++++
mm/show_mem.c | 26 ++++++++++++++++++++++++++
5 files changed, 77 insertions(+)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 29636719b276..85a24a027403 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,6 +30,13 @@ struct alloc_tag {

#ifdef CONFIG_MEM_ALLOC_PROFILING

+struct codetag_bytes {
+ struct codetag *ct;
+ s64 bytes;
+};
+
+size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep);
+
static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
{
return container_of(ct, struct alloc_tag, ct);
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index bfd0ba5c4185..c2a579ccd455 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -61,6 +61,7 @@ struct codetag_iterator {
}

void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
+bool codetag_trylock_module_list(struct codetag_type *cttype);
struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
struct codetag *codetag_next_ct(struct codetag_iterator *iter);

diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index cb5adec4b2e2..ec54f29482dc 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -86,6 +86,44 @@ static const struct seq_operations allocinfo_seq_op = {
.show = allocinfo_show,
};

+size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep)
+{
+ struct codetag_iterator iter;
+ struct codetag *ct;
+ struct codetag_bytes n;
+ unsigned int i, nr = 0;
+
+ if (can_sleep)
+ codetag_lock_module_list(alloc_tag_cttype, true);
+ else if (!codetag_trylock_module_list(alloc_tag_cttype))
+ return 0;
+
+ iter = codetag_get_ct_iter(alloc_tag_cttype);
+ while ((ct = codetag_next_ct(&iter))) {
+ struct alloc_tag_counters counter = alloc_tag_read(ct_to_alloc_tag(ct));
+
+ n.ct = ct;
+ n.bytes = counter.bytes;
+
+ for (i = 0; i < nr; i++)
+ if (n.bytes > tags[i].bytes)
+ break;
+
+ if (i < count) {
+ nr -= nr == count;
+ memmove(&tags[i + 1],
+ &tags[i],
+ sizeof(tags[0]) * (nr - i));
+ nr++;
+ tags[i] = n;
+ }
+ }
+
+ codetag_lock_module_list(alloc_tag_cttype, false);
+
+ return nr;
+}
+
static void __init procfs_init(void)
{
proc_create_seq("allocinfo", 0444, NULL, &allocinfo_seq_op);
diff --git a/lib/codetag.c b/lib/codetag.c
index b13412ca57cc..7b39cec9648a 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -36,6 +36,11 @@ void codetag_lock_module_list(struct codetag_type *cttype, bool lock)
up_read(&cttype->mod_lock);
}

+bool codetag_trylock_module_list(struct codetag_type *cttype)
+{
+ return down_read_trylock(&cttype->mod_lock) != 0;
+}
+
struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
{
struct codetag_iterator iter = {
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 8dcfafbd283c..1e41f8d6e297 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -423,4 +423,30 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
#ifdef CONFIG_MEMORY_FAILURE
printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ {
+ struct codetag_bytes tags[10];
+ size_t i, nr;
+
+ nr = alloc_tag_top_users(tags, ARRAY_SIZE(tags), false);
+ if (nr) {
+ printk(KERN_NOTICE "Memory allocations:\n");
+ for (i = 0; i < nr; i++) {
+ struct codetag *ct = tags[i].ct;
+ struct alloc_tag *tag = ct_to_alloc_tag(ct);
+ struct alloc_tag_counters counter = alloc_tag_read(tag);
+
+ /* Same as alloc_tag_to_text() but w/o intermediate buffer */
+ if (ct->modname)
+ printk(KERN_NOTICE "%12lli %8llu %s:%u [%s] func:%s\n",
+ counter.bytes, counter.calls, ct->filename,
+ ct->lineno, ct->modname, ct->function);
+ else
+ printk(KERN_NOTICE "%12lli %8llu %s:%u func:%s\n",
+ counter.bytes, counter.calls, ct->filename,
+ ct->lineno, ct->function);
+ }
+ }
+ }
+#endif
}
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 20:16:42

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 32/36] codetag: debug: skip objext checking when it's for objext itself

objext objects are created with __GFP_NO_OBJ_EXT flag and therefore have
no corresponding objext themselves (otherwise we would get an infinite
recursion). When freeing these objects their codetag will be empty and
when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled this will lead to false
warnings. Introduce CODETAG_EMPTY special codetag value to mark
allocations which intentionally lack codetag to avoid these warnings.
Set objext codetags to CODETAG_EMPTY before freeing to indicate that
the codetag is expected to be empty.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/alloc_tag.h | 26 ++++++++++++++++++++++++++
mm/slub.c | 33 +++++++++++++++++++++++++++++++++
2 files changed, 59 insertions(+)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 85a24a027403..4a3fc865d878 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -28,6 +28,27 @@ struct alloc_tag {
struct alloc_tag_counters __percpu *counters;
} __aligned(8);

+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+#define CODETAG_EMPTY ((void *)1)
+
+static inline bool is_codetag_empty(union codetag_ref *ref)
+{
+ return ref->ct == CODETAG_EMPTY;
+}
+
+static inline void set_codetag_empty(union codetag_ref *ref)
+{
+ if (ref)
+ ref->ct = CODETAG_EMPTY;
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
+static inline bool is_codetag_empty(union codetag_ref *ref) { return false; }
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
#ifdef CONFIG_MEM_ALLOC_PROFILING

struct codetag_bytes {
@@ -91,6 +112,11 @@ static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes)
if (!ref || !ref->ct)
return;

+ if (is_codetag_empty(ref)) {
+ ref->ct = NULL;
+ return;
+ }
+
tag = ct_to_alloc_tag(ref->ct);

this_cpu_sub(tag->counters->bytes, bytes);
diff --git a/mm/slub.c b/mm/slub.c
index 920b24b4140e..3e41d45f9fa4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1883,6 +1883,30 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)

#ifdef CONFIG_SLAB_OBJ_EXT

+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
+{
+ struct slabobj_ext *slab_exts;
+ struct slab *obj_exts_slab;
+
+ obj_exts_slab = virt_to_slab(obj_exts);
+ slab_exts = slab_obj_exts(obj_exts_slab);
+ if (slab_exts) {
+ unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
+ obj_exts_slab, obj_exts);
+ /* codetag should be NULL */
+ WARN_ON(slab_exts[offs].ref.ct);
+ set_codetag_empty(&slab_exts[offs].ref);
+ }
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
+static inline void mark_objexts_empty(struct slabobj_ext *obj_exts) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
/*
* The allocated objcg pointers array is not accounted directly.
* Moreover, it should not come from DMA buffer and is not readily
@@ -1923,6 +1947,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
* assign slabobj_exts in parallel. In this case the existing
* objcg vector should be reused.
*/
+ mark_objexts_empty(vec);
kfree(vec);
return 0;
}
@@ -1939,6 +1964,14 @@ static inline void free_slab_obj_exts(struct slab *slab)
if (!obj_exts)
return;

+ /*
+ * obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
+ * corresponding extension will be NULL. alloc_tag_sub() will throw a
+ * warning if slab has extensions but the extension of an object is
+ * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
+ * the extension for obj_exts is expected to be NULL.
+ */
+ mark_objexts_empty(obj_exts);
kfree(obj_exts);
slab->obj_exts = 0;
}
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 20:19:13

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v4 36/36] memprofiling: Documentation

From: Kent Overstreet <[email protected]>

Provide documentation for memory allocation profiling.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
Documentation/mm/allocation-profiling.rst | 86 +++++++++++++++++++++++
1 file changed, 86 insertions(+)
create mode 100644 Documentation/mm/allocation-profiling.rst

diff --git a/Documentation/mm/allocation-profiling.rst b/Documentation/mm/allocation-profiling.rst
new file mode 100644
index 000000000000..2bcbd9e51fe4
--- /dev/null
+++ b/Documentation/mm/allocation-profiling.rst
@@ -0,0 +1,86 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+MEMORY ALLOCATION PROFILING
+===========================
+
+Low overhead (suitable for production) accounting of all memory allocations,
+tracked by file and line number.
+
+Usage:
+kconfig options:
+ - CONFIG_MEM_ALLOC_PROFILING
+ - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
+ - CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ adds warnings for allocations that weren't accounted because of a
+ missing annotation
+
+Boot parameter:
+ sysctl.vm.mem_profiling=1
+
+sysctl:
+ /proc/sys/vm/mem_profiling
+
+Runtime info:
+ /proc/allocinfo
+
+Example output:
+ root@moria-kvm:~# sort -g /proc/allocinfo|tail|numfmt --to=iec
+ 2.8M 22648 fs/kernfs/dir.c:615 func:__kernfs_new_node
+ 3.8M 953 mm/memory.c:4214 func:alloc_anon_folio
+ 4.0M 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
+ 4.1M 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
+ 6.0M 1532 mm/filemap.c:1919 func:__filemap_get_folio
+ 8.8M 2785 kernel/fork.c:307 func:alloc_thread_stack_node
+ 13M 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
+ 14M 3520 mm/mm_init.c:2530 func:alloc_large_system_hash
+ 15M 3656 mm/readahead.c:247 func:page_cache_ra_unbounded
+ 55M 4887 mm/slub.c:2259 func:alloc_slab_page
+ 122M 31168 mm/page_ext.c:270 func:alloc_page_ext
+===================
+Theory of operation
+===================
+
+Memory allocation profiling builds off of code tagging, which is a library for
+declaring static structs (that typcially describe a file and line number in
+some way, hence code tagging) and then finding and operating on them at runtime
+- i.e. iterating over them to print them in debugfs/procfs.
+
+To add accounting for an allocation call, we replace it with a macro
+invocation, alloc_hooks(), that
+ - declares a code tag
+ - stashes a pointer to it in task_struct
+ - calls the real allocation function
+ - and finally, restores the task_struct alloc tag pointer to its previous value.
+
+This allows for alloc_hooks() calls to be nested, with the most recent one
+taking effect. This is important for allocations internal to the mm/ code that
+do not properly belong to the outer allocation context and should be counted
+separately: for example, slab object extension vectors, or when the slab
+allocates pages from the page allocator.
+
+Thus, proper usage requires determining which function in an allocation call
+stack should be tagged. There are many helper functions that essentially wrap
+e.g. kmalloc() and do a little more work, then are called in multiple places;
+we'll generally want the accounting to happen in the callers of these helpers,
+not in the helpers themselves.
+
+To fix up a given helper, for example foo(), do the following:
+ - switch its allocation call to the _noprof() version, e.g. kmalloc_noprof()
+ - rename it to foo_noprof()
+ - define a macro version of foo() like so:
+ #define foo(...) alloc_hooks(foo_noprof(__VA_ARGS__))
+
+It's also possible to stash a pointer to an alloc tag in your own data structures.
+
+Do this when you're implementing a generic data structure that does allocations
+"on behalf of" some other code - for example, the rhashtable code. This way,
+instead of seeing a large line in /proc/allocinfo for rhashtable.c, we can
+break it out by rhashtable type.
+
+To do so:
+ - Hook your data structure's init function, like any other allocation function
+ - Within your init function, use the convenience macro alloc_tag_record() to
+ record alloc tag in your data structure.
+ - Then, use the following form for your allocations:
+ alloc_hooks_tag(ht->your_saved_tag, kmalloc_noprof(...))
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 21:10:11

by Pasha Tatashin

[permalink] [raw]
Subject: Re: [PATCH v4 01/36] fix missing vmalloc.h includes

On Wed, Feb 21, 2024 at 2:40 PM Suren Baghdasaryan <[email protected]> wrote:
>
> From: Kent Overstreet <[email protected]>
>
> The next patch drops vmalloc.h from a system header in order to fix
> a circular dependency; this adds it to all the files that were pulling
> it in implicitly.
>
> Signed-off-by: Kent Overstreet <[email protected]>
> Signed-off-by: Suren Baghdasaryan <[email protected]>

Reviewed-by: Pasha Tatashin <[email protected]>

2024-02-21 21:11:58

by Pasha Tatashin

[permalink] [raw]
Subject: Re: [PATCH v4 02/36] asm-generic/io.h: Kill vmalloc.h dependency

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <[email protected]> wrote:
>
> From: Kent Overstreet <[email protected]>
>
> Needed to avoid a new circular dependency with the memory allocation
> profiling series.
>
> Naturally, a whole bunch of files needed to include vmalloc.h that were
> previously getting it implicitly.
>
> Signed-off-by: Kent Overstreet <[email protected]>

Reviewed-by: Pasha Tatashin <[email protected]>

2024-02-21 21:16:47

by Pasha Tatashin

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <[email protected]> wrote:
>
> From: Kent Overstreet <[email protected]>
>
> It seems we need to be more forceful with the compiler on this one.
> This is done for performance reasons only.
>
> Signed-off-by: Kent Overstreet <[email protected]>
> Signed-off-by: Suren Baghdasaryan <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>
> ---
> mm/slub.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 2ef88bbf56a3..d31b03a8d9d5 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void *x, bool init)
> return !kasan_slab_free(s, x, init);
> }
>
> -static inline bool slab_free_freelist_hook(struct kmem_cache *s,
> +static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,

__fastpath_inline seems to me more appropriate here. It prioritizes
memory vs performance.

> void **head, void **tail,
> int *cnt)
> {
> --
> 2.44.0.rc0.258.g7320e95886-goog
>

2024-02-22 00:13:11

by Pasha Tatashin

[permalink] [raw]
Subject: Re: [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <[email protected]> wrote:
>
> Introduce objext_flags to store additional objext flags unrelated to memcg.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>

Reviewed-by: Pasha Tatashin <[email protected]>

2024-02-22 10:01:39

by Alice Ryhl

[permalink] [raw]
Subject: Re: [PATCH v4 24/36] rust: Add a rust helper for krealloc()

On Wed, Feb 21, 2024 at 8:41 PM Suren Baghdasaryan <[email protected]> wrote:
>
> From: Kent Overstreet <[email protected]>
>
> Memory allocation profiling is turning krealloc() into a nontrivial
> macro - so for now, we need a helper for it.
>
> Until we have proper support on the rust side for memory allocation
> profiling this does mean that all Rust allocations will be accounted to
> the helper.
>
> Signed-off-by: Kent Overstreet <[email protected]>
> Cc: Miguel Ojeda <[email protected]>
> Cc: Alex Gaynor <[email protected]>
> Cc: Wedson Almeida Filho <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Gary Guo <[email protected]>
> Cc: "Björn Roy Baron" <[email protected]>
> Cc: Benno Lossin <[email protected]>
> Cc: Andreas Hindborg <[email protected]>
> Cc: Alice Ryhl <[email protected]>
> Cc: [email protected]
> Signed-off-by: Suren Baghdasaryan <[email protected]>

Currently, the Rust build doesn't work throughout the entire series
since there are some commits where krealloc is missing before you
introduce the helper. If you introduce the helper first before
krealloc stops being an exported function, then the Rust build should
work throughout the entire series. (Having both the helper and the
exported function at the same time is not a problem.)

With the patch reordered:

Reviewed-by: Alice Ryhl <[email protected]>

Alice

2024-02-26 14:34:15

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline

On 2/24/24 03:02, Suren Baghdasaryan wrote:
> On Wed, Feb 21, 2024 at 1:16 PM Pasha Tatashin
> <[email protected]> wrote:
>>
>> On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <[email protected]> wrote:
>> >
>> > From: Kent Overstreet <[email protected]>
>> >
>> > It seems we need to be more forceful with the compiler on this one.
>> > This is done for performance reasons only.
>> >
>> > Signed-off-by: Kent Overstreet <[email protected]>
>> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>> > Reviewed-by: Kees Cook <[email protected]>
>> > ---
>> > mm/slub.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/mm/slub.c b/mm/slub.c
>> > index 2ef88bbf56a3..d31b03a8d9d5 100644
>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void *x, bool init)
>> > return !kasan_slab_free(s, x, init);
>> > }
>> >
>> > -static inline bool slab_free_freelist_hook(struct kmem_cache *s,
>> > +static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,
>>
>> __fastpath_inline seems to me more appropriate here. It prioritizes
>> memory vs performance.
>
> Hmm. AFAIKT this function is used only in one place and we do not add
> any additional users, so I don't think changing to __fastpath_inline
> here would gain us anything.

It would have been more future-proof and self-documenting. But I don't insist.

Reviewed-by: Vlastimil Babka <[email protected]>

>>
>> > void **head, void **tail,
>> > int *cnt)
>> > {
>> > --
>> > 2.44.0.rc0.258.g7320e95886-goog
>> >


2024-02-26 16:10:32

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline

On Mon, Feb 26, 2024 at 7:21 AM Pasha Tatashin
<[email protected]> wrote:
>
>
>
> On Mon, Feb 26, 2024, 9:31 AM Vlastimil Babka <[email protected]> wrote:
>>
>> On 2/24/24 03:02, Suren Baghdasaryan wrote:
>> > On Wed, Feb 21, 2024 at 1:16 PM Pasha Tatashin
>> > <[email protected]> wrote:
>> >>
>> >> On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <[email protected]> wrote:
>> >> >
>> >> > From: Kent Overstreet <[email protected]>
>> >> >
>> >> > It seems we need to be more forceful with the compiler on this one.
>> >> > This is done for performance reasons only.
>> >> >
>> >> > Signed-off-by: Kent Overstreet <[email protected]>
>> >> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>> >> > Reviewed-by: Kees Cook <[email protected]>
>> >> > ---
>> >> > mm/slub.c | 2 +-
>> >> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/mm/slub.c b/mm/slub.c
>> >> > index 2ef88bbf56a3..d31b03a8d9d5 100644
>> >> > --- a/mm/slub.c
>> >> > +++ b/mm/slub.c
>> >> > @@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void *x, bool init)
>> >> > return !kasan_slab_free(s, x, init);
>> >> > }
>> >> >
>> >> > -static inline bool slab_free_freelist_hook(struct kmem_cache *s,
>> >> > +static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,
>> >>
>> >> __fastpath_inline seems to me more appropriate here. It prioritizes
>> >> memory vs performance.
>> >
>> > Hmm. AFAIKT this function is used only in one place and we do not add
>> > any additional users, so I don't think changing to __fastpath_inline
>> > here would gain us anything.
>
>
> For consistency __fastpath_inline makes more sense, but I am ok with or without this change.

Ok, I'll update in the next revision. Thanks!

>
> Reviewed-by: Pasha Tatashin <[email protected]>
>
>>
>> It would have been more future-proof and self-documenting. But I don't insist.
>>
>> Reviewed-by: Vlastimil Babka <[email protected]>
>>
>> >>
>> >> > void **head, void **tail,
>> >> > int *cnt)
>> >> > {
>> >> > --
>> >> > 2.44.0.rc0.258.g7320e95886-goog
>> >> >
>>

2024-02-26 16:58:21

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Introduce objext_flags to store additional objext flags unrelated to memcg.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>


2024-02-26 17:12:02

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Skip freeing module's data section if there are non-zero allocation tags
> because otherwise, once these allocations are freed, the access to their
> code tag would cause UAF.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>

I know that module unloading was never considered really supported etc.
But should we printk something so the admin knows why it didn't unload, and
can go check those outstanding allocations?


2024-02-26 17:56:28

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed

On Mon, Feb 26, 2024 at 8:58 AM Vlastimil Babka <[email protected]> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Skip freeing module's data section if there are non-zero allocation tags
> > because otherwise, once these allocations are freed, the access to their
> > code tag would cause UAF.
> >
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>
> I know that module unloading was never considered really supported etc.
> But should we printk something so the admin knows why it didn't unload, and
> can go check those outstanding allocations?

Yes, that sounds reasonable. I'll add a pr_warn() in the next version.
Thanks!

>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>

2024-02-27 10:11:01

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4 19/36] mm: create new codetag references during page splitting

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> When a high-order page is split into smaller ones, each newly split
> page should get its codetag. The original codetag is reused for these
> pages but it's recorded as 0-byte allocation because original codetag
> already accounts for the original high-order allocated page.

This was v3 but then you refactored (for the better) so the commit log
could reflect it?

> Signed-off-by: Suren Baghdasaryan <[email protected]>

I was going to R-b, but now I recalled the trickiness of
__free_pages() for non-compound pages if it loses the race to a
speculative reference. Will the codetag handling work fine there?

> ---
> include/linux/pgalloc_tag.h | 30 ++++++++++++++++++++++++++++++
> mm/huge_memory.c | 2 ++
> mm/page_alloc.c | 2 ++
> 3 files changed, 34 insertions(+)
>
> diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
> index b49ab955300f..9e6ad8e0e4aa 100644
> --- a/include/linux/pgalloc_tag.h
> +++ b/include/linux/pgalloc_tag.h
> @@ -67,11 +67,41 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int order)
> }
> }
>
> +static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
> +{
> + int i;
> + struct page_ext *page_ext;
> + union codetag_ref *ref;
> + struct alloc_tag *tag;
> +
> + if (!mem_alloc_profiling_enabled())
> + return;
> +
> + page_ext = page_ext_get(page);
> + if (unlikely(!page_ext))
> + return;
> +
> + ref = codetag_ref_from_page_ext(page_ext);
> + if (!ref->ct)
> + goto out;
> +
> + tag = ct_to_alloc_tag(ref->ct);
> + page_ext = page_ext_next(page_ext);
> + for (i = 1; i < nr; i++) {
> + /* Set new reference to point to the original tag */
> + alloc_tag_ref_set(codetag_ref_from_page_ext(page_ext), tag);
> + page_ext = page_ext_next(page_ext);
> + }
> +out:
> + page_ext_put(page_ext);
> +}
> +
> #else /* CONFIG_MEM_ALLOC_PROFILING */
>
> static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
> unsigned int order) {}
> static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
> +static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
>
> #endif /* CONFIG_MEM_ALLOC_PROFILING */
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 94c958f7ebb5..86daae671319 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -38,6 +38,7 @@
> #include <linux/sched/sysctl.h>
> #include <linux/memory-tiers.h>
> #include <linux/compat.h>
> +#include <linux/pgalloc_tag.h>
>
> #include <asm/tlb.h>
> #include <asm/pgalloc.h>
> @@ -2899,6 +2900,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> /* Caller disabled irqs, so they are still disabled here */
>
> split_page_owner(head, nr);
> + pgalloc_tag_split(head, nr);
>
> /* See comment in __split_huge_page_tail() */
> if (PageAnon(head)) {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 58c0e8b948a4..4bc5b4720fee 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2621,6 +2621,7 @@ void split_page(struct page *page, unsigned int order)
> for (i = 1; i < (1 << order); i++)
> set_page_refcounted(page + i);
> split_page_owner(page, 1 << order);
> + pgalloc_tag_split(page, 1 << order);
> split_page_memcg(page, 1 << order);
> }
> EXPORT_SYMBOL_GPL(split_page);
> @@ -4806,6 +4807,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
> struct page *last = page + nr;
>
> split_page_owner(page, 1 << order);
> + pgalloc_tag_split(page, 1 << order);
> split_page_memcg(page, 1 << order);
> while (page < --last)
> set_page_refcounted(last);

2024-02-27 10:28:06

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4 21/36] lib: add codetag reference into slabobj_ext

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> To store code tag for every slab object, a codetag reference is embedded
> into slabobj_ext when CONFIG_MEM_ALLOC_PROFILING=y.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>
> Co-developed-by: Kent Overstreet <[email protected]>
> Signed-off-by: Kent Overstreet <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

> ---
> include/linux/memcontrol.h | 5 +++++
> lib/Kconfig.debug | 1 +
> 2 files changed, 6 insertions(+)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index f3584e98b640..2b010316016c 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1653,7 +1653,12 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
> * if MEMCG_DATA_OBJEXTS is set.
> */
> struct slabobj_ext {
> +#ifdef CONFIG_MEMCG_KMEM
> struct obj_cgroup *objcg;
> +#endif
> +#ifdef CONFIG_MEM_ALLOC_PROFILING
> + union codetag_ref ref;
> +#endif
> } __aligned(8);
>
> static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 7bbdb0ddb011..9ecfcdb54417 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -979,6 +979,7 @@ config MEM_ALLOC_PROFILING
> depends on !DEBUG_FORCE_WEAK_PER_CPU
> select CODE_TAGGING
> select PAGE_EXTENSION
> + select SLAB_OBJ_EXT
> help
> Track allocation source code and record total allocation size
> initiated at that code location. The mechanism can be used to track

2024-02-27 13:19:05

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4 31/36] lib: add memory allocations report in show_mem()

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Include allocations in show_mem reports.
>
> Signed-off-by: Kent Overstreet <[email protected]>
> Signed-off-by: Suren Baghdasaryan <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

Nit: there's pr_notice() that's shorter than printk(KERN_NOTICE

> ---
> include/linux/alloc_tag.h | 7 +++++++
> include/linux/codetag.h | 1 +
> lib/alloc_tag.c | 38 ++++++++++++++++++++++++++++++++++++++
> lib/codetag.c | 5 +++++
> mm/show_mem.c | 26 ++++++++++++++++++++++++++
> 5 files changed, 77 insertions(+)
>
> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> index 29636719b276..85a24a027403 100644
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -30,6 +30,13 @@ struct alloc_tag {
>
> #ifdef CONFIG_MEM_ALLOC_PROFILING
>
> +struct codetag_bytes {
> + struct codetag *ct;
> + s64 bytes;
> +};
> +
> +size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep);
> +
> static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
> {
> return container_of(ct, struct alloc_tag, ct);
> diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> index bfd0ba5c4185..c2a579ccd455 100644
> --- a/include/linux/codetag.h
> +++ b/include/linux/codetag.h
> @@ -61,6 +61,7 @@ struct codetag_iterator {
> }
>
> void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
> +bool codetag_trylock_module_list(struct codetag_type *cttype);
> struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
> struct codetag *codetag_next_ct(struct codetag_iterator *iter);
>
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index cb5adec4b2e2..ec54f29482dc 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -86,6 +86,44 @@ static const struct seq_operations allocinfo_seq_op = {
> .show = allocinfo_show,
> };
>
> +size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep)
> +{
> + struct codetag_iterator iter;
> + struct codetag *ct;
> + struct codetag_bytes n;
> + unsigned int i, nr = 0;
> +
> + if (can_sleep)
> + codetag_lock_module_list(alloc_tag_cttype, true);
> + else if (!codetag_trylock_module_list(alloc_tag_cttype))
> + return 0;
> +
> + iter = codetag_get_ct_iter(alloc_tag_cttype);
> + while ((ct = codetag_next_ct(&iter))) {
> + struct alloc_tag_counters counter = alloc_tag_read(ct_to_alloc_tag(ct));
> +
> + n.ct = ct;
> + n.bytes = counter.bytes;
> +
> + for (i = 0; i < nr; i++)
> + if (n.bytes > tags[i].bytes)
> + break;
> +
> + if (i < count) {
> + nr -= nr == count;
> + memmove(&tags[i + 1],
> + &tags[i],
> + sizeof(tags[0]) * (nr - i));
> + nr++;
> + tags[i] = n;
> + }
> + }
> +
> + codetag_lock_module_list(alloc_tag_cttype, false);
> +
> + return nr;
> +}
> +
> static void __init procfs_init(void)
> {
> proc_create_seq("allocinfo", 0444, NULL, &allocinfo_seq_op);
> diff --git a/lib/codetag.c b/lib/codetag.c
> index b13412ca57cc..7b39cec9648a 100644
> --- a/lib/codetag.c
> +++ b/lib/codetag.c
> @@ -36,6 +36,11 @@ void codetag_lock_module_list(struct codetag_type *cttype, bool lock)
> up_read(&cttype->mod_lock);
> }
>
> +bool codetag_trylock_module_list(struct codetag_type *cttype)
> +{
> + return down_read_trylock(&cttype->mod_lock) != 0;
> +}
> +
> struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
> {
> struct codetag_iterator iter = {
> diff --git a/mm/show_mem.c b/mm/show_mem.c
> index 8dcfafbd283c..1e41f8d6e297 100644
> --- a/mm/show_mem.c
> +++ b/mm/show_mem.c
> @@ -423,4 +423,30 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
> #ifdef CONFIG_MEMORY_FAILURE
> printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
> #endif
> +#ifdef CONFIG_MEM_ALLOC_PROFILING
> + {
> + struct codetag_bytes tags[10];
> + size_t i, nr;
> +
> + nr = alloc_tag_top_users(tags, ARRAY_SIZE(tags), false);
> + if (nr) {
> + printk(KERN_NOTICE "Memory allocations:\n");
> + for (i = 0; i < nr; i++) {
> + struct codetag *ct = tags[i].ct;
> + struct alloc_tag *tag = ct_to_alloc_tag(ct);
> + struct alloc_tag_counters counter = alloc_tag_read(tag);
> +
> + /* Same as alloc_tag_to_text() but w/o intermediate buffer */
> + if (ct->modname)
> + printk(KERN_NOTICE "%12lli %8llu %s:%u [%s] func:%s\n",
> + counter.bytes, counter.calls, ct->filename,
> + ct->lineno, ct->modname, ct->function);
> + else
> + printk(KERN_NOTICE "%12lli %8llu %s:%u func:%s\n",
> + counter.bytes, counter.calls, ct->filename,
> + ct->lineno, ct->function);
> + }
> + }
> + }
> +#endif
> }

2024-02-27 13:43:21

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4 00/36] Memory allocation profiling

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Overview:
> Low overhead [1] per-callsite memory allocation profiling. Not just for
> debug kernels, overhead low enough to be deployed in production.
>
> Example output:
> root@moria-kvm:~# sort -rn /proc/allocinfo
> 127664128 31168 mm/page_ext.c:270 func:alloc_page_ext
> 56373248 4737 mm/slub.c:2259 func:alloc_slab_page
> 14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded
> 14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash
> 13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
> 11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio
> 9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node
> 4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
> 4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
> 3940352 962 mm/memory.c:4214 func:alloc_anon_folio
> 2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
> ...
>
> Since v3:
> - Dropped patch changing string_get_size() [2] as not needed
> - Dropped patch modifying xfs allocators [3] as non needed,
> per Dave Chinner
> - Added Reviewed-by, per Kees Cook
> - Moved prepare_slab_obj_exts_hook() and alloc_slab_obj_exts() where they
> are used, per Vlastimil Babka
> - Fixed SLAB_NO_OBJ_EXT definition to use unused bit, per Vlastimil Babka
> - Refactored patch [4] into other patches, per Vlastimil Babka
> - Replaced snprintf() with seq_buf_printf(), per Kees Cook
> - Changed output to report bytes, per Andrew Morton and Pasha Tatashin
> - Changed output to report [module] only for loadable modules,
> per Vlastimil Babka
> - Moved mem_alloc_profiling_enabled() check earlier, per Vlastimil Babka
> - Changed the code to handle page splitting to be more understandable,
> per Vlastimil Babka
> - Moved alloc_tagging_slab_free_hook(), mark_objexts_empty(),
> mark_failed_objexts_alloc() and handle_failed_objexts_alloc(),
> per Vlastimil Babka
> - Fixed loss of __alloc_size(1, 2) in kvmalloc functions,
> per Vlastimil Babka
> - Refactored the code in show_mem() to avoid memory allocations,
> per Michal Hocko
> - Changed to trylock in show_mem() to avoid blocking in atomic context,
> per Tetsuo Handa
> - Added mm mailing list into MAINTAINERS, per Kees Cook
> - Added base commit SHA, per Andy Shevchenko
> - Added a patch with documentation, per Jani Nikula
> - Fixed 0day bugs
> - Added benchmark results [5], per Steven Rostedt
> - Rebased over Linux 6.8-rc5
>
> Items not yet addressed:
> - An early_boot option to prevent pageext overhead. We are looking into
> ways for using the same sysctr instead of adding additional early boot
> parameter.

I have reviewed the parts that integrate the tracking with page and slab
allocators, and besides some details to improve it seems ok to me. The
early boot option seems coming so that might eventually be suitable for
build-time enablement in a distro kernel.

The macros (and their potential spread to upper layers to keep the
information useful enough) are of course ugly, but guess it can't be
currently helped and I'm unable to decide whether it's worth it or not.
That's up to those providing their success stories I guess. If there's
at least a path ahead to replace that part with compiler support in the
future, great. So I'm not against merging this. BTW, do we know Linus's
opinion on the macros approach?

2024-02-27 16:11:17

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v4 00/36] Memory allocation profiling

On Tue, Feb 27, 2024 at 5:35 AM Vlastimil Babka <[email protected]> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Overview:
> > Low overhead [1] per-callsite memory allocation profiling. Not just for
> > debug kernels, overhead low enough to be deployed in production.
> >
> > Example output:
> > root@moria-kvm:~# sort -rn /proc/allocinfo
> > 127664128 31168 mm/page_ext.c:270 func:alloc_page_ext
> > 56373248 4737 mm/slub.c:2259 func:alloc_slab_page
> > 14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded
> > 14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash
> > 13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
> > 11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio
> > 9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node
> > 4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
> > 4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
> > 3940352 962 mm/memory.c:4214 func:alloc_anon_folio
> > 2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
> > ...
> >
> > Since v3:
> > - Dropped patch changing string_get_size() [2] as not needed
> > - Dropped patch modifying xfs allocators [3] as non needed,
> > per Dave Chinner
> > - Added Reviewed-by, per Kees Cook
> > - Moved prepare_slab_obj_exts_hook() and alloc_slab_obj_exts() where they
> > are used, per Vlastimil Babka
> > - Fixed SLAB_NO_OBJ_EXT definition to use unused bit, per Vlastimil Babka
> > - Refactored patch [4] into other patches, per Vlastimil Babka
> > - Replaced snprintf() with seq_buf_printf(), per Kees Cook
> > - Changed output to report bytes, per Andrew Morton and Pasha Tatashin
> > - Changed output to report [module] only for loadable modules,
> > per Vlastimil Babka
> > - Moved mem_alloc_profiling_enabled() check earlier, per Vlastimil Babka
> > - Changed the code to handle page splitting to be more understandable,
> > per Vlastimil Babka
> > - Moved alloc_tagging_slab_free_hook(), mark_objexts_empty(),
> > mark_failed_objexts_alloc() and handle_failed_objexts_alloc(),
> > per Vlastimil Babka
> > - Fixed loss of __alloc_size(1, 2) in kvmalloc functions,
> > per Vlastimil Babka
> > - Refactored the code in show_mem() to avoid memory allocations,
> > per Michal Hocko
> > - Changed to trylock in show_mem() to avoid blocking in atomic context,
> > per Tetsuo Handa
> > - Added mm mailing list into MAINTAINERS, per Kees Cook
> > - Added base commit SHA, per Andy Shevchenko
> > - Added a patch with documentation, per Jani Nikula
> > - Fixed 0day bugs
> > - Added benchmark results [5], per Steven Rostedt
> > - Rebased over Linux 6.8-rc5
> >
> > Items not yet addressed:
> > - An early_boot option to prevent pageext overhead. We are looking into
> > ways for using the same sysctr instead of adding additional early boot
> > parameter.
>
> I have reviewed the parts that integrate the tracking with page and slab
> allocators, and besides some details to improve it seems ok to me. The
> early boot option seems coming so that might eventually be suitable for
> build-time enablement in a distro kernel.

Thanks for reviewing Vlastimil!

>
> The macros (and their potential spread to upper layers to keep the
> information useful enough) are of course ugly, but guess it can't be
> currently helped and I'm unable to decide whether it's worth it or not.
> That's up to those providing their success stories I guess. If there's
> at least a path ahead to replace that part with compiler support in the
> future, great. So I'm not against merging this. BTW, do we know Linus's
> opinion on the macros approach?

We haven't run it by Linus specifically but hopefully we will see a
comment from him on the mailing list at some point.

>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>

2024-02-27 16:13:16

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v4 31/36] lib: add memory allocations report in show_mem()

On Tue, Feb 27, 2024 at 5:18 AM Vlastimil Babka <[email protected]> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Include allocations in show_mem reports.
> >
> > Signed-off-by: Kent Overstreet <[email protected]>
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>
> Reviewed-by: Vlastimil Babka <[email protected]>
>
> Nit: there's pr_notice() that's shorter than printk(KERN_NOTICE

I used printk() since other parts of show_mem() used it but I can
change if that's preferable.

>
> > ---
> > include/linux/alloc_tag.h | 7 +++++++
> > include/linux/codetag.h | 1 +
> > lib/alloc_tag.c | 38 ++++++++++++++++++++++++++++++++++++++
> > lib/codetag.c | 5 +++++
> > mm/show_mem.c | 26 ++++++++++++++++++++++++++
> > 5 files changed, 77 insertions(+)
> >
> > diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> > index 29636719b276..85a24a027403 100644
> > --- a/include/linux/alloc_tag.h
> > +++ b/include/linux/alloc_tag.h
> > @@ -30,6 +30,13 @@ struct alloc_tag {
> >
> > #ifdef CONFIG_MEM_ALLOC_PROFILING
> >
> > +struct codetag_bytes {
> > + struct codetag *ct;
> > + s64 bytes;
> > +};
> > +
> > +size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep);
> > +
> > static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
> > {
> > return container_of(ct, struct alloc_tag, ct);
> > diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> > index bfd0ba5c4185..c2a579ccd455 100644
> > --- a/include/linux/codetag.h
> > +++ b/include/linux/codetag.h
> > @@ -61,6 +61,7 @@ struct codetag_iterator {
> > }
> >
> > void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
> > +bool codetag_trylock_module_list(struct codetag_type *cttype);
> > struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
> > struct codetag *codetag_next_ct(struct codetag_iterator *iter);
> >
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index cb5adec4b2e2..ec54f29482dc 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -86,6 +86,44 @@ static const struct seq_operations allocinfo_seq_op = {
> > .show = allocinfo_show,
> > };
> >
> > +size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep)
> > +{
> > + struct codetag_iterator iter;
> > + struct codetag *ct;
> > + struct codetag_bytes n;
> > + unsigned int i, nr = 0;
> > +
> > + if (can_sleep)
> > + codetag_lock_module_list(alloc_tag_cttype, true);
> > + else if (!codetag_trylock_module_list(alloc_tag_cttype))
> > + return 0;
> > +
> > + iter = codetag_get_ct_iter(alloc_tag_cttype);
> > + while ((ct = codetag_next_ct(&iter))) {
> > + struct alloc_tag_counters counter = alloc_tag_read(ct_to_alloc_tag(ct));
> > +
> > + n.ct = ct;
> > + n.bytes = counter.bytes;
> > +
> > + for (i = 0; i < nr; i++)
> > + if (n.bytes > tags[i].bytes)
> > + break;
> > +
> > + if (i < count) {
> > + nr -= nr == count;
> > + memmove(&tags[i + 1],
> > + &tags[i],
> > + sizeof(tags[0]) * (nr - i));
> > + nr++;
> > + tags[i] = n;
> > + }
> > + }
> > +
> > + codetag_lock_module_list(alloc_tag_cttype, false);
> > +
> > + return nr;
> > +}
> > +
> > static void __init procfs_init(void)
> > {
> > proc_create_seq("allocinfo", 0444, NULL, &allocinfo_seq_op);
> > diff --git a/lib/codetag.c b/lib/codetag.c
> > index b13412ca57cc..7b39cec9648a 100644
> > --- a/lib/codetag.c
> > +++ b/lib/codetag.c
> > @@ -36,6 +36,11 @@ void codetag_lock_module_list(struct codetag_type *cttype, bool lock)
> > up_read(&cttype->mod_lock);
> > }
> >
> > +bool codetag_trylock_module_list(struct codetag_type *cttype)
> > +{
> > + return down_read_trylock(&cttype->mod_lock) != 0;
> > +}
> > +
> > struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
> > {
> > struct codetag_iterator iter = {
> > diff --git a/mm/show_mem.c b/mm/show_mem.c
> > index 8dcfafbd283c..1e41f8d6e297 100644
> > --- a/mm/show_mem.c
> > +++ b/mm/show_mem.c
> > @@ -423,4 +423,30 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
> > #ifdef CONFIG_MEMORY_FAILURE
> > printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
> > #endif
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING
> > + {
> > + struct codetag_bytes tags[10];
> > + size_t i, nr;
> > +
> > + nr = alloc_tag_top_users(tags, ARRAY_SIZE(tags), false);
> > + if (nr) {
> > + printk(KERN_NOTICE "Memory allocations:\n");
> > + for (i = 0; i < nr; i++) {
> > + struct codetag *ct = tags[i].ct;
> > + struct alloc_tag *tag = ct_to_alloc_tag(ct);
> > + struct alloc_tag_counters counter = alloc_tag_read(tag);
> > +
> > + /* Same as alloc_tag_to_text() but w/o intermediate buffer */
> > + if (ct->modname)
> > + printk(KERN_NOTICE "%12lli %8llu %s:%u [%s] func:%s\n",
> > + counter.bytes, counter.calls, ct->filename,
> > + ct->lineno, ct->modname, ct->function);
> > + else
> > + printk(KERN_NOTICE "%12lli %8llu %s:%u func:%s\n",
> > + counter.bytes, counter.calls, ct->filename,
> > + ct->lineno, ct->function);
> > + }
> > + }
> > + }
> > +#endif
> > }
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>

2024-02-27 16:42:13

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v4 19/36] mm: create new codetag references during page splitting

On Tue, Feb 27, 2024 at 2:10 AM Vlastimil Babka <[email protected]> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > When a high-order page is split into smaller ones, each newly split
> > page should get its codetag. The original codetag is reused for these
> > pages but it's recorded as 0-byte allocation because original codetag
> > already accounts for the original high-order allocated page.
>
> This was v3 but then you refactored (for the better) so the commit log
> could reflect it?

Yes, technically mechnism didn't change but I should word it better.
Smth like this:

When a high-order page is split into smaller ones, each newly split
page should get its codetag. After the split each split page will be
referencing the original codetag. The codetag's "bytes" counter
remains the same because the amount of allocated memory has not
changed, however the "calls" counter gets increased to keep the
counter correct when these individual pages get freed.

>
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>
> I was going to R-b, but now I recalled the trickiness of
> __free_pages() for non-compound pages if it loses the race to a
> speculative reference. Will the codetag handling work fine there?

I think so. Each non-compoud page has its individual reference to its
codetag and will decrement it whenever the page is freed. IIUC the
logic in __free_pages(), when it loses race to a speculative
reference it will free all pages except for the first one and the
first one will be freed when the last put_page() happens. If prior to
this all these pages were split from one page then all of them will
have their own reference which points to the same codetag. Every time
one of these pages are freed that codetag's "bytes" and "calls"
counters will be decremented. I think accounting will work correctly
irrespective of where these pages are freed, in __free_pages() or by
put_page().

>
> > ---
> > include/linux/pgalloc_tag.h | 30 ++++++++++++++++++++++++++++++
> > mm/huge_memory.c | 2 ++
> > mm/page_alloc.c | 2 ++
> > 3 files changed, 34 insertions(+)
> >
> > diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
> > index b49ab955300f..9e6ad8e0e4aa 100644
> > --- a/include/linux/pgalloc_tag.h
> > +++ b/include/linux/pgalloc_tag.h
> > @@ -67,11 +67,41 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int order)
> > }
> > }
> >
> > +static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
> > +{
> > + int i;
> > + struct page_ext *page_ext;
> > + union codetag_ref *ref;
> > + struct alloc_tag *tag;
> > +
> > + if (!mem_alloc_profiling_enabled())
> > + return;
> > +
> > + page_ext = page_ext_get(page);
> > + if (unlikely(!page_ext))
> > + return;
> > +
> > + ref = codetag_ref_from_page_ext(page_ext);
> > + if (!ref->ct)
> > + goto out;
> > +
> > + tag = ct_to_alloc_tag(ref->ct);
> > + page_ext = page_ext_next(page_ext);
> > + for (i = 1; i < nr; i++) {
> > + /* Set new reference to point to the original tag */
> > + alloc_tag_ref_set(codetag_ref_from_page_ext(page_ext), tag);
> > + page_ext = page_ext_next(page_ext);
> > + }
> > +out:
> > + page_ext_put(page_ext);
> > +}
> > +
> > #else /* CONFIG_MEM_ALLOC_PROFILING */
> >
> > static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
> > unsigned int order) {}
> > static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
> > +static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
> >
> > #endif /* CONFIG_MEM_ALLOC_PROFILING */
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 94c958f7ebb5..86daae671319 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -38,6 +38,7 @@
> > #include <linux/sched/sysctl.h>
> > #include <linux/memory-tiers.h>
> > #include <linux/compat.h>
> > +#include <linux/pgalloc_tag.h>
> >
> > #include <asm/tlb.h>
> > #include <asm/pgalloc.h>
> > @@ -2899,6 +2900,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> > /* Caller disabled irqs, so they are still disabled here */
> >
> > split_page_owner(head, nr);
> > + pgalloc_tag_split(head, nr);
> >
> > /* See comment in __split_huge_page_tail() */
> > if (PageAnon(head)) {
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 58c0e8b948a4..4bc5b4720fee 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2621,6 +2621,7 @@ void split_page(struct page *page, unsigned int order)
> > for (i = 1; i < (1 << order); i++)
> > set_page_refcounted(page + i);
> > split_page_owner(page, 1 << order);
> > + pgalloc_tag_split(page, 1 << order);
> > split_page_memcg(page, 1 << order);
> > }
> > EXPORT_SYMBOL_GPL(split_page);
> > @@ -4806,6 +4807,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
> > struct page *last = page + nr;
> >
> > split_page_owner(page, 1 << order);
> > + pgalloc_tag_split(page, 1 << order);
> > split_page_memcg(page, 1 << order);
> > while (page < --last)
> > set_page_refcounted(last);

2024-02-28 08:47:37

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4 19/36] mm: create new codetag references during page splitting

On 2/27/24 17:38, Suren Baghdasaryan wrote:
> On Tue, Feb 27, 2024 at 2:10 AM Vlastimil Babka <[email protected]> wrote:
>>
>> On 2/21/24 20:40, Suren Baghdasaryan wrote:
>> > When a high-order page is split into smaller ones, each newly split
>> > page should get its codetag. The original codetag is reused for these
>> > pages but it's recorded as 0-byte allocation because original codetag
>> > already accounts for the original high-order allocated page.
>>
>> This was v3 but then you refactored (for the better) so the commit log
>> could reflect it?
>
> Yes, technically mechnism didn't change but I should word it better.
> Smth like this:
>
> When a high-order page is split into smaller ones, each newly split
> page should get its codetag. After the split each split page will be
> referencing the original codetag. The codetag's "bytes" counter
> remains the same because the amount of allocated memory has not
> changed, however the "calls" counter gets increased to keep the
> counter correct when these individual pages get freed.

Great, thanks.
The concern with __free_pages() is not really related to splitting, so for
this patch:

Reviewed-by: Vlastimil Babka <[email protected]>

>
>>
>> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>>
>> I was going to R-b, but now I recalled the trickiness of
>> __free_pages() for non-compound pages if it loses the race to a
>> speculative reference. Will the codetag handling work fine there?
>
> I think so. Each non-compoud page has its individual reference to its
> codetag and will decrement it whenever the page is freed. IIUC the
> logic in __free_pages(), when it loses race to a speculative
> reference it will free all pages except for the first one and the

The "tail" pages of this non-compound high-order page will AFAICS not have
code tags assigned, so alloc_tag_sub() will be a no-op (or a warning with
_DEBUG).

> first one will be freed when the last put_page() happens. If prior to
> this all these pages were split from one page then all of them will
> have their own reference which points to the same codetag.

Yeah I'm assuming there's no split before the freeing. This patch about
splitting just reminded me of that tricky freeing scenario.

So IIUC the "else if (!head)" path of __free_pages() will do nothing about
the "tail" pages wrt code tags as there are no code tags.
Then whoever took the speculative "head" page reference will put_page() and
free it, which will end up in alloc_tag_sub(). This will decrement calls
properly, but bytes will become imbalanced, because that put_page() will
pass order-0 worth of bytes - the original order is lost.

Now this might be rare enough that it's not worth fixing if that would be
too complicated, just FYI.


> Every time
> one of these pages are freed that codetag's "bytes" and "calls"
> counters will be decremented. I think accounting will work correctly
> irrespective of where these pages are freed, in __free_pages() or by
> put_page().
>


2024-02-28 17:51:17

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v4 19/36] mm: create new codetag references during page splitting

On Wed, Feb 28, 2024 at 12:47 AM Vlastimil Babka <[email protected]> wrote:
>
> On 2/27/24 17:38, Suren Baghdasaryan wrote:
> > On Tue, Feb 27, 2024 at 2:10 AM Vlastimil Babka <[email protected]> wrote:
> >>
> >> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> >> > When a high-order page is split into smaller ones, each newly split
> >> > page should get its codetag. The original codetag is reused for these
> >> > pages but it's recorded as 0-byte allocation because original codetag
> >> > already accounts for the original high-order allocated page.
> >>
> >> This was v3 but then you refactored (for the better) so the commit log
> >> could reflect it?
> >
> > Yes, technically mechnism didn't change but I should word it better.
> > Smth like this:
> >
> > When a high-order page is split into smaller ones, each newly split
> > page should get its codetag. After the split each split page will be
> > referencing the original codetag. The codetag's "bytes" counter
> > remains the same because the amount of allocated memory has not
> > changed, however the "calls" counter gets increased to keep the
> > counter correct when these individual pages get freed.
>
> Great, thanks.
> The concern with __free_pages() is not really related to splitting, so for
> this patch:
>
> Reviewed-by: Vlastimil Babka <[email protected]>
>
> >
> >>
> >> > Signed-off-by: Suren Baghdasaryan <[email protected]>
> >>
> >> I was going to R-b, but now I recalled the trickiness of
> >> __free_pages() for non-compound pages if it loses the race to a
> >> speculative reference. Will the codetag handling work fine there?
> >
> > I think so. Each non-compoud page has its individual reference to its
> > codetag and will decrement it whenever the page is freed. IIUC the
> > logic in __free_pages(), when it loses race to a speculative
> > reference it will free all pages except for the first one and the
>
> The "tail" pages of this non-compound high-order page will AFAICS not have
> code tags assigned, so alloc_tag_sub() will be a no-op (or a warning with
> _DEBUG).

Yes, that is correct.

>
> > first one will be freed when the last put_page() happens. If prior to
> > this all these pages were split from one page then all of them will
> > have their own reference which points to the same codetag.
>
> Yeah I'm assuming there's no split before the freeing. This patch about
> splitting just reminded me of that tricky freeing scenario.

Ah, I see. I thought you were talking about a page that was previously split.

>
> So IIUC the "else if (!head)" path of __free_pages() will do nothing about
> the "tail" pages wrt code tags as there are no code tags.
> Then whoever took the speculative "head" page reference will put_page() and
> free it, which will end up in alloc_tag_sub(). This will decrement calls
> properly, but bytes will become imbalanced, because that put_page() will
> pass order-0 worth of bytes - the original order is lost.

Yeah, that's true. put_page() will end up calling
free_unref_page(&folio->page, 0) even if the original order was more
than 0.

>
> Now this might be rare enough that it's not worth fixing if that would be
> too complicated, just FYI.

Yeah. We can fix this by subtracting the "bytes" counter of the "head"
page for all free_the_page(page + (1 << order), order) calls we do
inside __free_pages(). But we can't simply use pgalloc_tag_sub()
because the "calls" counter will get over-decremented (we allocated
all of these pages with one call). I'll need to introduce a new
pgalloc_tag_sub_bytes() API and use it here. I feel it's too targeted
of a solution but OTOH this is a special situation, so maybe it's
acceptable. WDYT?

>
>
> > Every time
> > one of these pages are freed that codetag's "bytes" and "calls"
> > counters will be decremented. I think accounting will work correctly
> > irrespective of where these pages are freed, in __free_pages() or by
> > put_page().
> >
>

2024-02-28 18:28:29

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4 19/36] mm: create new codetag references during page splitting

On 2/28/24 18:50, Suren Baghdasaryan wrote:
> On Wed, Feb 28, 2024 at 12:47 AM Vlastimil Babka <[email protected]> wrote:
>
>>
>> Now this might be rare enough that it's not worth fixing if that would be
>> too complicated, just FYI.
>
> Yeah. We can fix this by subtracting the "bytes" counter of the "head"
> page for all free_the_page(page + (1 << order), order) calls we do
> inside __free_pages(). But we can't simply use pgalloc_tag_sub()
> because the "calls" counter will get over-decremented (we allocated
> all of these pages with one call). I'll need to introduce a new
> pgalloc_tag_sub_bytes() API and use it here. I feel it's too targeted
> of a solution but OTOH this is a special situation, so maybe it's
> acceptable. WDYT?

Hmm I think there's a problem that once you fail put_page_testzero() and
detect you need to do this, the page might be already gone or reallocated so
you can't get to the tag for decrementing bytes. You'd have to get it
upfront (I guess for "head && order > 0" cases) just in case it happens.
Maybe it's not worth the trouble for such a rare case.

>>
>>
>> > Every time
>> > one of these pages are freed that codetag's "bytes" and "calls"
>> > counters will be decremented. I think accounting will work correctly
>> > irrespective of where these pages are freed, in __free_pages() or by
>> > put_page().
>> >
>>


2024-02-28 18:48:30

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v4 19/36] mm: create new codetag references during page splitting

On Wed, Feb 28, 2024 at 10:28 AM Vlastimil Babka <[email protected]> wrote:
>
> On 2/28/24 18:50, Suren Baghdasaryan wrote:
> > On Wed, Feb 28, 2024 at 12:47 AM Vlastimil Babka <[email protected]> wrote:
> >
> >>
> >> Now this might be rare enough that it's not worth fixing if that would be
> >> too complicated, just FYI.
> >
> > Yeah. We can fix this by subtracting the "bytes" counter of the "head"
> > page for all free_the_page(page + (1 << order), order) calls we do
> > inside __free_pages(). But we can't simply use pgalloc_tag_sub()
> > because the "calls" counter will get over-decremented (we allocated
> > all of these pages with one call). I'll need to introduce a new
> > pgalloc_tag_sub_bytes() API and use it here. I feel it's too targeted
> > of a solution but OTOH this is a special situation, so maybe it's
> > acceptable. WDYT?
>
> Hmm I think there's a problem that once you fail put_page_testzero() and
> detect you need to do this, the page might be already gone or reallocated so
> you can't get to the tag for decrementing bytes. You'd have to get it
> upfront (I guess for "head && order > 0" cases) just in case it happens.
> Maybe it's not worth the trouble for such a rare case.

Yes, that hit me when I tried to implement it but there is a simple
solution around that. I can obtain alloc_tag before doing
put_page_testzero() and then decrement bytes counter directly as
needed.
Not sure if it is a rare enough case that we can ignore it but if the
fix is simple enough then might as well do it?

>
> >>
> >>
> >> > Every time
> >> > one of these pages are freed that codetag's "bytes" and "calls"
> >> > counters will be decremented. I think accounting will work correctly
> >> > irrespective of where these pages are freed, in __free_pages() or by
> >> > put_page().
> >> >
> >>
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>

2024-02-24 02:02:47

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline

On Wed, Feb 21, 2024 at 1:16 PM Pasha Tatashin
<[email protected]> wrote:
>
> On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@googlecom> wrote:
> >
> > From: Kent Overstreet <[email protected]>
> >
> > It seems we need to be more forceful with the compiler on this one.
> > This is done for performance reasons only.
> >
> > Signed-off-by: Kent Overstreet <[email protected]>
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
> > Reviewed-by: Kees Cook <[email protected]>
> > ---
> > mm/slub.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 2ef88bbf56a3..d31b03a8d9d5 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void *x, bool init)
> > return !kasan_slab_free(s, x, init);
> > }
> >
> > -static inline bool slab_free_freelist_hook(struct kmem_cache *s,
> > +static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,
>
> __fastpath_inline seems to me more appropriate here. It prioritizes
> memory vs performance.

Hmm. AFAIKT this function is used only in one place and we do not add
any additional users, so I don't think changing to __fastpath_inline
here would gain us anything.

>
> > void **head, void **tail,
> > int *cnt)
> > {
> > --
> > 2.44.0.rc0.258.g7320e95886-goog
> >

2024-02-23 22:24:59

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v4 24/36] rust: Add a rust helper for krealloc()

On Thu, Feb 22, 2024 at 2:00 AM Alice Ryhl <[email protected]> wrote:
>
> On Wed, Feb 21, 2024 at 8:41 PM Suren Baghdasaryan <surenb@googlecom> wrote:
> >
> > From: Kent Overstreet <[email protected]>
> >
> > Memory allocation profiling is turning krealloc() into a nontrivial
> > macro - so for now, we need a helper for it.
> >
> > Until we have proper support on the rust side for memory allocation
> > profiling this does mean that all Rust allocations will be accounted to
> > the helper.
> >
> > Signed-off-by: Kent Overstreet <[email protected]>
> > Cc: Miguel Ojeda <[email protected]>
> > Cc: Alex Gaynor <[email protected]>
> > Cc: Wedson Almeida Filho <[email protected]>
> > Cc: Boqun Feng <[email protected]>
> > Cc: Gary Guo <[email protected]>
> > Cc: "Björn Roy Baron" <[email protected]>
> > Cc: Benno Lossin <[email protected]>
> > Cc: Andreas Hindborg <[email protected]>
> > Cc: Alice Ryhl <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>
> Currently, the Rust build doesn't work throughout the entire series
> since there are some commits where krealloc is missing before you
> introduce the helper. If you introduce the helper first before
> krealloc stops being an exported function, then the Rust build should
> work throughout the entire series. (Having both the helper and the
> exported function at the same time is not a problem.)

Ack. I'll move it up in the series.

>
> With the patch reordered:
>
> Reviewed-by: Alice Ryhl <[email protected]>

Thanks Alice!

>
> Alice

2024-03-12 20:08:27

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed

On Tue, Mar 12, 2024 at 11:23:45AM -0700, Luis Chamberlain wrote:
> On Mon, Feb 26, 2024 at 05:58:40PM +0100, Vlastimil Babka wrote:
> > On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > > Skip freeing module's data section if there are non-zero allocation tags
> > > because otherwise, once these allocations are freed, the access to their
> > > code tag would cause UAF.
> > >
> > > Signed-off-by: Suren Baghdasaryan <[email protected]>
> >
> > I know that module unloading was never considered really supported etc.
>
> If its not supported then we should not have it on modules. Module
> loading and unloading should just work, otherwise then this should not
> work with modules and leave them in a zombie state.

Not have memory allocation profiling on modules?

2024-03-13 00:27:31

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed

On Mon, Feb 26, 2024 at 05:58:40PM +0100, Vlastimil Babka wrote:
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Skip freeing module's data section if there are non-zero allocation tags
> > because otherwise, once these allocations are freed, the access to their
> > code tag would cause UAF.
> >
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>
> I know that module unloading was never considered really supported etc.

If its not supported then we should not have it on modules. Module
loading and unloading should just work, otherwise then this should not
work with modules and leave them in a zombie state.

Luis

2024-03-12 19:58:02

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed

On Tue, Mar 12, 2024 at 11:23 AM Luis Chamberlain <[email protected]> wrote:
>
> On Mon, Feb 26, 2024 at 05:58:40PM +0100, Vlastimil Babka wrote:
> > On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > > Skip freeing module's data section if there are non-zero allocation tags
> > > because otherwise, once these allocations are freed, the access to their
> > > code tag would cause UAF.
> > >
> > > Signed-off-by: Suren Baghdasaryan <[email protected]>
> >
> > I know that module unloading was never considered really supported etc.
>
> If its not supported then we should not have it on modules. Module
> loading and unloading should just work, otherwise then this should not
> work with modules and leave them in a zombie state.

I replied on the v5 thread here:
https://lore.kernel.org/all/[email protected]/
Let's continue the discussion in that thread. Thanks!

>
> Luis