2021-10-12 18:03:35

by Johannes Weiner

Subject: [PATCH 00/11] PageSlab: eliminate unnecessary compound_head() calls

PageSlab() currently imposes a compound_head() call on all callsites
even though only very few situations encounter tailpages. This short
series bubbles tailpage resolution up to the few sites that need it,
and eliminates it everywhere else.

This is a stand-alone improvement. However, it's inspired by Willy's
patches to split struct slab from struct page. Those patches currently
resolve a slab object pointer to its struct slab as follows:

slab = virt_to_slab(p); /* tailpage resolution */
if (slab_test_cache(slab)) { /* slab or page alloc bypass? */
do_slab_stuff(slab);
} else {
page = (struct page *)slab;
do_page_stuff(page);
}

which makes struct slab an ambiguous type that needs to self-identify
with the slab_test_cache() test (which in turn relies on PG_slab in
the flags field shared between page and slab).

It would be preferable to do:

page = virt_to_head_page(p); /* tailpage resolution */
if (PageSlab(page)) { /* slab or page alloc bypass? */
slab = page_slab(page);
do_slab_stuff(slab);
} else {
do_page_stuff(page);
}

and leave the ambiguity and the need to self-identify with struct
page, so that struct slab is a strong and unambiguous type, and a
non-NULL struct slab encountered in the wild is always a valid object
without the need to check another dedicated flag for validity first.
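
(A hypothetical sketch of what page_slab() could look like here: assuming
struct slab simply overlays struct page, it would be little more than a
type-checked cast of the head page. The actual helper may well be spelled
differently:)

static inline struct slab *page_slab(struct page *page)
{
	/* assumption: struct slab overlays struct page 1:1 */
	VM_BUG_ON_PAGE(PageTail(page), page);
	return (struct slab *)page;
}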

However, because PageSlab() currently implies tailpage resolution,
writing the virt->slab lookup in the preferred way would add yet more
unnecessary compound_head() calls to the hottest MM paths.

The page flag helpers should eventually all be weaned off of those
compound_head() calls for their unnecessary overhead alone. But this
one in particular is now getting in the way of writing code in the
preferred manner, and bleeding page ambiguity into the new types that
are supposed to eliminate specifically that. It's ripe for a cleanup.

Based on v5.15-rc4.

arch/ia64/kernel/mca_drv.c | 2 +-
drivers/ata/libata-sff.c | 2 +-
fs/proc/page.c | 12 +++++++-----
include/linux/net.h | 2 +-
include/linux/page-flags.h | 10 +++++-----
mm/kasan/generic.c | 2 +-
mm/kasan/kasan.h | 2 +-
mm/kasan/report.c | 4 ++--
mm/kasan/report_tags.c | 2 +-
mm/memory-failure.c | 6 +++---
mm/memory.c | 3 ++-
mm/slab.h | 2 +-
mm/slob.c | 4 ++--
mm/slub.c | 6 +++---
mm/util.c | 2 +-
15 files changed, 32 insertions(+), 29 deletions(-)



2021-10-12 18:04:03

by Johannes Weiner

Subject: [PATCH 04/11] PageSlab: eliminate unnecessary compound_head() calls in mm/debug

The head page is already resolved at the beginning of the function.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/debug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/debug.c b/mm/debug.c
index 500f5adce00e..fae0f81ad831 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -84,7 +84,7 @@ static void __dump_page(struct page *page)
* page->_mapcount space in struct page is used by sl[aou]b pages to
* encode own info.
*/
- mapcount = PageSlab(compound_head(head)) ? 0 : page_mapcount(page);
+ mapcount = PageSlab(head) ? 0 : page_mapcount(page);

pr_warn("page:%p refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
page, page_ref_count(head), mapcount, mapping,
--
2.32.0

2021-10-12 18:04:05

by Johannes Weiner

Subject: [PATCH 01/11] PageSlab: bubble compound_head() into callsites

In order to be safe to call on tail pages, PageSlab() currently does
an unconditional compound_head() lookup. This adds overhead to many
contexts in which tail pages cannot occur.

To have tailpage resolution only in places that need it, move the
compound_head() call from PageSlab() into all current callsites. This
is a mechanical replacement with no change in behavior or overhead.

Subsequent patches will be able to eliminate the compound_head() calls
from contexts in which they are not needed.

Signed-off-by: Johannes Weiner <[email protected]>
---
arch/ia64/kernel/mca_drv.c | 2 +-
drivers/ata/libata-sff.c | 2 +-
fs/proc/page.c | 6 ++++--
include/linux/memcontrol.h | 6 +++---
include/linux/net.h | 2 +-
include/linux/page-flags.h | 10 +++++-----
kernel/resource.c | 2 +-
mm/debug.c | 2 +-
mm/kasan/common.c | 4 ++--
mm/kasan/generic.c | 2 +-
mm/kasan/report.c | 2 +-
mm/kasan/report_tags.c | 2 +-
mm/memory-failure.c | 6 +++---
mm/memory.c | 3 ++-
mm/nommu.c | 2 +-
mm/slab.c | 2 +-
mm/slab.h | 5 +++--
mm/slab_common.c | 4 ++--
mm/slob.c | 4 ++--
mm/slub.c | 12 ++++++------
mm/usercopy.c | 2 +-
mm/util.c | 2 +-
22 files changed, 44 insertions(+), 40 deletions(-)

diff --git a/arch/ia64/kernel/mca_drv.c b/arch/ia64/kernel/mca_drv.c
index 5bfc79be4cef..903e7c26b63e 100644
--- a/arch/ia64/kernel/mca_drv.c
+++ b/arch/ia64/kernel/mca_drv.c
@@ -136,7 +136,7 @@ mca_page_isolate(unsigned long paddr)
return ISOLATE_NG;

/* kick pages having attribute 'SLAB' or 'Reserved' */
- if (PageSlab(p) || PageReserved(p))
+ if (PageSlab(compound_head(p)) || PageReserved(p))
return ISOLATE_NG;

/* add attribute 'Reserved' and register the page */
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index b71ea4a680b0..3a46d305616e 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -647,7 +647,7 @@ static void ata_pio_xfer(struct ata_queued_cmd *qc, struct page *page,
qc->ap->ops->sff_data_xfer(qc, buf + offset, xfer_size, do_write);
kunmap_atomic(buf);

- if (!do_write && !PageSlab(page))
+ if (!do_write && !PageSlab(compound_head(page)))
flush_dcache_page(page);
}

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 9f1077d94cde..2c249f84e1fd 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -66,7 +66,8 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
*/
ppage = pfn_to_online_page(pfn);

- if (!ppage || PageSlab(ppage) || page_has_type(ppage))
+ if (!ppage ||
+ PageSlab(compound_head(ppage)) || page_has_type(ppage))
pcount = 0;
else
pcount = page_mapcount(ppage);
@@ -126,7 +127,7 @@ u64 stable_page_flags(struct page *page)
* Note that page->_mapcount is overloaded in SLOB/SLUB/SLQB, so the
* simple test in page_mapped() is not enough.
*/
- if (!PageSlab(page) && page_mapped(page))
+ if (!PageSlab(compound_head(page)) && page_mapped(page))
u |= 1 << KPF_MMAP;
if (PageAnon(page))
u |= 1 << KPF_ANON;
@@ -152,6 +153,7 @@ u64 stable_page_flags(struct page *page)
else if (PageTransCompound(page)) {
struct page *head = compound_head(page);

+ /* XXX: misses isolated file THPs */
if (PageLRU(head) || PageAnon(head))
u |= 1 << KPF_THP;
else if (is_huge_zero_page(head)) {
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3096c9a0ee01..02394f802698 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -397,7 +397,7 @@ static inline struct mem_cgroup *__page_memcg(struct page *page)
{
unsigned long memcg_data = page->memcg_data;

- VM_BUG_ON_PAGE(PageSlab(page), page);
+ VM_BUG_ON_PAGE(PageSlab(compound_head(page)), page);
VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);

@@ -418,7 +418,7 @@ static inline struct obj_cgroup *__page_objcg(struct page *page)
{
unsigned long memcg_data = page->memcg_data;

- VM_BUG_ON_PAGE(PageSlab(page), page);
+ VM_BUG_ON_PAGE(PageSlab(compound_head(page)), page);
VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);

@@ -466,7 +466,7 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
{
unsigned long memcg_data = READ_ONCE(page->memcg_data);

- VM_BUG_ON_PAGE(PageSlab(page), page);
+ VM_BUG_ON_PAGE(PageSlab(compound_head(page)), page);
WARN_ON_ONCE(!rcu_read_lock_held());

if (memcg_data & MEMCG_DATA_KMEM) {
diff --git a/include/linux/net.h b/include/linux/net.h
index ba736b457a06..79767ae262ef 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -299,7 +299,7 @@ do { \
*/
static inline bool sendpage_ok(struct page *page)
{
- return !PageSlab(page) && page_count(page) >= 1;
+ return !PageSlab(compound_head(page)) && page_count(page) >= 1;
}

int kernel_sendmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec,
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index a558d67ee86f..e96c9cb5bf8b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -344,7 +344,7 @@ PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
TESTCLEARFLAG(Active, active, PF_HEAD)
PAGEFLAG(Workingset, workingset, PF_HEAD)
TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
-__PAGEFLAG(Slab, slab, PF_NO_TAIL)
+__PAGEFLAG(Slab, slab, PF_ONLY_HEAD)
__PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some filesystems */

@@ -776,7 +776,7 @@ __PAGEFLAG(Isolated, isolated, PF_ANY);
*/
static inline int PageSlabPfmemalloc(struct page *page)
{
- VM_BUG_ON_PAGE(!PageSlab(page), page);
+ VM_BUG_ON_PAGE(!PageSlab(compound_head(page)), page);
return PageActive(page);
}

@@ -791,19 +791,19 @@ static inline int __PageSlabPfmemalloc(struct page *page)

static inline void SetPageSlabPfmemalloc(struct page *page)
{
- VM_BUG_ON_PAGE(!PageSlab(page), page);
+ VM_BUG_ON_PAGE(!PageSlab(compound_head(page)), page);
SetPageActive(page);
}

static inline void __ClearPageSlabPfmemalloc(struct page *page)
{
- VM_BUG_ON_PAGE(!PageSlab(page), page);
+ VM_BUG_ON_PAGE(!PageSlab(compound_head(page)), page);
__ClearPageActive(page);
}

static inline void ClearPageSlabPfmemalloc(struct page *page)
{
- VM_BUG_ON_PAGE(!PageSlab(page), page);
+ VM_BUG_ON_PAGE(!PageSlab(compound_head(page)), page);
ClearPageActive(page);
}

diff --git a/kernel/resource.c b/kernel/resource.c
index ca9f5198a01f..a363211fda99 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -151,7 +151,7 @@ static void free_resource(struct resource *res)
if (!res)
return;

- if (!PageSlab(virt_to_head_page(res))) {
+ if (!PageSlab(compound_head(virt_to_head_page(res)))) {
spin_lock(&bootmem_resource_lock);
res->sibling = bootmem_resource_free;
bootmem_resource_free = res;
diff --git a/mm/debug.c b/mm/debug.c
index fae0f81ad831..500f5adce00e 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -84,7 +84,7 @@ static void __dump_page(struct page *page)
* page->_mapcount space in struct page is used by sl[aou]b pages to
* encode own info.
*/
- mapcount = PageSlab(head) ? 0 : page_mapcount(page);
+ mapcount = PageSlab(compound_head(head)) ? 0 : page_mapcount(page);

pr_warn("page:%p refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
page, page_ref_count(head), mapcount, mapping,
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 2baf121fb8c5..b5e81273fc6b 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -411,7 +411,7 @@ void __kasan_slab_free_mempool(void *ptr, unsigned long ip)
* !PageSlab() when the size provided to kmalloc is larger than
* KMALLOC_MAX_SIZE, and kmalloc falls back onto page_alloc.
*/
- if (unlikely(!PageSlab(page))) {
+ if (unlikely(!PageSlab(compound_head(page)))) {
if (____kasan_kfree_large(ptr, ip))
return;
kasan_poison(ptr, page_size(page), KASAN_FREE_PAGE, false);
@@ -575,7 +575,7 @@ void * __must_check __kasan_krealloc(const void *object, size_t size, gfp_t flag
page = virt_to_head_page(object);

/* Piggy-back on kmalloc() instrumentation to poison the redzone. */
- if (unlikely(!PageSlab(page)))
+ if (unlikely(!PageSlab(compound_head(page))))
return __kasan_kmalloc_large(object, size, flags);
else
return ____kasan_kmalloc(page->slab_cache, object, size, flags);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index c3f5ba7a294a..94c0c86c79d9 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -335,7 +335,7 @@ void kasan_record_aux_stack(void *addr)
struct kasan_alloc_meta *alloc_meta;
void *object;

- if (is_kfence_address(addr) || !(page && PageSlab(page)))
+ if (is_kfence_address(addr) || !(page && PageSlab(compound_head(page))))
return;

cache = page->slab_cache;
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 884a950c7026..7cdcf968f43f 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -256,7 +256,7 @@ static void print_address_description(void *addr, u8 tag)
dump_stack_lvl(KERN_ERR);
pr_err("\n");

- if (page && PageSlab(page)) {
+ if (page && PageSlab(compound_head(page))) {
struct kmem_cache *cache = page->slab_cache;
void *object = nearest_obj(cache, page, addr);

diff --git a/mm/kasan/report_tags.c b/mm/kasan/report_tags.c
index 8a319fc16dab..32f955d98e76 100644
--- a/mm/kasan/report_tags.c
+++ b/mm/kasan/report_tags.c
@@ -21,7 +21,7 @@ const char *kasan_get_bug_type(struct kasan_access_info *info)
tag = get_tag(info->access_addr);
addr = kasan_reset_tag(info->access_addr);
page = kasan_addr_to_page(addr);
- if (page && PageSlab(page)) {
+ if (page && PageSlab(compound_head(page))) {
cache = page->slab_cache;
object = nearest_obj(cache, page, (void *)addr);
alloc_meta = kasan_get_alloc_meta(cache, object);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..0d214f800a4e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -131,7 +131,7 @@ static int hwpoison_filter_dev(struct page *p)
/*
* page_mapping() does not accept slab pages.
*/
- if (PageSlab(p))
+ if (PageSlab(compound_head(p)))
return -EINVAL;

mapping = page_mapping(p);
@@ -289,7 +289,7 @@ void shake_page(struct page *p)
if (PageHuge(p))
return;

- if (!PageSlab(p)) {
+ if (!PageSlab(compound_head(p))) {
lru_add_drain_all();
if (PageLRU(p) || is_free_buddy_page(p))
return;
@@ -1285,7 +1285,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
* Here we are interested only in user-mapped pages, so skip any
* other types of pages.
*/
- if (PageReserved(p) || PageSlab(p))
+ if (PageReserved(p) || PageSlab(compound_head(p)))
return true;
if (!(PageLRU(hpage) || PageHuge(p)))
return true;
diff --git a/mm/memory.c b/mm/memory.c
index adf9b9ef8277..a789613af270 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1738,7 +1738,8 @@ pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,

static int validate_page_before_insert(struct page *page)
{
- if (PageAnon(page) || PageSlab(page) || page_has_type(page))
+ if (PageAnon(page) ||
+ PageSlab(compound_head(page)) || page_has_type(page))
return -EINVAL;
flush_dcache_page(page);
return 0;
diff --git a/mm/nommu.c b/mm/nommu.c
index 02d2427b8f9e..c233126dd476 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -87,7 +87,7 @@ unsigned int kobjsize(const void *objp)
* If the allocator sets PageSlab, we know the pointer came from
* kmalloc().
*/
- if (PageSlab(page))
+ if (PageSlab(compound_head(page)))
return ksize(objp);

/*
diff --git a/mm/slab.c b/mm/slab.c
index d0f725637663..829f2b6d4af7 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1396,7 +1396,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
{
int order = cachep->gfporder;

- BUG_ON(!PageSlab(page));
+ BUG_ON(!PageSlab(compound_head(page)));
__ClearPageSlabPfmemalloc(page);
__ClearPageSlab(page);
page_mapcount_reset(page);
diff --git a/mm/slab.h b/mm/slab.h
index 58c01a34e5b8..0446948c9c4e 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -410,8 +410,9 @@ static inline struct kmem_cache *virt_to_cache(const void *obj)
struct page *page;

page = virt_to_head_page(obj);
- if (WARN_ONCE(!PageSlab(page), "%s: Object is not a Slab page!\n",
- __func__))
+ if (WARN_ONCE(!PageSlab(compound_head(page)),
+ "%s: Object is not a Slab page!\n",
+ __func__))
return NULL;
return page->slab_cache;
}
diff --git a/mm/slab_common.c b/mm/slab_common.c
index ec2bb0beed75..5f7063797f0e 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -564,7 +564,7 @@ bool kmem_valid_obj(void *object)
if (object < (void *)PAGE_SIZE || !virt_addr_valid(object))
return false;
page = virt_to_head_page(object);
- return PageSlab(page);
+ return PageSlab(compound_head(page));
}
EXPORT_SYMBOL_GPL(kmem_valid_obj);

@@ -594,7 +594,7 @@ void kmem_dump_obj(void *object)
if (WARN_ON_ONCE(!virt_addr_valid(object)))
return;
page = virt_to_head_page(object);
- if (WARN_ON_ONCE(!PageSlab(page))) {
+ if (WARN_ON_ONCE(!PageSlab(compound_head(page)))) {
pr_cont(" non-slab memory.\n");
return;
}
diff --git a/mm/slob.c b/mm/slob.c
index 74d3f6e60666..4115788227fb 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -553,7 +553,7 @@ void kfree(const void *block)
kmemleak_free(block);

sp = virt_to_page(block);
- if (PageSlab(sp)) {
+ if (PageSlab(compound_head(sp))) {
int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
unsigned int *m = (unsigned int *)(block - align);
slob_free(m, *m + align);
@@ -579,7 +579,7 @@ size_t __ksize(const void *block)
return 0;

sp = virt_to_page(block);
- if (unlikely(!PageSlab(sp)))
+ if (unlikely(!PageSlab(compound_head(sp))))
return page_size(sp);

align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
diff --git a/mm/slub.c b/mm/slub.c
index 3d2025f7163b..37a4cc1e73a7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1089,7 +1089,7 @@ static int check_slab(struct kmem_cache *s, struct page *page)
{
int maxobj;

- if (!PageSlab(page)) {
+ if (!PageSlab(compound_head(page))) {
slab_err(s, page, "Not a valid slab page");
return 0;
}
@@ -1295,7 +1295,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
return 1;

bad:
- if (PageSlab(page)) {
+ if (PageSlab(compound_head(page))) {
/*
* If this is a slab page then lets do the best we can
* to avoid issues in the future. Marking all objects
@@ -1325,7 +1325,7 @@ static inline int free_consistency_checks(struct kmem_cache *s,
return 0;

if (unlikely(s != page->slab_cache)) {
- if (!PageSlab(page)) {
+ if (!PageSlab(compound_head(page))) {
slab_err(s, page, "Attempt to free object(0x%p) outside of slab",
object);
} else if (!page->slab_cache) {
@@ -3554,7 +3554,7 @@ int build_detached_freelist(struct kmem_cache *s, size_t size,
page = virt_to_head_page(object);
if (!s) {
/* Handle kalloc'ed objects */
- if (unlikely(!PageSlab(page))) {
+ if (unlikely(!PageSlab(compound_head(page)))) {
free_nonslab_page(page, object);
p[size] = NULL; /* mark object processed */
return size;
@@ -4516,7 +4516,7 @@ size_t __ksize(const void *object)

page = virt_to_head_page(object);

- if (unlikely(!PageSlab(page))) {
+ if (unlikely(!PageSlab(compound_head(page)))) {
WARN_ON(!PageCompound(page));
return page_size(page);
}
@@ -4536,7 +4536,7 @@ void kfree(const void *x)
return;

page = virt_to_head_page(x);
- if (unlikely(!PageSlab(page))) {
+ if (unlikely(!PageSlab(compound_head(page)))) {
free_nonslab_page(page, object);
return;
}
diff --git a/mm/usercopy.c b/mm/usercopy.c
index b3de3c4eefba..924e236522da 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -235,7 +235,7 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
*/
page = compound_head(kmap_to_page((void *)ptr));

- if (PageSlab(page)) {
+ if (PageSlab(compound_head(page))) {
/* Check slab allocator for flags and size. */
__check_heap_object(ptr, n, page, to_user);
} else {
diff --git a/mm/util.c b/mm/util.c
index bacabe446906..6e6abdc9f62e 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -712,7 +712,7 @@ struct address_space *page_mapping(struct page *page)
page = compound_head(page);

/* This happens if someone calls flush_dcache_page on slab page */
- if (unlikely(PageSlab(page)))
+ if (unlikely(PageSlab(compound_head(page))))
return NULL;

if (unlikely(PageSwapCache(page))) {
--
2.32.0

2021-10-12 18:04:06

by Johannes Weiner

Subject: [PATCH 02/11] PageSlab: eliminate unnecessary compound_head() calls in fs/proc/page

Cache the repeated compound_head() lookups in a local variable.

Signed-off-by: Johannes Weiner <[email protected]>
---
fs/proc/page.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 2c249f84e1fd..37d95309e1c1 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -108,6 +108,7 @@ static inline u64 kpf_copy_bit(u64 kflags, int ubit, int kbit)

u64 stable_page_flags(struct page *page)
{
+ struct page *head;
u64 k;
u64 u;

@@ -118,6 +119,7 @@ u64 stable_page_flags(struct page *page)
if (!page)
return 1 << KPF_NOPAGE;

+ head = compound_head(page);
k = page->flags;
u = 0;

@@ -127,7 +129,7 @@ u64 stable_page_flags(struct page *page)
* Note that page->_mapcount is overloaded in SLOB/SLUB/SLQB, so the
* simple test in page_mapped() is not enough.
*/
- if (!PageSlab(compound_head(page)) && page_mapped(page))
+ if (!PageSlab(head) && page_mapped(page))
u |= 1 << KPF_MMAP;
if (PageAnon(page))
u |= 1 << KPF_ANON;
@@ -151,8 +153,6 @@ u64 stable_page_flags(struct page *page)
* to make sure a given page is a thp, not a non-huge compound page.
*/
else if (PageTransCompound(page)) {
- struct page *head = compound_head(page);
-
/* XXX: misses isolated file THPs */
if (PageLRU(head) || PageAnon(head))
u |= 1 << KPF_THP;
@@ -185,7 +185,7 @@ u64 stable_page_flags(struct page *page)
u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);

u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
- if (PageTail(page) && PageSlab(compound_head(page)))
+ if (PageTail(page) && PageSlab(head))
u |= 1 << KPF_SLAB;

u |= kpf_copy_bit(k, KPF_ERROR, PG_error);
--
2.32.0

2021-10-12 18:04:14

by Johannes Weiner

Subject: [PATCH 03/11] PageSlab: eliminate unnecessary compound_head() calls in kernel/resource

virt_to_head_page() implies it.

Signed-off-by: Johannes Weiner <[email protected]>
---
kernel/resource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index a363211fda99..ca9f5198a01f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -151,7 +151,7 @@ static void free_resource(struct resource *res)
if (!res)
return;

- if (!PageSlab(compound_head(virt_to_head_page(res)))) {
+ if (!PageSlab(virt_to_head_page(res))) {
spin_lock(&bootmem_resource_lock);
res->sibling = bootmem_resource_free;
bootmem_resource_free = res;
--
2.32.0

2021-10-12 18:04:29

by Johannes Weiner

Subject: [PATCH 06/11] PageSlab: eliminate unnecessary compound_head() call in mm/nommu

The page comes from virt_to_head_page().

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/nommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/nommu.c b/mm/nommu.c
index c233126dd476..02d2427b8f9e 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -87,7 +87,7 @@ unsigned int kobjsize(const void *objp)
* If the allocator sets PageSlab, we know the pointer came from
* kmalloc().
*/
- if (PageSlab(compound_head(page)))
+ if (PageSlab(page))
return ksize(objp);

/*
--
2.32.0

2021-10-12 18:04:38

by Johannes Weiner

Subject: [PATCH 09/11] PageSlab: eliminate unnecessary compound_head() calls in mm/slub

virt_to_head_page() implies it.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/slub.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 37a4cc1e73a7..d1bd004c8c2f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3554,7 +3554,7 @@ int build_detached_freelist(struct kmem_cache *s, size_t size,
page = virt_to_head_page(object);
if (!s) {
/* Handle kalloc'ed objects */
- if (unlikely(!PageSlab(compound_head(page)))) {
+ if (unlikely(!PageSlab(page))) {
free_nonslab_page(page, object);
p[size] = NULL; /* mark object processed */
return size;
@@ -4516,7 +4516,7 @@ size_t __ksize(const void *object)

page = virt_to_head_page(object);

- if (unlikely(!PageSlab(compound_head(page)))) {
+ if (unlikely(!PageSlab(page))) {
WARN_ON(!PageCompound(page));
return page_size(page);
}
@@ -4536,7 +4536,7 @@ void kfree(const void *x)
return;

page = virt_to_head_page(x);
- if (unlikely(!PageSlab(compound_head(page)))) {
+ if (unlikely(!PageSlab(page))) {
free_nonslab_page(page, object);
return;
}
--
2.32.0

2021-10-12 18:04:48

by Johannes Weiner

Subject: [PATCH 08/11] PageSlab: eliminate unnecessary compound_head() calls in mm/slab_common

virt_to_head_page() implies it.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/slab.h | 3 +--
mm/slab_common.c | 4 ++--
2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 0446948c9c4e..4bcaa08320e6 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -410,8 +410,7 @@ static inline struct kmem_cache *virt_to_cache(const void *obj)
struct page *page;

page = virt_to_head_page(obj);
- if (WARN_ONCE(!PageSlab(compound_head(page)),
- "%s: Object is not a Slab page!\n",
+ if (WARN_ONCE(!PageSlab(page), "%s: Object is not a Slab page!\n",
__func__))
return NULL;
return page->slab_cache;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 5f7063797f0e..ec2bb0beed75 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -564,7 +564,7 @@ bool kmem_valid_obj(void *object)
if (object < (void *)PAGE_SIZE || !virt_addr_valid(object))
return false;
page = virt_to_head_page(object);
- return PageSlab(compound_head(page));
+ return PageSlab(page);
}
EXPORT_SYMBOL_GPL(kmem_valid_obj);

@@ -594,7 +594,7 @@ void kmem_dump_obj(void *object)
if (WARN_ON_ONCE(!virt_addr_valid(object)))
return;
page = virt_to_head_page(object);
- if (WARN_ON_ONCE(!PageSlab(compound_head(page)))) {
+ if (WARN_ON_ONCE(!PageSlab(page))) {
pr_cont(" non-slab memory.\n");
return;
}
--
2.32.0

2021-10-12 18:04:53

by Johannes Weiner

Subject: [PATCH 11/11] PageSlab: eliminate unnecessary compound_head() calls in mm/memcontrol

page->memcg_data is generally not valid on tailpages, so the accessor
functions don't need to worry about encountering them outside of
bugs. The only place where the memory controller may encounter tail
pages is in mem_cgroup_from_obj(), but that uses virt_to_head_page()
first thing to resolve slab objects to slab pages.

Signed-off-by: Johannes Weiner <[email protected]>
---
include/linux/memcontrol.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 02394f802698..3096c9a0ee01 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -397,7 +397,7 @@ static inline struct mem_cgroup *__page_memcg(struct page *page)
{
unsigned long memcg_data = page->memcg_data;

- VM_BUG_ON_PAGE(PageSlab(compound_head(page)), page);
+ VM_BUG_ON_PAGE(PageSlab(page), page);
VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);

@@ -418,7 +418,7 @@ static inline struct obj_cgroup *__page_objcg(struct page *page)
{
unsigned long memcg_data = page->memcg_data;

- VM_BUG_ON_PAGE(PageSlab(compound_head(page)), page);
+ VM_BUG_ON_PAGE(PageSlab(page), page);
VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);

@@ -466,7 +466,7 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
{
unsigned long memcg_data = READ_ONCE(page->memcg_data);

- VM_BUG_ON_PAGE(PageSlab(compound_head(page)), page);
+ VM_BUG_ON_PAGE(PageSlab(page), page);
WARN_ON_ONCE(!rcu_read_lock_held());

if (memcg_data & MEMCG_DATA_KMEM) {
--
2.32.0

2021-10-12 18:04:56

by Johannes Weiner

Subject: [PATCH 05/11] PageSlab: eliminate unnecessary compound_head() calls in mm/kasan

All tested pages come from virt_to_head_page(): either directly or
through kasan_addr_to_page(). Remove the redundant compound_head()
calls. Rename the helper to kasan_addr_to_head_page() to clarify that
it returns the head page.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/kasan/common.c | 4 ++--
mm/kasan/generic.c | 4 ++--
mm/kasan/kasan.h | 2 +-
mm/kasan/report.c | 6 +++---
mm/kasan/report_tags.c | 4 ++--
5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index b5e81273fc6b..2baf121fb8c5 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -411,7 +411,7 @@ void __kasan_slab_free_mempool(void *ptr, unsigned long ip)
* !PageSlab() when the size provided to kmalloc is larger than
* KMALLOC_MAX_SIZE, and kmalloc falls back onto page_alloc.
*/
- if (unlikely(!PageSlab(compound_head(page)))) {
+ if (unlikely(!PageSlab(page))) {
if (____kasan_kfree_large(ptr, ip))
return;
kasan_poison(ptr, page_size(page), KASAN_FREE_PAGE, false);
@@ -575,7 +575,7 @@ void * __must_check __kasan_krealloc(const void *object, size_t size, gfp_t flag
page = virt_to_head_page(object);

/* Piggy-back on kmalloc() instrumentation to poison the redzone. */
- if (unlikely(!PageSlab(compound_head(page))))
+ if (unlikely(!PageSlab(page)))
return __kasan_kmalloc_large(object, size, flags);
else
return ____kasan_kmalloc(page->slab_cache, object, size, flags);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 94c0c86c79d9..d4303a6722ab 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -330,12 +330,12 @@ DEFINE_ASAN_SET_SHADOW(f8);

void kasan_record_aux_stack(void *addr)
{
- struct page *page = kasan_addr_to_page(addr);
+ struct page *page = kasan_addr_to_head_page(addr);
struct kmem_cache *cache;
struct kasan_alloc_meta *alloc_meta;
void *object;

- if (is_kfence_address(addr) || !(page && PageSlab(compound_head(page))))
+ if (is_kfence_address(addr) || !(page && PageSlab(page)))
return;

cache = page->slab_cache;
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8bf568a80eb8..fe39eeee6b59 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -249,7 +249,7 @@ bool kasan_report(unsigned long addr, size_t size,
bool is_write, unsigned long ip);
void kasan_report_invalid_free(void *object, unsigned long ip);

-struct page *kasan_addr_to_page(const void *addr);
+struct page *kasan_addr_to_head_page(const void *addr);

depot_stack_handle_t kasan_save_stack(gfp_t flags);
void kasan_set_track(struct kasan_track *track, gfp_t flags);
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 7cdcf968f43f..405ecf3c9301 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -151,7 +151,7 @@ static void print_track(struct kasan_track *track, const char *prefix)
}
}

-struct page *kasan_addr_to_page(const void *addr)
+struct page *kasan_addr_to_head_page(const void *addr)
{
if ((addr >= (void *)PAGE_OFFSET) &&
(addr < high_memory))
@@ -251,12 +251,12 @@ static inline bool init_task_stack_addr(const void *addr)

static void print_address_description(void *addr, u8 tag)
{
- struct page *page = kasan_addr_to_page(addr);
+ struct page *page = kasan_addr_to_head_page(addr);

dump_stack_lvl(KERN_ERR);
pr_err("\n");

- if (page && PageSlab(compound_head(page))) {
+ if (page && PageSlab(page)) {
struct kmem_cache *cache = page->slab_cache;
void *object = nearest_obj(cache, page, addr);

diff --git a/mm/kasan/report_tags.c b/mm/kasan/report_tags.c
index 32f955d98e76..5ae9df06ed44 100644
--- a/mm/kasan/report_tags.c
+++ b/mm/kasan/report_tags.c
@@ -20,8 +20,8 @@ const char *kasan_get_bug_type(struct kasan_access_info *info)

tag = get_tag(info->access_addr);
addr = kasan_reset_tag(info->access_addr);
- page = kasan_addr_to_page(addr);
- if (page && PageSlab(compound_head(page))) {
+ page = kasan_addr_to_head_page(addr);
+ if (page && PageSlab(page)) {
cache = page->slab_cache;
object = nearest_obj(cache, page, (void *)addr);
alloc_meta = kasan_get_alloc_meta(cache, object);
--
2.32.0

2021-10-12 18:05:39

by Johannes Weiner

Subject: [PATCH 07/11] PageSlab: eliminate unnecessary compound_head() call in mm/slab

kmem_freepages() is only ever called on the full slab, i.e. the
headpage. Further, after the PageSlab() check, the callsite continues
with a series of operations that are not legal on tailpages and would
trigger asserts on their own already.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/slab.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 829f2b6d4af7..d0f725637663 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1396,7 +1396,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
{
int order = cachep->gfporder;

- BUG_ON(!PageSlab(compound_head(page)));
+ BUG_ON(!PageSlab(page));
__ClearPageSlabPfmemalloc(page);
__ClearPageSlab(page);
page_mapcount_reset(page);
--
2.32.0

2021-10-12 18:05:48

by Johannes Weiner

Subject: [PATCH 10/11] PageSlab: eliminate unnecessary compound_head() call in mm/usercopy

The page is already resolved with compound_head() where it is looked
up, right above the check.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/usercopy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/usercopy.c b/mm/usercopy.c
index 924e236522da..b3de3c4eefba 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -235,7 +235,7 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
*/
page = compound_head(kmap_to_page((void *)ptr));

- if (PageSlab(compound_head(page))) {
+ if (PageSlab(page)) {
/* Check slab allocator for flags and size. */
__check_heap_object(ptr, n, page, to_user);
} else {
--
2.32.0

2021-10-12 19:27:41

by Matthew Wilcox

Subject: Re: [PATCH 00/11] PageSlab: eliminate unnecessary compound_head() calls

On Tue, Oct 12, 2021 at 02:01:37PM -0400, Johannes Weiner wrote:
> PageSlab() currently imposes a compound_head() call on all callsites
> even though only very few situations encounter tailpages. This short
> series bubbles tailpage resolution up to the few sites that need it,
> and eliminates it everywhere else.
>
> This is a stand-alone improvement. However, it's inspired by Willy's
> patches to split struct slab from struct page. Those patches currently
> resolve a slab object pointer to its struct slab as follows:
>
> slab = virt_to_slab(p); /* tailpage resolution */
> if (slab_test_cache(slab)) { /* slab or page alloc bypass? */
> do_slab_stuff(slab);
> } else {
> page = (struct page *)slab;
> do_page_stuff(page);
> }
>
> which makes struct slab an ambiguous type that needs to self-identify
> with the slab_test_cache() test (which in turn relies on PG_slab in
> the flags field shared between page and slab).
>
> It would be preferable to do:
>
> page = virt_to_head_page(p); /* tailpage resolution */
> if (PageSlab(page)) { /* slab or page alloc bypass? */
> slab = page_slab(page);
> do_slab_stuff(slab);
> } else {
> do_page_stuff(page);
> }
>
> and leave the ambiguity and the need to self-identify with struct
> page, so that struct slab is a strong and unambiguous type, and a
> non-NULL struct slab encountered in the wild is always a valid object
> without the need to check another dedicated flag for validity first.
>
> However, because PageSlab() currently implies tailpage resolution,
> writing the virt->slab lookup in the preferred way would add yet more
> unnecessary compound_head() calls to the hottest MM paths.
>
> The page flag helpers should eventually all be weaned off of those
> compound_head() calls for their unnecessary overhead alone. But this
> one in particular is now getting in the way of writing code in the
> preferred manner, and bleeding page ambiguity into the new types that
> are supposed to eliminate specifically that. It's ripe for a cleanup.

So what I had in mind was more the patch at the end (which I now realise
is missing the corresponding changes to __ClearPageSlab()). There is,
however, some weirdness with kfence, which appears to be abusing PageSlab
by setting it on pages which are not slab pages???

page = virt_to_page(p);
if (PageSlab(page)) { /* slab or page alloc bypass? */
slab = page_slab(page); /* tail page resolution */
do_slab_stuff(slab);
} else {
do_page_stuff(page); /* or possibly compound_head(page) */
}

There could also be a PageTail check in there for some of the cases --
catch people doing something like:
kfree(kmalloc(65536, GFP_KERNEL) + 16384);
which happens to work today, but should probably not.

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8951da34aa51..b4b62fa100eb 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -344,7 +344,7 @@ PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
TESTCLEARFLAG(Active, active, PF_HEAD)
PAGEFLAG(Workingset, workingset, PF_HEAD)
TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
-__PAGEFLAG(Slab, slab, PF_NO_TAIL)
+__PAGEFLAG(Slab, slab, PF_ANY)
__PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some filesystems */

diff --git a/mm/slab.c b/mm/slab.c
index d0f725637663..c8c6e8576936 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1371,6 +1371,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
int nodeid)
{
struct page *page;
+ int i;

flags |= cachep->allocflags;

@@ -1381,7 +1382,8 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
}

account_slab_page(page, cachep->gfporder, cachep, flags);
- __SetPageSlab(page);
+ for (i = 0; i < compound_nr(page); i++)
+ __SetPageSlab(page + i);
/* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
if (sk_memalloc_socks() && page_is_pfmemalloc(page))
SetPageSlabPfmemalloc(page);
diff --git a/mm/slub.c b/mm/slub.c
index f7ac28646580..e442cebda4ef 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1916,7 +1916,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
account_slab_page(page, oo_order(oo), s, flags);

page->slab_cache = s;
- __SetPageSlab(page);
+ for (idx = 0; idx < compound_nr(page); idx++)
+ __SetPageSlab(page + idx);
if (page_is_pfmemalloc(page))
SetPageSlabPfmemalloc(page);


2021-10-12 20:33:27

by Johannes Weiner

Subject: Re: [PATCH 00/11] PageSlab: eliminate unnecessary compound_head() calls

On Tue, Oct 12, 2021 at 08:23:26PM +0100, Matthew Wilcox wrote:
> On Tue, Oct 12, 2021 at 02:01:37PM -0400, Johannes Weiner wrote:
> > PageSlab() currently imposes a compound_head() call on all callsites
> > even though only very few situations encounter tailpages. This short
> > series bubbles tailpage resolution up to the few sites that need it,
> > and eliminates it everywhere else.
> >
> > This is a stand-alone improvement. However, it's inspired by Willy's
> > patches to split struct slab from struct page. Those patches currently
> > resolve a slab object pointer to its struct slab as follows:
> >
> > slab = virt_to_slab(p); /* tailpage resolution */
> > if (slab_test_cache(slab)) { /* slab or page alloc bypass? */
> > do_slab_stuff(slab);
> > } else {
> > page = (struct page *)slab;
> > do_page_stuff(page);
> > }
> >
> > which makes struct slab an ambiguous type that needs to self-identify
> > with the slab_test_cache() test (which in turn relies on PG_slab in
> > the flags field shared between page and slab).
> >
> > It would be preferable to do:
> >
> > page = virt_to_head_page(p); /* tailpage resolution */
> > if (PageSlab(page)) { /* slab or page alloc bypass? */
> > slab = page_slab(page);
> > do_slab_stuff(slab);
> > } else {
> > do_page_stuff(page);
> > }
> >
> > and leave the ambiguity and the need to self-identify with struct
> > page, so that struct slab is a strong and unambiguous type, and a
> > non-NULL struct slab encountered in the wild is always a valid object
> > without the need to check another dedicated flag for validity first.
> >
> > However, because PageSlab() currently implies tailpage resolution,
> > writing the virt->slab lookup in the preferred way would add yet more
> > unnecessary compound_head() calls to the hottest MM paths.
> >
> > The page flag helpers should eventually all be weaned off of those
> > compound_head() calls for their unnecessary overhead alone. But this
> > one in particular is now getting in the way of writing code in the
> > preferred manner, and bleeding page ambiguity into the new types that
> > are supposed to eliminate specifically that. It's ripe for a cleanup.
>
> So what I had in mind was more the patch at the end (which I now realise
> is missing the corresponding changes to __ClearPageSlab()). There is,
> however, some weirdness with kfence, which appears to be abusing PageSlab
> by setting it on pages which are not slab pages???
>
> page = virt_to_page(p);
> if (PageSlab(page)) { /* slab or page alloc bypass? */
> slab = page_slab(page); /* tail page resolution */
> do_slab_stuff(slab);
> } else {
> do_page_stuff(page); /* or possibly compound_head(page) */
> }

Can you elaborate why you think this would be better?

If the object is sitting in a tailpage, the flag test itself could
avoid the compound_head(), but at the end of the day it's the slab or
the headpage that needs to be operated on in the fastpaths, and we
need to do the compound_head() whether the flag is set or not.

I suppose it could make some debugging checks marginally cheaper?

But OTOH it comes at the cost of the flag setting and clearing loops
in the slab allocation path, even when debugging checks are disabled.

And it would further complicate the compound page model by introducing
another distinct flag handling scheme (would there be other users for
it?). The open-coded loops as a means to ensure flag integrity seem
error prone; but creating Set and Clear variants that encapsulate the
loops sounds like a move in the wrong direction, given the pain the
compound_head() alone in them has already created.
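
(For concreteness, such a variant would have to encapsulate the loop from
your patch, something like the purely hypothetical sketch below -- the
name doesn't exist anywhere:)

static inline void __SetPageSlabAll(struct page *head)
{
	int i;

	/* mirror the open-coded loop from the patch upthread */
	for (i = 0; i < compound_nr(head); i++)
		__SetPageSlab(head + i);
}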

> There could also be a PageTail check in there for some of the cases --
> catch people doing something like:
> kfree(kmalloc(65536, GFP_KERNEL) + 16384);
> which happens to work today, but should probably not.

I actually wondered about that when looking at the slob code. Its
kfree does this:

sp = virt_to_page(block);
if (PageSlab(compound_head(sp))) {
int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
unsigned int *m = (unsigned int *)(block - align);
slob_free(m, *m + align);
} else {
unsigned int order = compound_order(sp);
mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
-(PAGE_SIZE << order));
__free_pages(sp, order);

}

Note the virt_to_page(), instead of virt_to_head_page(). It does test
PG_slab correctly, but if this is a bypass page, it operates on
whatever tail page the kfree() argument points into. If you did what
you write above, it would leak the pages before the object.

2021-10-13 03:24:20

by Matthew Wilcox

Subject: Re: [PATCH 00/11] PageSlab: eliminate unnecessary compound_head() calls

On Tue, Oct 12, 2021 at 04:30:20PM -0400, Johannes Weiner wrote:
> On Tue, Oct 12, 2021 at 08:23:26PM +0100, Matthew Wilcox wrote:
> > On Tue, Oct 12, 2021 at 02:01:37PM -0400, Johannes Weiner wrote:
> > > PageSlab() currently imposes a compound_head() call on all callsites
> > > even though only very few situations encounter tailpages. This short
> > > series bubbles tailpage resolution up to the few sites that need it,
> > > and eliminates it everywhere else.
> > >
> > > This is a stand-alone improvement. However, it's inspired by Willy's
> > > patches to split struct slab from struct page. Those patches currently
> > > resolve a slab object pointer to its struct slab as follows:
> > >
> > > slab = virt_to_slab(p); /* tailpage resolution */
> > > if (slab_test_cache(slab)) { /* slab or page alloc bypass? */
> > > do_slab_stuff(slab);
> > > } else {
> > > page = (struct page *)slab;
> > > do_page_stuff(page);
> > > }
> > >
> > > which makes struct slab an ambiguous type that needs to self-identify
> > > with the slab_test_cache() test (which in turn relies on PG_slab in
> > > the flags field shared between page and slab).
> > >
> > > It would be preferable to do:
> > >
> > > page = virt_to_head_page(p); /* tailpage resolution */
> > > if (PageSlab(page)) { /* slab or page alloc bypass? */
> > > slab = page_slab(page);
> > > do_slab_stuff(slab);
> > > } else {
> > > do_page_stuff(page);
> > > }
> > >
> > > and leave the ambiguity and the need to self-identify with struct
> > > page, so that struct slab is a strong and unambiguous type, and a
> > > non-NULL struct slab encountered in the wild is always a valid object
> > > without the need to check another dedicated flag for validity first.
> > >
> > > However, because PageSlab() currently implies tailpage resolution,
> > > writing the virt->slab lookup in the preferred way would add yet more
> > > unnecessary compound_head() calls to the hottest MM paths.
> > >
> > > The page flag helpers should eventually all be weaned off of those
> > > compound_head() calls for their unnecessary overhead alone. But this
> > > one in particular is now getting in the way of writing code in the
> > > preferred manner, and bleeding page ambiguity into the new types that
> > > are supposed to eliminate specifically that. It's ripe for a cleanup.
> >
> > So what I had in mind was more the patch at the end (which I now realise
> > is missing the corresponding changes to __ClearPageSlab()). There is,
> > however, some weirdness with kfence, which appears to be abusing PageSlab
> > by setting it on pages which are not slab pages???
> >
> > page = virt_to_page(p);
> > if (PageSlab(page)) { /* slab or page alloc bypass? */
> > slab = page_slab(page); /* tail page resolution */
> > do_slab_stuff(slab);
> > } else {
> > do_page_stuff(page); /* or possibly compound_head(page) */
> > }
>
> Can you elaborate why you think this would be better?
>
> If the object is sitting in a tailpage, the flag test itself could
> avoid the compound_head(), but at the end of the day it's the slab or
> the headpage that needs to be operated on in the fastpaths, and we
> need to do the compound_head() whether the flag is set or not.
>
> I suppose it could make some debugging checks marginally cheaper?
>
> But OTOH it comes at the cost of the flag setting and clearing loops
> in the slab allocation path, even when debugging checks are disabled.
>
> And it would further complicate the compound page model by introducing
> another distinct flag handling scheme (would there be other users for
> it?). The open-coded loops as a means to ensure flag integrity seem
> error prone; but creating Set and Clear variants that encapsulate the
> loops sounds like a move in the wrong direction, given the pain the
> compound_head() alone in them has already created.

I see this as a move towards the dynamically allocated future.
In that future, I think we'll do something like:

static inline struct slab *alloc_slab_pages(...)
{
struct page *page;
struct slab *slab = kmalloc(sizeof(struct slab), gfp);
if (!slab)
return -ENOMEM;
... init slab ...
page = palloc(slab, MEM_TYPE_SLAB, node, gfp, order);
if (!page) {
kfree(slab);
return -ENOMEM;
}
slab->virt = page_address(page);

return slab;
}

where palloc() does something like:

unsigned long page_desc = mem_type | (unsigned long)mem_desc;

... allocates the struct pages ...

for (i = 0; i < (1 << order); i++)
page[i] = page_desc;

return page;
}


For today, testing PageSlab on the tail page helps the test proceed
in parallel with the action. Looking at slub's kfree() for an example:

page = virt_to_head_page(x);
if (unlikely(!PageSlab(page))) {
free_nonslab_page(page, object);
return;
}
slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_);

Your proposal is certainly an improvement (since gcc doesn't know
that compound_head(compound_head(x)) == compound_head(x)), but I
think checking on the tail page is even better:

page = virt_to_page(x);
if (unlikely(!PageSlab(page))) {
free_nonslab_page(compound_head(page), object);
return;
}
slab = page_slab(page);
slab_free(slab->slab_cache, slab, object, NULL, 1, _RET_IP_);

The compound_head() parts can proceed in parallel with the check of
PageSlab().

As far as the cost of setting PageSlab, those cachelines are already
dirty because we set compound_head on each of those pages already
(or in the case of freeing, we're about to clear compound_head on
each of those pages).

> > There could also be a PageTail check in there for some of the cases --
> > catch people doing something like:
> > kfree(kmalloc(65536, GFP_KERNEL) + 16384);
> > which happens to work today, but should probably not.
>
> I actually wondered about that when looking at the slob code. Its
> kfree does this:
>
> sp = virt_to_page(block);
> if (PageSlab(compound_head(sp))) {
> int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
> unsigned int *m = (unsigned int *)(block - align);
> slob_free(m, *m + align);
> } else {
> unsigned int order = compound_order(sp);
> mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
> -(PAGE_SIZE << order));
> __free_pages(sp, order);
>
> }
>
> Note the virt_to_page(), instead of virt_to_head_page(). It does test
> PG_slab correctly, but if this is a bypass page, it operates on
> whatever tail page the kfree() argument points into. If you did what
> you write above, it would leak the pages before the object.

slob doesn't have to be as careful because it falls back to the page
allocator for all allocations > PAGE_SIZE (see calls to
slob_new_pages())

2021-10-13 13:51:59

by Johannes Weiner

Subject: Re: [PATCH 00/11] PageSlab: eliminate unnecessary compound_head() calls

On Wed, Oct 13, 2021 at 04:19:18AM +0100, Matthew Wilcox wrote:
> On Tue, Oct 12, 2021 at 04:30:20PM -0400, Johannes Weiner wrote:
> > On Tue, Oct 12, 2021 at 08:23:26PM +0100, Matthew Wilcox wrote:
> > > On Tue, Oct 12, 2021 at 02:01:37PM -0400, Johannes Weiner wrote:
> > > > PageSlab() currently imposes a compound_head() call on all callsites
> > > > even though only very few situations encounter tailpages. This short
> > > > series bubbles tailpage resolution up to the few sites that need it,
> > > > and eliminates it everywhere else.
> > > >
> > > > This is a stand-alone improvement. However, it's inspired by Willy's
> > > > patches to split struct slab from struct page. Those patches currently
> > > > resolve a slab object pointer to its struct slab as follows:
> > > >
> > > > slab = virt_to_slab(p); /* tailpage resolution */
> > > > if (slab_test_cache(slab)) { /* slab or page alloc bypass? */
> > > > do_slab_stuff(slab);
> > > > } else {
> > > > page = (struct page *)slab;
> > > > do_page_stuff(page);
> > > > }
> > > >
> > > > which makes struct slab an ambiguous type that needs to self-identify
> > > > with the slab_test_cache() test (which in turn relies on PG_slab in
> > > > the flags field shared between page and slab).
> > > >
> > > > It would be preferable to do:
> > > >
> > > > page = virt_to_head_page(p); /* tailpage resolution */
> > > > if (PageSlab(page)) { /* slab or page alloc bypass? */
> > > > slab = page_slab(page);
> > > > do_slab_stuff(slab);
> > > > } else {
> > > > do_page_stuff(page);
> > > > }
> > > >
> > > > and leave the ambiguity and the need to self-identify with struct
> > > > page, so that struct slab is a strong and unambiguous type, and a
> > > > non-NULL struct slab encountered in the wild is always a valid object
> > > > without the need to check another dedicated flag for validity first.
> > > >
> > > > However, because PageSlab() currently implies tailpage resolution,
> > > > writing the virt->slab lookup in the preferred way would add yet more
> > > > unnecessary compound_head() calls to the hottest MM paths.
> > > >
> > > > The page flag helpers should eventually all be weaned off of those
> > > > compound_head() calls for their unnecessary overhead alone. But this
> > > > one in particular is now getting in the way of writing code in the
> > > > preferred manner, and bleeding page ambiguity into the new types that
> > > > are supposed to eliminate specifically that. It's ripe for a cleanup.
> > >
> > > So what I had in mind was more the patch at the end (which I now realise
> > > is missing the corresponding changes to __ClearPageSlab()). There is,
> > > however, some weirdness with kfence, which appears to be abusing PageSlab
> > > by setting it on pages which are not slab pages???
> > >
> > > page = virt_to_page(p);
> > > if (PageSlab(page)) { /* slab or page alloc bypass? */
> > > slab = page_slab(page); /* tail page resolution */
> > > do_slab_stuff(slab);
> > > } else {
> > > do_page_stuff(page); /* or possibly compound_head(page) */
> > > }
> >
> > Can you elaborate why you think this would be better?
> >
> > If the object is sitting in a tailpage, the flag test itself could
> > avoid the compound_head(), but at the end of the day it's the slab or
> > the headpage that needs to be operated on in the fastpaths, and we
> > need to do the compound_head() whether the flag is set or not.
> >
> > I suppose it could make some debugging checks marginally cheaper?
> >
> > But OTOH it comes at the cost of the flag setting and clearing loops
> > in the slab allocation path, even when debugging checks are disabled.
> >
> > And it would further complicate the compound page model by introducing
> > another distinct flag handling scheme (would there be other users for
> > it?). The open-coded loops as a means to ensure flag integrity seem
> > error prone; but creating Set and Clear variants that encapsulate the
> > loops sounds like a move in the wrong direction, given the pain the
> > compound_head() alone in them has already created.
>
> I see this as a move towards the dynamically allocated future.
> In that future, I think we'll do something like:
>
> static inline struct slab *alloc_slab_pages(...)
> {
> struct page *page;
> struct slab *slab = kmalloc(sizeof(struct slab), gfp);
> if (!slab)
> return -ENOMEM;
> ... init slab ...
> page = palloc(slab, MEM_TYPE_SLAB, node, gfp, order);
> if (!page) {
> kfree(slab);
> return -ENOMEM;
> }
> slab->virt = page_address(page);
>
> return slab;
> }
>
> where palloc() does something like:
>
> unsigned long page_desc = mem_type | (unsigned long)mem_desc;
>
> ... allocates the struct pages ...
>
> for (i = 0; i < (1 << order); i++)
> page[i] = page_desc;
>
> return page;
> }

Yeah having the page point to the full descriptor makes sense. That's
efficient. And it's very simple, conceptually...

> For today, testing PageSlab on the tail page helps the test proceed
> in parallel with the action. Looking at slub's kfree() for an example:
>
> page = virt_to_head_page(x);
> if (unlikely(!PageSlab(page))) {
> free_nonslab_page(page, object);
> return;
> }
> slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_);
>
> Your proposal is certainly an improvement (since gcc doesn't know
> that compound_head(compound_head(x)) == compound_head(x)), but I
> think checking on the tail page is even better:
>
> page = virt_to_page(x);
> if (unlikely(!PageSlab(page))) {
> free_nonslab_page(compound_head(page), object);
> return;
> }
> slab = page_slab(page);
> slab_free(slab->slab_cache, slab, object, NULL, 1, _RET_IP_);
>
> The compound_head() parts can proceed in parallel with the check of
> PageSlab().
>
> As far as the cost of setting PageSlab, those cachelines are already
> dirty because we set compound_head on each of those pages already
> (or in the case of freeing, we're about to clear compound_head on
> each of those pages).

... but this is not. I think the performance gains from this would
have to be significant to justify complicating page flags further.

2021-10-13 17:58:40

by Matthew Wilcox

Subject: Re: [PATCH 00/11] PageSlab: eliminate unnecessary compound_head() calls

On Wed, Oct 13, 2021 at 09:49:31AM -0400, Johannes Weiner wrote:
> On Wed, Oct 13, 2021 at 04:19:18AM +0100, Matthew Wilcox wrote:
> > For today, testing PageSlab on the tail page helps the test proceed
> > in parallel with the action. Looking at slub's kfree() for an example:
> >
> > page = virt_to_head_page(x);
> > if (unlikely(!PageSlab(page))) {
> > free_nonslab_page(page, object);
> > return;
> > }
> > slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_);
> >
> > Your proposal is certainly an improvement (since gcc doesn't know
> > that compound_head(compound_head(x)) == compound_head(x)), but I
> > think checking on the tail page is even better:
> >
> > page = virt_to_page(x);
> > if (unlikely(!PageSlab(page))) {
> > free_nonslab_page(compound_head(page), object);
> > return;
> > }
> > slab = page_slab(page);
> > slab_free(slab->slab_cache, slab, object, NULL, 1, _RET_IP_);
> >
> > The compound_head() parts can proceed in parallel with the check of
> > PageSlab().
> >
> > As far as the cost of setting PageSlab, those cachelines are already
> > dirty because we set compound_head on each of those pages already
> > (or in the case of freeing, we're about to clear compound_head on
> > each of those pages).
>
> ... but this is not. I think the performance gains from this would
> have to be significant to justify complicating page flags further.

My argument isn't really "this is more efficient", because I think
the performance gains are pretty minimal. More that I would like to
be able to write code in the style which we'll want to use when we're
using dynamically allocated memory descriptors. It's all just code,
and we can change it at any time, but better to change it to something
that continues to work well in the future.

I don't think we end up with "virt_to_head_page()" in a dynamically
allocated memory descriptor world. The head page contains no different
information from the tail pages, and indeed the tail pages don't know
that they're tail pages, or where the head page is. Or maybe they're
all tail pages.

I could see a world where we had a virt_to_memdesc() which returned
a generic memory descriptor that could be cast to a struct slab if
the flags within that memdesc said it was a slab. But I think it works
out better to tag the memory descriptor pointer with a discriminator
that defines what the pointer is. Plus it saves a page flag.
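
Roughly, and with completely made-up helper names just to illustrate
the discriminator idea (this isn't code from the series), I'm picturing
kfree() doing something like:

        void *memdesc = virt_to_memdesc(x);     /* type tag in low bits */

        if (unlikely(!memdesc_is_slab(memdesc))) {
                free_nonslab_memdesc(memdesc, object);
                return;
        }
        slab = memdesc_slab(memdesc);           /* strips the tag bits */
        slab_free(slab->slab_cache, slab, object, NULL, 1, _RET_IP_);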

Maybe that's the best way to approach it -- how would you want to write
this part of kfree() when memory descriptors are dynamically allocated?

2021-10-13 19:36:03

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH 00/11] PageSlab: eliminate unnecessary compound_head() calls

On Wed, Oct 13, 2021 at 06:55:46PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 13, 2021 at 09:49:31AM -0400, Johannes Weiner wrote:
> > On Wed, Oct 13, 2021 at 04:19:18AM +0100, Matthew Wilcox wrote:
> > > For today, testing PageSlab on the tail page helps the test proceed
> > > in parallel with the action. Looking at slub's kfree() for an example:
> > >
> > > page = virt_to_head_page(x);
> > > if (unlikely(!PageSlab(page))) {
> > > free_nonslab_page(page, object);
> > > return;
> > > }
> > > slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_);
> > >
> > > Your proposal is certainly an improvement (since gcc doesn't know
> > > that compound_head(compound_head(x)) == compound_head(x)), but I
> > > think checking on the tail page is even better:
> > >
> > > page = virt_to_page(x);
> > > if (unlikely(!PageSlab(page))) {
> > > free_nonslab_page(compound_head(page), object);
> > > return;
> > > }
> > > slab = page_slab(page);
> > > slab_free(slab->slab_cache, slab, object, NULL, 1, _RET_IP_);
> > >
> > > The compound_head() parts can proceed in parallel with the check of
> > > PageSlab().
> > >
> > > As far as the cost of setting PageSlab, those cachelines are already
> > > dirty because we set compound_head on each of those pages already
> > > (or in the case of freeing, we're about to clear compound_head on
> > > each of those pages).
> >
> > ... but this is not. I think the performance gains from this would
> > have to be significant to justify complicating page flags further.
>
> My argument isn't really "this is more efficient", because I think
> the performance gains are pretty minimal. More that I would like to
> be able to write code in the style which we'll want to use when we're
> using dynamically allocated memory descriptors. It's all just code,
> and we can change it at any time, but better to change it to something
> that continues to work well in the future.
>
> I don't think we end up with "virt_to_head_page()" in a dynamically
> allocated memory descriptor world. The head page contains no different
> information from the tail pages, and indeed the tail pages don't know
> that they're tail pages, or where the head page is. Or maybe they're
> all tail pages.

I agree with that, but future-provisioning is a tradeoff.

It'll be trivial to replace virt_to_head_page() with virt_to_page()
and remove compound_head() calls once whatever is left of struct page
unconditionally points to a memory descriptor. And that can be part
of the changes that make it so.

But in today's codebase, maintaining type flags in the tail pages
while having to go through the headpage to find the type descriptor
makes things unnecessarily complicated in an area that has already
accrued too much tech debt.

I don't think that's a sensible thing to do as of today.

> I could see a world where we had a virt_to_memdesc() which returned
> a generic memory descriptor that could be cast to a struct slab if
> the flags within that memdesc said it was a slab. But I think it works
> out better to tag the memory descriptor pointer with a discriminator
> that defines what the pointer is. Plus it saves a page flag.
>
> Maybe that's the best way to approach it -- how would you want to write
> this part of kfree() when memory descriptors are dynamically allocated?

There are still many question marks around what the split-out memory
descriptors will actually look like, and which state is tracked where.
'struct slab' is an excellent trial balloon.

It's good to have common north stars to set the direction of where to
place efforts ("small struct page, dynamically allocated descriptors
etc.") but I don't think it makes sense to take on yet more tech debt
in this area for a future that may not pan out the way we think now.

2021-10-13 20:02:22

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH 00/11] PageSlab: eliminate unnecessary compound_head() calls

On Wed, Oct 13, 2021 at 03:33:00PM -0400, Johannes Weiner wrote:
> On Wed, Oct 13, 2021 at 06:55:46PM +0100, Matthew Wilcox wrote:
> > On Wed, Oct 13, 2021 at 09:49:31AM -0400, Johannes Weiner wrote:
> > > On Wed, Oct 13, 2021 at 04:19:18AM +0100, Matthew Wilcox wrote:
> > > > For today, testing PageSlab on the tail page helps the test proceed
> > > > in parallel with the action. Looking at slub's kfree() for an example:
> > > >
> > > > page = virt_to_head_page(x);
> > > > if (unlikely(!PageSlab(page))) {
> > > > free_nonslab_page(page, object);
> > > > return;
> > > > }
> > > > slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_);
> > > >
> > > > Your proposal is certainly an improvement (since gcc doesn't know
> > > > that compound_head(compound_head(x)) == compound_head(x)), but I
> > > > think checking on the tail page is even better:
> > > >
> > > > page = virt_to_page(x);
> > > > if (unlikely(!PageSlab(page))) {
> > > > free_nonslab_page(compound_head(page), object);
> > > > return;
> > > > }
> > > > slab = page_slab(page);
> > > > slab_free(slab->slab_cache, slab, object, NULL, 1, _RET_IP_);
> > > >
> > > > The compound_head() parts can proceed in parallel with the check of
> > > > PageSlab().
> > > >
> > > > As far as the cost of setting PageSlab, those cachelines are already
> > > > dirty because we set compound_head on each of those pages already
> > > > (or in the case of freeing, we're about to clear compound_head on
> > > > each of those pages).
> > >
> > > ... but this is not. I think the performance gains from this would
> > > have to be significant to justify complicating page flags further.
> >
> > My argument isn't really "this is more efficient", because I think
> > the performance gains are pretty minimal. More that I would like to
> > be able to write code in the style which we'll want to use when we're
> > using dynamically allocated memory descriptors. It's all just code,
> > and we can change it at any time, but better to change it to something
> > that continues to work well in the future.
> >
> > I don't think we end up with "virt_to_head_page()" in a dynamically
> > allocated memory descriptor world. The head page contains no different
> > information from the tail pages, and indeed the tail pages don't know
> > that they're tail pages, or where the head page is. Or maybe they're
> > all tail pages.
>
> I agree with that, but future-provisioning is a tradeoff.
>
> It'll be trivial to replace virt_to_head_page() with virt_to_page()
> and remove compound_head() calls once whatever is left of struct page
> unconditionally points to a memory descriptor. And that can be part
> of the changes that make it so.

I.e. remove all the *to_head() stuff when head/tail pages actually
cease to be a thing, not earlier.

Essentially, I think it's the right direction to pursue, but I'm not
sure yet that it's exactly where we will end up.

> > I could see a world where we had a virt_to_memdesc() which returned
> > a generic memory descriptor that could be cast to a struct slab if
> > the flags within that memdesc said it was a slab. But I think it works
> > out better to tag the memory descriptor pointer with a discriminator
> > that defines what the pointer is. Plus it saves a page flag.
> >
> > Maybe that's the best way to approach it -- how would you want to write
> > this part of kfree() when memory descriptors are dynamically allocated?

Yeah, or as Kent put it "how would you like the code to look like with
infinite refactoring?"

But that also implies we can do it in incremental, self-contained
steps that each leave the code base in a better place than before.

Which avoids building up dependencies on future code and unimplemented
ideas that are vague, likely look different in everybody's head, or
may not pan out at all.

2021-10-14 07:38:31

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH 00/11] PageSlab: eliminate unnecessary compound_head() calls

> It's good to have common north stars to set the direction of where to
> place efforts ("small struct page, dynamically allocated descriptors
> etc.") but I don't think it makes sense to take on yet more tech debt
> in this area for a future that may not pan out the way we think now.
>

That precisely sums up my thoughts, thanks Johannes.

--
Thanks,

David / dhildenb

2021-10-25 18:11:43

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 00/11] PageSlab: eliminate unnecessary compound_head() calls

On 10/12/21 20:01, Johannes Weiner wrote:
> PageSlab() currently imposes a compound_head() call on all callsites
> even though only very few situations encounter tailpages. This short
> series bubbles tailpage resolution up to the few sites that need it,
> and eliminates it everywhere else.
>
> This is a stand-alone improvement. However, it's inspired by Willy's
> patches to split struct slab from struct page. Those patches currently
> resolve a slab object pointer to its struct slab as follows:
>
> slab = virt_to_slab(p); /* tailpage resolution */
> if (slab_test_cache(slab)) { /* slab or page alloc bypass? */
> do_slab_stuff(slab);
> } else {
> page = (struct page *)slab;
> do_page_stuff(page);
> }
>
> which makes struct slab an ambiguous type that needs to self-identify
> with the slab_test_cache() test (which in turn relies on PG_slab in
> the flags field shared between page and slab).
>
> It would be preferable to do:
>
> page = virt_to_head_page(p); /* tailpage resolution */
> if (PageSlab(page)) { /* slab or page alloc bypass? */
> slab = page_slab(page);
> do_slab_stuff(slab);
> } else {
> do_page_stuff(page);
> }

I agree with this. Also not fond of setting PG_slab on all tail pages
as willy proposed. But would also like to avoid the tree-wide changes
to PageSlab tailpage resolution that this series is doing.

What we could do is what you suggest above, but the few hotpaths in slab
itself would use a __PageSlab() variant that just doesn't do tailpage
resolution. The rest of the tree could keep doing PageSlab with tailpage
resolution for now, and then we can improve on that later. Would that be
acceptable?
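
(To be concrete, by __PageSlab() I mean essentially just a flavor
without the compound_head() call, something like the sketch below.
Whether it's spelled through the PAGEFLAG macro machinery or
open-coded, and what it's ultimately called, are details:)

        /* Only for callers that already know they are looking at the
         * page that carries PG_slab, i.e. no tailpage resolution: */
        static __always_inline int __PageSlab(struct page *page)
        {
                return test_bit(PG_slab, &page->flags);
        }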

With that, I'm proposing to take over willy's series (as he asked for in
the cover letter), which would be modified in the above sense. And maybe
in other ways I can't immediately predict.

But I do want to at least try an approach where we deal with the
"boundary" functions (those that need to deal with struct page) first,
and then convert the bulk of "struct page" to "struct slab" at once
with the help of a coccinelle semantic patch. I'm hoping this thus
wouldn't be a "slapdash" conversion, while still avoiding a large
series of incremental patches - because the result of such automation
wouldn't need as close a manual review as human-written patches do.

> and leave the ambiguity and the need to self-identify with struct
> page, so that struct slab is a strong and unambiguous type, and a
> non-NULL struct slab encountered in the wild is always a valid object
> without the need to check another dedicated flag for validity first.
>
> However, because PageSlab() currently implies tailpage resolution,
> writing the virt->slab lookup in the preferred way would add yet more
> unnecessary compound_head() call to the hottest MM paths.
>
> The page flag helpers should eventually all be weaned off of those
> compound_head() calls for their unnecessary overhead alone. But this
> one in particular is now getting in the way of writing code in the
> preferred manner, and bleeding page ambiguity into the new types that
> are supposed to eliminate specifically that. It's ripe for a cleanup.
>
> Based on v5.15-rc4.
>
> arch/ia64/kernel/mca_drv.c | 2 +-
> drivers/ata/libata-sff.c | 2 +-
> fs/proc/page.c | 12 +++++++-----
> include/linux/net.h | 2 +-
> include/linux/page-flags.h | 10 +++++-----
> mm/kasan/generic.c | 2 +-
> mm/kasan/kasan.h | 2 +-
> mm/kasan/report.c | 4 ++--
> mm/kasan/report_tags.c | 2 +-
> mm/memory-failure.c | 6 +++---
> mm/memory.c | 3 ++-
> mm/slab.h | 2 +-
> mm/slob.c | 4 ++--
> mm/slub.c | 6 +++---
> mm/util.c | 2 +-
> 15 files changed, 32 insertions(+), 29 deletions(-)