2024-05-27 14:28:29

by David Hildenbrand

Subject: [PATCH v1 0/6] mm: page_type, zsmalloc and page_mapcount_reset()

Wanting to remove the remaining abuser of _mapcount/page_type along with
page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be
converted away from "struct page" [1].

Unfortunately, we cannot stop using the page_type field in zsmalloc code
completely for its own purposes. All other fields in "struct page" are
used one way or the other. Could we simply store a 2-byte offset value
at the beginning of each page? Likely, but that will require a bit more
work; and once we have memdesc we might want to move the offset in there
(struct zsdesc?) again.

... but we can limit the abuse to 16 bit, glue it to a page type that
must be set, and document it. page_has_type() will always successfully
indicate such zsmalloc pages, and such zsmalloc pages only.

We lose zsmalloc support for PAGE_SIZE > 64 KiB, which should be tolerable.
We could use more bits from the page type, but 16 bit sounds like a good
idea for now.
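
To make the layout concrete, here is a minimal sketch of how a type and
a 16-bit payload share the 32-bit field (constant values as in the
patches below; the helper names are illustrative only). Remember that
the field overlays _mapcount, which starts at -1 (all bits set), and
that a type is recorded by clearing its bit:

#define PAGE_TYPE_BASE	0xf0000000	/* stays set for any typed page */
#define PG_zsmalloc	0x00400000	/* set the type by clearing this bit */
#define OFFSET_MASK	0x0000ffff	/* lower 16 bit: owner payload */

/* 0xffffffff == no type (_mapcount == -1); 0xffbfffff == zsmalloc page */
static inline bool sketch_is_zsmalloc(unsigned int page_type)
{
	return (page_type & (PAGE_TYPE_BASE | PG_zsmalloc)) == PAGE_TYPE_BASE;
}

/* Store an offset (< 64 KiB) without disturbing the type bits. */
static inline unsigned int sketch_set_offset(unsigned int page_type,
					     unsigned int offset)
{
	return (page_type & ~OFFSET_MASK) | (offset & OFFSET_MASK);
}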

So clarify the _mapcount/page_type documentation, use a proper page_type
for zsmalloc, and remove page_mapcount_reset().

Survived a couple of days with the build bots and my cross-compile
attempts. Briefly tested with zram on x86-64.

[1] https://lore.kernel.org/all/[email protected]/

Cc: Andrew Morton <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>

v1 -> v2:
* Rebased to v6.10-rc1
* "mm: update _mapcount and page_type documentation"
-> Exchange members and fixup doc as suggested by Mike
* "mm: allow reuse of the lower 16bit of the page type with an actual
type"
-> Remove "highest bit" comment, fixup PG_buddy, extend description
* "mm/zsmalloc: use a proper page type"
-> Add and use HAVE_ZSMALLOC to fixup compilation
-> Fixup BUILD_BUG_ON
-> Add some VM_WARN_ON_ONCE(!PageZsmalloc(page));
* "mm/mm_init: initialize page->_mapcount directly
in __init_single_page()"
-> Fixup patch subject

David Hildenbrand (6):
mm: update _mapcount and page_type documentation
mm: allow reuse of the lower 16 bit of the page type with an actual
type
mm/zsmalloc: use a proper page type
mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages
mm/filemap: reinitialize folio->_mapcount directly
mm/mm_init: initialize page->_mapcount directly in
__init_single_page()

drivers/block/zram/Kconfig | 1 +
include/linux/mm.h | 10 ----------
include/linux/mm_types.h | 33 ++++++++++++++++++++++-----------
include/linux/page-flags.h | 23 +++++++++++++++--------
mm/Kconfig | 10 ++++++++--
mm/filemap.c | 2 +-
mm/mm_init.c | 2 +-
mm/page_alloc.c | 6 ++++--
mm/zsmalloc.c | 29 +++++++++++++++++++++++++----
9 files changed, 77 insertions(+), 39 deletions(-)


base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
--
2.45.1



2024-05-27 14:28:39

by David Hildenbrand

Subject: [PATCH v1 1/6] mm: update _mapcount and page_type documentation

Let's make it clearer that _mapcount must no longer be used for own
purposes, and how _mapcount and page_type behave nowadays (also in the
context of hugetlb folios, which are typed folios that will be mapped
to user space).

Move the documentation regarding "-1" over from page_mapcount_reset(),
which we will remove next. Move "page_type" before "mapcount", to make
it clearer what typed folios are.
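
For context, the -1 starting value is what makes the two interesting
transitions detectable with a single atomic operation. A userspace
sketch of the idiom (the kernel helpers are atomic_inc_and_test() and
atomic_add_negative(-1); map_page()/unmap_page() are illustrative names):

#include <stdatomic.h>
#include <stdbool.h>

/* First mapping takes the count from -1 to 0: return true only then. */
static bool map_page(atomic_int *mapcount)
{
	return atomic_fetch_add(mapcount, 1) + 1 == 0;
}

/* Last unmapping takes the count from 0 to -1: return true only then. */
static bool unmap_page(atomic_int *mapcount)
{
	return atomic_fetch_add(mapcount, -1) - 1 < 0;
}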

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 5 -----
include/linux/mm_types.h | 28 +++++++++++++++++-----------
2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9849dfda44d4..018e7c0265ca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
return atomic_read(&folio->_entire_mapcount) + 1;
}

-/*
- * The atomic page->_mapcount, starts from -1: so that transitions
- * both from it and to it can be tracked, using atomic_inc_and_test
- * and atomic_add_negative(-1).
- */
static inline void page_mapcount_reset(struct page *page)
{
atomic_set(&(page)->_mapcount, -1);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 24323c7d0bd4..6b2aeba792c4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -46,9 +46,7 @@ struct mem_cgroup;
* which is guaranteed to be aligned. If you use the same storage as
* page->mapping, you must restore it to NULL before freeing the page.
*
- * If your page will not be mapped to userspace, you can also use the four
- * bytes in the mapcount union, but you must call page_mapcount_reset()
- * before freeing it.
+ * The mapcount field must not be used for own purposes.
*
* If you want to use the refcount field, it must be used in such a way
* that other CPUs temporarily incrementing and then decrementing the
@@ -152,18 +150,26 @@ struct page {

union { /* This union is 4 bytes in size. */
/*
- * If the page can be mapped to userspace, encodes the number
- * of times this page is referenced by a page table.
+ * For head pages of typed folios, the value stored here
+ * allows for determining what this page is used for. The
+ * tail pages of typed folios will not store a type
+ * (mapcount == -1).
+ *
+ * See page-flags.h for a list of page types which are currently
+ * stored here.
*/
- atomic_t _mapcount;
+ unsigned int page_type;

/*
- * If the page is neither PageSlab nor mappable to userspace,
- * the value stored here may help determine what this page
- * is used for. See page-flags.h for a list of page types
- * which are currently stored here.
+ * For pages that are part of non-typed folios for which mappings
+ * are tracked via the RMAP, encodes the number of times this page
+ * is directly referenced by a page table.
+ *
+ * Note that the mapcount is always initialized to -1, so that
+ * transitions both from it and to it can be tracked, using
+ * atomic_inc_and_test() and atomic_add_negative(-1).
*/
- unsigned int page_type;
+ atomic_t _mapcount;
};

/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
--
2.45.1


2024-05-27 14:29:15

by David Hildenbrand

Subject: [PATCH v1 3/6] mm/zsmalloc: use a proper page type

Let's clean it up: use a proper page type and store our data (offset
into a page) in the lower 16 bit as documented.

We won't be able to support 256 KiB base pages, which is acceptable.
Teach Kconfig to handle that cleanly using a new CONFIG_HAVE_ZSMALLOC.

Based on this, we should do a proper "struct zsdesc" conversion, as
proposed in [1].

This removes the last _mapcount/page_type offender.

[1] https://lore.kernel.org/all/[email protected]/

Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/block/zram/Kconfig | 1 +
include/linux/page-flags.h | 3 +++
mm/Kconfig | 10 ++++++++--
mm/zsmalloc.c | 29 +++++++++++++++++++++++++----
4 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 7b29cce60ab2..39c04419bf87 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -3,6 +3,7 @@ config ZRAM
tristate "Compressed RAM block device support"
depends on BLOCK && SYSFS && MMU
depends on CRYPTO_LZO || CRYPTO_ZSTD || CRYPTO_LZ4 || CRYPTO_LZ4HC || CRYPTO_842
+ depends on HAVE_ZSMALLOC
select ZSMALLOC
help
Creates virtual block devices called /dev/zramX (X = 0, 1, ...).
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b43e380ffa0b..36d9ded4462d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -957,6 +957,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
#define PG_guard 0x00080000
#define PG_hugetlb 0x00100800
#define PG_slab 0x00200000
+#define PG_zsmalloc 0x00400000

#define PageType(page, flag) \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -1071,6 +1072,8 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
FOLIO_TEST_FLAG_FALSE(hugetlb)
#endif

+PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
+
/**
* PageHuge - Determine if the page belongs to hugetlbfs
* @page: The page to test.
diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a54..67dc18c94448 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -128,7 +128,7 @@ config ZSWAP_COMPRESSOR_DEFAULT
choice
prompt "Default allocator"
depends on ZSWAP
- default ZSWAP_ZPOOL_DEFAULT_ZSMALLOC if MMU
+ default ZSWAP_ZPOOL_DEFAULT_ZSMALLOC if HAVE_ZSMALLOC
default ZSWAP_ZPOOL_DEFAULT_ZBUD
help
Selects the default allocator for the compressed cache for
@@ -154,6 +154,7 @@ config ZSWAP_ZPOOL_DEFAULT_Z3FOLD

config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
bool "zsmalloc"
+ depends on HAVE_ZSMALLOC
select ZSMALLOC
help
Use the zsmalloc allocator as the default allocator.
@@ -186,10 +187,15 @@ config Z3FOLD
page. It is a ZBUD derivative so the simplicity and determinism are
still there.

+config HAVE_ZSMALLOC
+ def_bool y
+ depends on MMU
+ depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
+
config ZSMALLOC
tristate
prompt "N:1 compression allocator (zsmalloc)" if ZSWAP
- depends on MMU
+ depends on HAVE_ZSMALLOC
help
zsmalloc is a slab-based memory allocator designed to store
pages of various compression levels efficiently. It achieves
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b42d3545ca85..1a6af454520e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -20,7 +20,8 @@
* page->index: links together all component pages of a zspage
* For the huge page, this is always 0, so we use this field
* to store handle.
- * page->page_type: first object offset in a subpage of zspage
+ * page->page_type: PG_zsmalloc, lower 16 bit locate the first object
+ * offset in a subpage of a zspage
*
* Usage of struct page flags:
* PG_private: identifies the first component page
@@ -450,14 +451,28 @@ static inline struct page *get_first_page(struct zspage *zspage)
return first_page;
}

+#define FIRST_OBJ_PAGE_TYPE_MASK 0xffff
+
+static inline void reset_first_obj_offset(struct page *page)
+{
+ VM_WARN_ON_ONCE(!PageZsmalloc(page));
+ page->page_type |= FIRST_OBJ_PAGE_TYPE_MASK;
+}
+
static inline unsigned int get_first_obj_offset(struct page *page)
{
- return page->page_type;
+ VM_WARN_ON_ONCE(!PageZsmalloc(page));
+ return page->page_type & FIRST_OBJ_PAGE_TYPE_MASK;
}

static inline void set_first_obj_offset(struct page *page, unsigned int offset)
{
- page->page_type = offset;
+ /* With 16 bit available, we can support offsets into 64 KiB pages. */
+ BUILD_BUG_ON(PAGE_SIZE > SZ_64K);
+ VM_WARN_ON_ONCE(!PageZsmalloc(page));
+ VM_WARN_ON_ONCE(offset & ~FIRST_OBJ_PAGE_TYPE_MASK);
+ page->page_type &= ~FIRST_OBJ_PAGE_TYPE_MASK;
+ page->page_type |= offset & FIRST_OBJ_PAGE_TYPE_MASK;
}

static inline unsigned int get_freeobj(struct zspage *zspage)
@@ -791,8 +806,9 @@ static void reset_page(struct page *page)
__ClearPageMovable(page);
ClearPagePrivate(page);
set_page_private(page, 0);
- page_mapcount_reset(page);
page->index = 0;
+ reset_first_obj_offset(page);
+ __ClearPageZsmalloc(page);
}

static int trylock_zspage(struct zspage *zspage)
@@ -965,11 +981,13 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
if (!page) {
while (--i >= 0) {
dec_zone_page_state(pages[i], NR_ZSPAGES);
+ __ClearPageZsmalloc(pages[i]);
__free_page(pages[i]);
}
cache_free_zspage(pool, zspage);
return NULL;
}
+ __SetPageZsmalloc(page);

inc_zone_page_state(page, NR_ZSPAGES);
pages[i] = page;
@@ -1762,6 +1780,9 @@ static int zs_page_migrate(struct page *newpage, struct page *page,

VM_BUG_ON_PAGE(!PageIsolated(page), page);

+ /* We're committed, tell the world that this is a Zsmalloc page. */
+ __SetPageZsmalloc(newpage);
+
/* The page is locked, so this pointer must remain valid */
zspage = get_zspage(page);
pool = zspage->pool;
--
2.45.1


2024-05-27 14:29:32

by David Hildenbrand

Subject: [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type

As long as the owner sets a page type first, we can allow reuse of the
lower 18 bit: sufficient to store an offset into a 64 KiB page, which
is the maximum base page size in *common* configurations (ignoring the
256 KiB variant). Restrict it to the head page.

We'll use that for zsmalloc next, to set a proper type while still
reusing that field to store information (offset into a base page) that
cannot go elsewhere for now.

Fear of running out of bits for storing the actual type? Actually, we
don't need one bit per type, we could store a single value instead.
Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.
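
To illustrate how the enlarged reserve still catches underflows, here is
a rough userspace model of the page_has_type() check (a signed
comparison against PAGE_MAPCOUNT_RESERVE; has_type() is our stand-in):

#include <stdio.h>

#define PAGE_MAPCOUNT_RESERVE	(-65536)

/* A value only counts as a type if it is more negative than the
 * reserve, i.e. some PG_* bit above bit 15 was deliberately cleared. */
static int has_type(unsigned int page_type)
{
	return (int)page_type < PAGE_MAPCOUNT_RESERVE;
}

int main(void)
{
	printf("%d\n", has_type(0xffffffff));	/* _mapcount == -1: no type -> 0 */
	printf("%d\n", has_type(0xfffffffe));	/* underflow to -2          -> 0 */
	printf("%d\n", has_type(0xffff0000));	/* underflow to -65536      -> 0 */
	printf("%d\n", has_type(0xfffeffff));	/* PG_buddy type set        -> 1 */
	return 0;
}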

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm_types.h | 5 +++++
include/linux/page-flags.h | 20 ++++++++++++--------
2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6b2aeba792c4..598cfedbbfa0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -157,6 +157,11 @@ struct page {
*
* See page-flags.h for a list of page types which are currently
* stored here.
+ *
+ * Owners of typed folios may reuse the lower 16 bit of the
+ * head page page_type field after setting the page type,
+ * but must reset these 16 bit to -1 before clearing the
+ * page type.
*/
unsigned int page_type;

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 104078afe0b1..b43e380ffa0b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
*/

#define PAGE_TYPE_BASE 0xf0000000
-/* Reserve 0x0000007f to catch underflows of _mapcount */
-#define PAGE_MAPCOUNT_RESERVE -128
-#define PG_buddy 0x00000080
-#define PG_offline 0x00000100
-#define PG_table 0x00000200
-#define PG_guard 0x00000400
-#define PG_hugetlb 0x00000800
-#define PG_slab 0x00001000
+/*
+ * Reserve 0x0000ffff to catch underflows of _mapcount and
+ * allow owners that set a type to reuse the lower 16 bit for their own
+ * purposes.
+ */
+#define PAGE_MAPCOUNT_RESERVE -65536
+#define PG_buddy 0x00010000
+#define PG_offline 0x00020000
+#define PG_table 0x00040000
+#define PG_guard 0x00080000
+#define PG_hugetlb 0x00100800
+#define PG_slab 0x00200000

#define PageType(page, flag) \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
--
2.45.1


2024-05-27 14:30:47

by David Hildenbrand

Subject: [PATCH v1 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page()

Let's simply reinitialize the page->_mapcount directly. We can now
get rid of page_mapcount_reset().

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 5 -----
mm/mm_init.c | 2 +-
2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 018e7c0265ca..3e1d3b0d545e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
return atomic_read(&folio->_entire_mapcount) + 1;
}

-static inline void page_mapcount_reset(struct page *page)
-{
- atomic_set(&(page)->_mapcount, -1);
-}
-
/**
* page_mapcount() - Number of times this precise page is mapped.
* @page: The page.
diff --git a/mm/mm_init.c b/mm/mm_init.c
index f72b852bd5b8..b4916751edce 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -568,7 +568,7 @@ void __meminit __init_single_page(struct page *page, unsigned long pfn,
mm_zero_struct_page(page);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
- page_mapcount_reset(page);
+ atomic_set(&page->_mapcount, -1);
page_cpupid_reset_last(page);
page_kasan_tag_reset(page);

--
2.45.1


2024-05-27 14:31:13

by David Hildenbrand

Subject: [PATCH v1 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages

Let's stop using page_mapcount_reset() and clear PageBuddy using
__ClearPageBuddy() instead.
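
The distinction matters for debugging: page_mapcount_reset() wipes the
whole field back to -1, whereas __ClearPageBuddy() merely sets the
PG_buddy bit again, leaving any other (bad) type state visible. Roughly:

	page_mapcount_reset(page);	/* page->_mapcount = -1: everything gone */
	__ClearPageBuddy(page);		/* page->page_type |= PG_buddy: only buddy gone */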

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/page_alloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e22ce5675ca..b595342e73c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -498,7 +498,8 @@ static void bad_page(struct page *page, const char *reason)
dump_stack();
out:
/* Leave bad fields for debug, except PageBuddy could make trouble */
- page_mapcount_reset(page); /* remove PageBuddy */
+ if (PageBuddy(page))
+ __ClearPageBuddy(page);
add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
}

@@ -1346,7 +1347,8 @@ static void check_new_page_bad(struct page *page)
{
if (unlikely(page->flags & __PG_HWPOISON)) {
/* Don't complain about hwpoisoned pages */
- page_mapcount_reset(page); /* remove PageBuddy */
+ if (PageBuddy(page))
+ __ClearPageBuddy(page);
return;
}

--
2.45.1


2024-05-27 14:31:21

by David Hildenbrand

Subject: [PATCH v1 5/6] mm/filemap: reinitialize folio->_mapcount directly

Let's get rid of the page_mapcount_reset() call and simply reinitialize
folio->_mapcount directly.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/filemap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 382c3d06bfb1..c4ac7289e88a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -177,7 +177,7 @@ static void filemap_unaccount_folio(struct address_space *mapping,
* and we'd rather not leak it: if we're wrong,
* another bad page check should catch it later.
*/
- page_mapcount_reset(&folio->page);
+ atomic_set(&folio->_mapcount, -1);
folio_ref_sub(folio, mapcount);
}
}
--
2.45.1


2024-05-27 17:32:54

by Matthew Wilcox

Subject: Re: [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type

On Mon, May 27, 2024 at 04:14:50PM +0200, David Hildenbrand wrote:
> As long as the owner sets a page type first, we can allow reuse of the
> lower 18 bit: sufficient to store an offset into a 64 KiB page, which

You say 18 here and 16 below.

> is the maximum base page size in *common* configurations (ignoring the
> 256 KiB variant). Restrict it to the head page.
>
> We'll use that for zsmalloc next, to set a proper type while still
> reusing that field to store information (offset into a base page) that
> cannot go elsewhere for now.
>
> Fear of running out of bits for storing the actual type? Actually, we
> don't need one bit per type, we could store a single value instead.
> Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.

We could, but it's more instructions to check.

> +++ b/include/linux/page-flags.h
> @@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
> */
>
> #define PAGE_TYPE_BASE 0xf0000000
> -/* Reserve 0x0000007f to catch underflows of _mapcount */
> -#define PAGE_MAPCOUNT_RESERVE -128
> -#define PG_buddy 0x00000080
> -#define PG_offline 0x00000100
> -#define PG_table 0x00000200
> -#define PG_guard 0x00000400
> -#define PG_hugetlb 0x00000800
> -#define PG_slab 0x00001000
> +/*
> + * Reserve 0x0000ffff to catch underflows of _mapcount and
> + * allow owners that set a type to reuse the lower 16 bit for their own
> + * purposes.
> + */
> +#define PAGE_MAPCOUNT_RESERVE -65536

I think my original comment was misleading. This should be:

* Reserve 0xffff0000 - 0xfffffffe to catch _mapcount underflow.

How about we start at the top end and let people extend down? ie:

#define PAGE_TYPE_BASE 0x80000000
#define PG_buddy 0x40000000
#define PG_offline 0x20000000
#define PG_table 0x10000000
#define PG_guard 0x08000000
#define PG_hugetlb 0x04000000
#define PG_slab 0x02000000
#define PAGE_MAPCOUNT_RESERVE (~0x0000ffff)

Now we can see that we have 9 flags remaining, which should last until
we can have proper memdesc typing.
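
The bit budget, spelled out:

	/* bit  31     : PAGE_TYPE_BASE
	 * bits 30..25 : PG_buddy .. PG_slab (the six existing types)
	 * bits 24..16 : unused -> room for 9 more types
	 * bits 15..0  : reserved for _mapcount underflow / owner payload
	 */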

2024-05-27 18:49:43

by David Hildenbrand

Subject: Re: [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type

Am 27.05.24 um 17:26 schrieb Matthew Wilcox:
> On Mon, May 27, 2024 at 04:14:50PM +0200, David Hildenbrand wrote:
>> As long as the owner sets a page type first, we can allow reuse of the
>> lower 18 bit: sufficient to store an offset into a 64 KiB page, which
>
> You say 18 here and 16 below.

Thanks, I missed fixing up one instance after going back and forth.

>
>> is the maximum base page size in *common* configurations (ignoring the
>> 256 KiB variant). Restrict it to the head page.
>>
>> We'll use that for zsmalloc next, to set a proper type while still
>> reusing that field to store information (offset into a base page) that
>> cannot go elsewhere for now.
>>
>> Fear of running out of bits for storing the actual type? Actually, we
>> don't need one bit per type, we could store a single value instead.
>> Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.
>
> We could, but it's more instructions to check.

Maybe, and maybe not sufficiently more that we'd care.

I was thinking of something like the following (probably broken but you should
get the idea):

/*
* If the _mapcount is negative, we might store a page type. The
* page_type field corresponds to the most significant byte of the
* _mapcount field. As the mapcount is initialized to -1, we have no
* type as defaults. We have plenty of room to underflow the mapcount
* before we would end up indicating a valid page_type.
*/
#define PAGE_TYPE_BASE 0x80
enum page_type {
	PT_buddy = PAGE_TYPE_BASE,
	PT_offline,
	PT_table,
	PT_guard,
	PT_hugetlb,
	PT_slab,
	/* we must forbid page_type == -1 */
	PT_unusable = 0xff
};

In struct page:

union {
	atomic_t _mapcount;

#if __BYTE_ORDER == __BIG_ENDIAN
	struct {
		/* page_type overlays the most significant byte of _mapcount */
		uint8_t page_type;
		uint8_t page_type_reserved;
		uint16_t page_type_data;
	};
#else
	struct {
		uint16_t page_type_data;
		uint8_t page_type_reserved;
		/* page_type overlays the most significant byte of _mapcount */
		uint8_t page_type;
	};
#endif
};

#define PageType(page, type) (page->page_type == type)

One could maybe also change page_has_type() to simply work on the
fact that the highest bit must be set and any other bit of the type must be clear:

static inline int page_has_type(const struct page *page)
{
	return (page->page_type & PAGE_TYPE_BASE) &&
	       page->page_type != 0xff;
}

But these are just some thoughts.

>
>> +++ b/include/linux/page-flags.h
>> @@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>> */
>>
>> #define PAGE_TYPE_BASE 0xf0000000
>> -/* Reserve 0x0000007f to catch underflows of _mapcount */
>> -#define PAGE_MAPCOUNT_RESERVE -128
>> -#define PG_buddy 0x00000080
>> -#define PG_offline 0x00000100
>> -#define PG_table 0x00000200
>> -#define PG_guard 0x00000400
>> -#define PG_hugetlb 0x00000800
>> -#define PG_slab 0x00001000
>> +/*
>> + * Reserve 0x0000ffff to catch underflows of _mapcount and
>> + * allow owners that set a type to reuse the lower 16 bit for their own
>> + * purposes.
>> + */
>> +#define PAGE_MAPCOUNT_RESERVE -65536
>
> I think my original comment was misleading. This should be:
>
> * Reserve 0xffff0000 - 0xfffffffe to catch _mapcount underflow.

Makes sense.

>
> How about we start at the top end and let people extend down? ie:
>
> #define PAGE_TYPE_BASE 0x80000000
> #define PG_buddy 0x40000000
> #define PG_offline 0x20000000
> #define PG_table 0x10000000
> #define PG_guard 0x08000000
> #define PG_hugetlb 0x04000000
> #define PG_slab 0x02000000
> #define PAGE_MAPCOUNT_RESERVE (~0x0000ffff)
>
> Now we can see that we have 9 flags remaining, which should last until
> we can have proper memdesc typing.

Also works for me.

Thanks!

--
Thanks,

David / dhildenb


2024-05-29 15:56:52

by wang wei

Subject: Re: [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type

> As long as the owner sets a page type first, we can allow reuse of the
> lower 18 bit: sufficient to store an offset into a 64 KiB page, which
> is the maximum base page size in *common* configurations (ignoring the
> 256 KiB variant). Restrict it to the head page.
>
> We'll use that for zsmalloc next, to set a proper type while still
> reusing that field to store information (offset into a base page) that
> cannot go elsewhere for now.
>
> Fear of running out of bits for storing the actual type? Actually, we
> don't need one bit per type, we could store a single value instead.
> Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> include/linux/mm_types.h | 5 +++++
> include/linux/page-flags.h | 20 ++++++++++++--------
> 2 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6b2aeba792c4..598cfedbbfa0 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -157,6 +157,11 @@ struct page {
> *
> * See page-flags.h for a list of page types which are currently
> * stored here.
> + *
> + * Owners of typed folios may reuse the lower 16 bit of the
> + * head page page_type field after setting the page type,
> + * but must reset these 16 bit to -1 before clearing the
> + * page type.
> */
> unsigned int page_type;
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 104078afe0b1..b43e380ffa0b 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
> */
>
> #define PAGE_TYPE_BASE 0xf0000000
> -/* Reserve 0x0000007f to catch underflows of _mapcount */
> -#define PAGE_MAPCOUNT_RESERVE -128
> -#define PG_buddy 0x00000080
> -#define PG_offline 0x00000100
> -#define PG_table 0x00000200
> -#define PG_guard 0x00000400
> -#define PG_hugetlb 0x00000800
> -#define PG_slab 0x00001000
> +/*
> + * Reserve 0x0000ffff to catch underflows of _mapcount and
> + * allow owners that set a type to reuse the lower 16 bit for their own
> + * purposes.
> + */
> +#define PAGE_MAPCOUNT_RESERVE -65536
> +#define PG_buddy 0x00010000
> +#define PG_offline 0x00020000
> +#define PG_table 0x00040000
> +#define PG_guard 0x00080000
> +#define PG_hugetlb 0x00100800

Every PG_XX occupies one bit in my understanding. But why does PG_hugetlb occupy two bits?

> +#define PG_slab 0x00200000
>
> #define PageType(page, flag) \
> ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)

--
2.25.1


2024-05-29 15:58:20

by David Hildenbrand

Subject: Re: [PATCH v1 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type

On 29.05.24 17:55, wang wei wrote:
> ---
>> As long as the owner sets a page type first, we can allow reuse of the
>> lower 18 bit: sufficient to store an offset into a 64 KiB page, which
>> is the maximum base page size in *common* configurations (ignoring the
>> 256 KiB variant). Restrict it to the head page.
>>
>> We'll use that for zsmalloc next, to set a proper type while still
>> reusing that field to store information (offset into a base page) that
>> cannot go elsewhere for now.
>>
>> Fear of running out of bits for storing the actual type? Actually, we
>> don't need one bit per type, we could store a single value instead.
>> Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> include/linux/mm_types.h | 5 +++++
>> include/linux/page-flags.h | 20 ++++++++++++--------
>> 2 files changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 6b2aeba792c4..598cfedbbfa0 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -157,6 +157,11 @@ struct page {
>> *
>> * See page-flags.h for a list of page types which are currently
>> * stored here.
>> + *
>> + * Owners of typed folios may reuse the lower 16 bit of the
>> + * head page page_type field after setting the page type,
>> + * but must reset these 16 bit to -1 before clearing the
>> + * page type.
>> */
>> unsigned int page_type;
>>
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 104078afe0b1..b43e380ffa0b 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
>> */
>>
>> #define PAGE_TYPE_BASE 0xf0000000
>> -/* Reserve 0x0000007f to catch underflows of _mapcount */
>> -#define PAGE_MAPCOUNT_RESERVE -128
>> -#define PG_buddy 0x00000080
>> -#define PG_offline 0x00000100
>> -#define PG_table 0x00000200
>> -#define PG_guard 0x00000400
>> -#define PG_hugetlb 0x00000800
>> -#define PG_slab 0x00001000
>> +/*
>> + * Reserve 0x0000ffff to catch underflows of _mapcount and
>> + * allow owners that set a type to reuse the lower 16 bit for their own
>> + * purposes.
>> + */
>> +#define PAGE_MAPCOUNT_RESERVE -65536
>> +#define PG_buddy 0x00010000
>> +#define PG_offline 0x00020000
>> +#define PG_table 0x00040000
>> +#define PG_guard 0x00080000
>> +#define PG_hugetlb 0x00100800
>
> Every PG_XX occupies one bit in my understanding. But why does PG_hugetlb occupy two bits?

Because it's wrong (although not harmful). Same issue in v2, fat fingers.

Thanks for pointing that out!

--
Cheers,

David / dhildenb