2024-05-22 21:03:58

by David Hildenbrand

Subject: [PATCH RFC 0/6] mm: page_type, zsmalloc and page_mapcount_reset()

Wanting to remove the remaining abuser of _mapcount/page_type along with
page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be
converted away from "struct page" [1].

Unfortunately, we cannot completely stop using the page_type field in
zsmalloc code for its own purposes: all other fields in "struct page"
are already in use one way or the other.

... but we can limit the abuse to 16 bit, glue it to a page type that
must be set, and document it. page_has_type() will then reliably
indicate such zsmalloc pages, and only such zsmalloc pages.

We lose zsmalloc support for PAGE_SIZE > 64KB, which should be tolerable.
We could use more bits from the page type, but 16 bit sounds like a good
idea.
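
For illustration, the page_type layout this series establishes looks
roughly like this (a sketch derived from the values in patch #2, not a
literal copy of the headers):

	/*
	 * page_type after this series (sketch):
	 *
	 *   31   28 27         16 15             0
	 *  +-------+-------------+----------------+
	 *  |  0xf  |  type bits  |   owner data   |
	 *  | base  | (PG_buddy,  | (zsmalloc: the |
	 *  |       |  ...)       |  obj offset)   |
	 *  +-------+-------------+----------------+
	 *
	 * With the top nibble always 0xf, a page type can never be
	 * mistaken for a sane _mapcount value, which starts at -1
	 * (0xffffffff) and must stay above PAGE_MAPCOUNT_RESERVE.
	 */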

So clarify the _mapcount/page_type documentation, use a proper page_type
for zsmalloc, and remove page_mapcount_reset().

Only lightly tested with zram. Will have to do more testing and
cross-compile checking.

[1] https://lore.kernel.org/all/[email protected]/

Cc: Andrew Morton <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>

David Hildenbrand (6):
mm: update _mapcount and page_type documentation
mm: allow reuse of the lower 16bit of the page type with an actual
type
mm/zsmalloc: use a proper page type
mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages
mm/filemap: reinitialize folio->_mapcount directly
mm/mm_init: initialize page->_mapcount directly in __init_single_page()

include/linux/mm.h | 10 ----------
include/linux/mm_types.h | 29 ++++++++++++++++++++---------
include/linux/page-flags.h | 25 +++++++++++++++++--------
mm/Kconfig | 1 +
mm/filemap.c | 2 +-
mm/mm_init.c | 2 +-
mm/page_alloc.c | 6 ++++--
mm/zsmalloc.c | 23 +++++++++++++++++++----
8 files changed, 63 insertions(+), 35 deletions(-)


base-commit: 29c73fc794c83505066ee6db893b2a83ac5fac63
--
2.45.0



2024-05-22 21:04:03

by David Hildenbrand

Subject: [PATCH RFC 1/6] mm: update _mapcount and page_type documentation

Let's make it clearer that _mapcount must no longer be used for own
purposes, and how _mapcount and page_type behave nowadays (also in the
context of hugetlb folios, which are typed folios that will be mapped
to user space).

Move the documentation regarding "-1" over from page_mapcount_reset(),
which we will remove next.
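
As a tiny illustration (not part of the patch) of why the "-1" initial
value is convenient:

	if (atomic_inc_and_test(&page->_mapcount))
		/* -1 -> 0: this was the first mapping of the page */;
	if (atomic_add_negative(-1, &page->_mapcount))
		/* 0 -> -1: the last mapping of the page went away */;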

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 5 -----
include/linux/mm_types.h | 24 +++++++++++++++---------
2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9849dfda44d4..018e7c0265ca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
return atomic_read(&folio->_entire_mapcount) + 1;
}

-/*
- * The atomic page->_mapcount, starts from -1: so that transitions
- * both from it and to it can be tracked, using atomic_inc_and_test
- * and atomic_add_negative(-1).
- */
static inline void page_mapcount_reset(struct page *page)
{
atomic_set(&(page)->_mapcount, -1);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 24323c7d0bd4..532a3030405d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -46,9 +46,7 @@ struct mem_cgroup;
* which is guaranteed to be aligned. If you use the same storage as
* page->mapping, you must restore it to NULL before freeing the page.
*
- * If your page will not be mapped to userspace, you can also use the four
- * bytes in the mapcount union, but you must call page_mapcount_reset()
- * before freeing it.
+ * The mapcount field must not be used for own purposes.
*
* If you want to use the refcount field, it must be used in such a way
* that other CPUs temporarily incrementing and then decrementing the
@@ -152,16 +150,24 @@ struct page {

union { /* This union is 4 bytes in size. */
/*
- * If the page can be mapped to userspace, encodes the number
- * of times this page is referenced by a page table.
+ * For pages part of non-typed folios for which mappings are
+ * tracked via the RMAP, encodes the number of times this page
+ * is directly referenced by a page table.
+ *
+ * Note that the mapcount is always initialized to -1, so that
+ * transitions both from it and to it can be tracked, using
+ * atomic_inc_and_test() and atomic_add_negative(-1).
*/
atomic_t _mapcount;

/*
- * If the page is neither PageSlab nor mappable to userspace,
- * the value stored here may help determine what this page
- * is used for. See page-flags.h for a list of page types
- * which are currently stored here.
+ * For head pages of typed folios, the value stored here
+ * allows for determining what this page is used for. The
+ * tail pages of typed folios will not store a type
+ * (mapcount == -1).
+ *
+ * See page-flags.h for a list of page types which are currently
+ * stored here.
*/
unsigned int page_type;
};
--
2.45.0


2024-05-22 21:04:26

by David Hildenbrand

Subject: [PATCH RFC 3/6] mm/zsmalloc: use a proper page type

Let's clean it up: use a proper page type and store our data (offset
into a page) in the lower 16 bit as documented.

We'll have to restrict ourselves to <= 64KB base page size (so the offset
fits into 16 bit), which sounds reasonable. Unfortunately, we don't have
any space to store it elsewhere for now.
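
To make the encoding concrete, a quick sketch (illustration only; the
helpers are the ones introduced below):

	/* PG_zsmalloc lives in the upper bits, the offset in the lower 16. */
	__SetPageZsmalloc(page);
	set_first_obj_offset(page, 0x1234);
	VM_WARN_ON_ONCE(get_first_obj_offset(page) != 0x1234);
	VM_WARN_ON_ONCE(!PageZsmalloc(page));	/* type bits were preserved */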

Based on this, we should do a proper "struct zsdesc" conversion, as
proposed in [1].

This removes the last _mapcount/page_type offender.

[1] https://lore.kernel.org/all/[email protected]/

Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/page-flags.h | 3 +++
mm/Kconfig | 1 +
mm/zsmalloc.c | 23 +++++++++++++++++++----
3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index ed9ac4b5233d..ccaf16656de9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -959,6 +959,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
#define PG_guard 0x00080000
#define PG_hugetlb 0x00100000
#define PG_slab 0x00200000
+#define PG_zsmalloc 0x00400000

#define PageType(page, flag) \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -1073,6 +1074,8 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
FOLIO_TEST_FLAG_FALSE(hugetlb)
#endif

+PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
+
/**
* PageHuge - Determine if the page belongs to hugetlbfs
* @page: The page to test.
diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a54..0371d79b1b75 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -190,6 +190,7 @@ config ZSMALLOC
tristate
prompt "N:1 compression allocator (zsmalloc)" if ZSWAP
depends on MMU
+ depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64KB
help
zsmalloc is a slab-based memory allocator designed to store
pages of various compression levels efficiently. It achieves
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b42d3545ca85..6f0032e06242 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -20,7 +20,8 @@
* page->index: links together all component pages of a zspage
* For the huge page, this is always 0, so we use this field
* to store handle.
- * page->page_type: first object offset in a subpage of zspage
+ * page->page_type: PG_zsmalloc, lower 16 bit locate the first object
+ * offset in a subpage of a zspage
*
* Usage of struct page flags:
* PG_private: identifies the first component page
@@ -450,14 +451,22 @@ static inline struct page *get_first_page(struct zspage *zspage)
return first_page;
}

+static inline void reset_first_obj_offset(struct page *page)
+{
+ page->page_type |= 0xffff;
+}
+
static inline unsigned int get_first_obj_offset(struct page *page)
{
- return page->page_type;
+ return page->page_type & 0xffff;
}

static inline void set_first_obj_offset(struct page *page, unsigned int offset)
{
- page->page_type = offset;
+ BUILD_BUG_ON(PAGE_SIZE & ~0xffff);
+ VM_WARN_ON_ONCE(offset & ~0xffff);
+ page->page_type &= ~0xffff;
+ page->page_type |= offset & 0xffff;
}

static inline unsigned int get_freeobj(struct zspage *zspage)
@@ -791,8 +800,9 @@ static void reset_page(struct page *page)
__ClearPageMovable(page);
ClearPagePrivate(page);
set_page_private(page, 0);
- page_mapcount_reset(page);
page->index = 0;
+ reset_first_obj_offset(page);
+ __ClearPageZsmalloc(page);
}

static int trylock_zspage(struct zspage *zspage)
@@ -965,11 +975,13 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
if (!page) {
while (--i >= 0) {
dec_zone_page_state(pages[i], NR_ZSPAGES);
+ __ClearPageZsmalloc(pages[i]);
__free_page(pages[i]);
}
cache_free_zspage(pool, zspage);
return NULL;
}
+ __SetPageZsmalloc(page);

inc_zone_page_state(page, NR_ZSPAGES);
pages[i] = page;
@@ -1762,6 +1774,9 @@ static int zs_page_migrate(struct page *newpage, struct page *page,

VM_BUG_ON_PAGE(!PageIsolated(page), page);

+ /* We're committed, tell the world that this is a Zsmalloc page. */
+ __SetPageZsmalloc(newpage);
+
/* The page is locked, so this pointer must remain valid */
zspage = get_zspage(page);
pool = zspage->pool;
--
2.45.0


2024-05-22 21:04:32

by David Hildenbrand

Subject: [PATCH RFC 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages

Let's stop using page_mapcount_reset() and clear PageBuddy using
__ClearPageBuddy() instead.
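
The difference matters: page_mapcount_reset() blindly reinitializes the
whole _mapcount/page_type union and would wipe any type information that
might still be useful for debugging, whereas __ClearPageBuddy() only
touches the PG_buddy type. The PageBuddy() check is needed because the
page might not actually carry the buddy type at this point.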

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/page_alloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e22ce5675ca..b595342e73c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -498,7 +498,8 @@ static void bad_page(struct page *page, const char *reason)
dump_stack();
out:
/* Leave bad fields for debug, except PageBuddy could make trouble */
- page_mapcount_reset(page); /* remove PageBuddy */
+ if (PageBuddy(page))
+ __ClearPageBuddy(page);
add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
}

@@ -1346,7 +1347,8 @@ static void check_new_page_bad(struct page *page)
{
if (unlikely(page->flags & __PG_HWPOISON)) {
/* Don't complain about hwpoisoned pages */
- page_mapcount_reset(page); /* remove PageBuddy */
+ if (PageBuddy(page))
+ __ClearPageBuddy(page);
return;
}

--
2.45.0


2024-05-22 21:04:44

by David Hildenbrand

Subject: [PATCH RFC 5/6] mm/filemap: reinitialize folio->_mapcount directly

Let's get rid of the page_mapcount_reset() call and simply reinitialize
folio->_mapcount directly.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/filemap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 382c3d06bfb1..c4ac7289e88a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -177,7 +177,7 @@ static void filemap_unaccount_folio(struct address_space *mapping,
* and we'd rather not leak it: if we're wrong,
* another bad page check should catch it later.
*/
- page_mapcount_reset(&folio->page);
+ atomic_set(&folio->_mapcount, -1);
folio_ref_sub(folio, mapcount);
}
}
--
2.45.0


2024-05-22 21:05:34

by David Hildenbrand

Subject: [PATCH RFC 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page()

Let's simply reinitialize page->_mapcount directly. We can now
get rid of page_mapcount_reset().

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 5 -----
include/linux/mm_types.h | 2 +-
mm/mm_init.c | 2 +-
3 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 018e7c0265ca..3e1d3b0d545e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
return atomic_read(&folio->_entire_mapcount) + 1;
}

-static inline void page_mapcount_reset(struct page *page)
-{
- atomic_set(&(page)->_mapcount, -1);
-}
-
/**
* page_mapcount() - Number of times this precise page is mapped.
* @page: The page.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6dae8e15037b..c377c040f87d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -169,7 +169,7 @@ struct page {
* See page-flags.h for a list of page types which are currently
* stored here.
*
- * Owners of typed folios may reuse the lower 16bit of the
+ * Owners of typed folios may reuse the lower 16 bit of the
* head page page_type field after setting the page type,
* but must reset these 16 bit to -1 before clearing the
* page type.
diff --git a/mm/mm_init.c b/mm/mm_init.c
index f72b852bd5b8..b4916751edce 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -568,7 +568,7 @@ void __meminit __init_single_page(struct page *page, unsigned long pfn,
mm_zero_struct_page(page);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
- page_mapcount_reset(page);
+ atomic_set(&page->_mapcount, -1);
page_cpupid_reset_last(page);
page_kasan_tag_reset(page);

--
2.45.0


2024-05-22 21:05:52

by David Hildenbrand

Subject: [PATCH RFC 2/6] mm: allow reuse of the lower 16bit of the page type with an actual type

As long as the owner sets a page type first, we can allow reuse of the
lower 16bit! Restrict it to the head page.

We'll use that for zsmalloc next, to set a proper type while still
reusing that field to store information that cannot go elsewhere for
now.

Fear of running out of bits for storing the actual type? Actually, we
don't need one bit per type; we could store a single value instead.
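
A sketch of that alternative, with purely hypothetical names and values
(not what this patch does):

	/*
	 * Hypothetical: encode the type as a value in a dedicated bit
	 * field instead of one flag bit per type; 8 bits would already
	 * cover 255 distinct types.
	 */
	#define PAGE_TYPE_VALUE_SHIFT	20
	#define PAGE_TYPE_VALUE_MASK	(0xffu << PAGE_TYPE_VALUE_SHIFT)
	#define PG_buddy		(1u << PAGE_TYPE_VALUE_SHIFT)
	#define PG_offline		(2u << PAGE_TYPE_VALUE_SHIFT)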

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm_types.h | 5 +++++
include/linux/page-flags.h | 22 ++++++++++++++--------
2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 532a3030405d..6dae8e15037b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -168,6 +168,11 @@ struct page {
*
* See page-flags.h for a list of page types which are currently
* stored here.
+ *
+ * Owners of typed folios may reuse the lower 16bit of the
+ * head page page_type field after setting the page type,
+ * but must reset these 16 bit to -1 before clearing the
+ * page type.
*/
unsigned int page_type;
};
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 104078afe0b1..ed9ac4b5233d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -942,17 +942,23 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
* __ClearPageFoo *sets* the bit used for PageFoo. We reserve a few high and
* low bits so that an underflow or overflow of _mapcount won't be
* mistaken for a page type value.
+ *
+ * The highest bit must always be 1, to make page_has_type() work as expected.
*/

#define PAGE_TYPE_BASE 0xf0000000
-/* Reserve 0x0000007f to catch underflows of _mapcount */
-#define PAGE_MAPCOUNT_RESERVE -128
-#define PG_buddy 0x00000080
-#define PG_offline 0x00000100
-#define PG_table 0x00000200
-#define PG_guard 0x00000400
-#define PG_hugetlb 0x00000800
-#define PG_slab 0x00001000
+/*
+ * Reserve 0x0000ffff to catch underflows of _mapcount and
+ * allow owners that set a type to reuse the lower 16 bit for their own
+ * purposes.
+ */
+#define PAGE_MAPCOUNT_RESERVE -65536
+#define PG_buddy 0x00010000
+#define PG_offline 0x00020000
+#define PG_table 0x00040000
+#define PG_guard 0x00080000
+#define PG_hugetlb 0x00100000
+#define PG_slab 0x00200000

#define PageType(page, flag) \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
--
2.45.0


2024-05-23 14:56:00

by David Hildenbrand

Subject: Re: [PATCH RFC 3/6] mm/zsmalloc: use a proper page type

On 22.05.24 23:03, David Hildenbrand wrote:
> Let's clean it up: use a proper page type and store our data (offset
> into a page) in the lower 16 bit as documented.
>
> [...]
>
> static inline void set_first_obj_offset(struct page *page, unsigned int offset)
> {
> - page->page_type = offset;
> + BUILD_BUG_ON(PAGE_SIZE & ~0xffff);

Buildbot is correctly complaining with PAGE_SIZE=64KiB.

We must check BUILD_BUG_ON((PAGE_SIZE - 1) & ~0xffff);
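
Worked through: with PAGE_SIZE = 64KiB = 0x10000, PAGE_SIZE & ~0xffff is
0x10000, so the assertion fires even though the largest offset we ever
store, PAGE_SIZE - 1 = 0xffff, still fits into 16 bit. (PAGE_SIZE - 1) &
~0xffff is 0 for every supported page size up to and including 64KiB.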

--
Cheers,

David / dhildenb


2024-05-23 20:38:56

by David Hildenbrand

Subject: Re: [PATCH RFC 3/6] mm/zsmalloc: use a proper page type

On 23.05.24 16:55, David Hildenbrand wrote:
> On 22.05.24 23:03, David Hildenbrand wrote:
>> Let's clean it up: use a proper page type and store our data (offset
>> into a page) in the lower 16 bit as documented.
>>
>> [...]
>>
>> static inline void set_first_obj_offset(struct page *page, unsigned int offset)
>> {
>> - page->page_type = offset;
>> + BUILD_BUG_ON(PAGE_SIZE & ~0xffff);
>
> Buildbot is correctly complaining with PAGE_SIZE=64KiB.
>
> We must check BUILD_BUG_ON((PAGE_SIZE - 1) & ~0xffff);

.. and I'll have to resolve the "select ZSMALLOC" situation. :)

--
Cheers,

David / dhildenb


2024-05-24 08:54:20

by Mike Rapoport

Subject: Re: [PATCH RFC 3/6] mm/zsmalloc: use a proper page type

On Thu, May 23, 2024 at 04:55:44PM +0200, David Hildenbrand wrote:
> On 22.05.24 23:03, David Hildenbrand wrote:
> > Let's clean it up: use a proper page type and store our data (offset
> > into a page) in the lower 16 bit as documented.
> >
> > [...]
> >
> > static inline void set_first_obj_offset(struct page *page, unsigned int offset)
> > {
> > - page->page_type = offset;
> > + BUILD_BUG_ON(PAGE_SIZE & ~0xffff);
>
> Buildbot is correctly complaining with PAGE_SIZE=64KiB.
>
> We must check BUILD_BUG_ON((PAGE_SIZE - 1) & ~0xffff);

Won't

BUILD_BUG_ON(PAGE_SIZE > SZ_64K)

be clearer?
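
(SZ_64K comes from include/linux/sizes.h, so the intent would be visible
at a glance.)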

> --
> Cheers,
>
> David / dhildenb
>
>

--
Sincerely yours,
Mike.

2024-05-24 09:00:41

by Mike Rapoport

Subject: Re: [PATCH RFC 1/6] mm: update _mapcount and page_type documentation

On Wed, May 22, 2024 at 11:03:36PM +0200, David Hildenbrand wrote:
> Let's make it clearer that _mapcount must no longer be used for own
> purposes, and how _mapcount and page_type behave nowadays (also in the
> context of hugetlb folios, which are typed folios that will be mapped
> to user space).
>
> [...]
>
> @@ -152,16 +150,24 @@ struct page {
>
> union { /* This union is 4 bytes in size. */
> /*
> - * If the page can be mapped to userspace, encodes the number
> - * of times this page is referenced by a page table.
> + * For pages part of non-typed folios for which mappings are

Maybe

For pages that are part ...

> + * tracked via the RMAP, encodes the number of times this page
> + * is directly referenced by a page table.
> + *
> + * Note that the mapcount is always initialized to -1, so that
> + * transitions both from it and to it can be tracked, using
> + * atomic_inc_and_test() and atomic_add_negative(-1).
> */
> atomic_t _mapcount;
>
> /*
> - * If the page is neither PageSlab nor mappable to userspace,
> - * the value stored here may help determine what this page
> - * is used for. See page-flags.h for a list of page types
> - * which are currently stored here.
> + * For head pages of typed folios, the value stored here
> + * allows for determining what this page is used for. The
> + * tail pages of typed folios will not store a type
> + * (mapcount == -1).
> + *
> + * See page-flags.h for a list of page types which are currently
> + * stored here.
> */
> unsigned int page_type;

and maybe move page_type before _mapcount, so that it will be a bit clearer
what "typed" pages and folios are.

> };
> --
> 2.45.0
>

--
Sincerely yours,
Mike.

2024-05-24 10:05:47

by David Hildenbrand

Subject: Re: [PATCH RFC 2/6] mm: allow reuse of the lower 16bit of the page type with an actual type

On 22.05.24 23:03, David Hildenbrand wrote:
> As long as the owner sets a page type first, we can allow reuse of the
> lower 16bit! Restrict it to the head page.
>
> We'll use that for zsmalloc next, to set a proper type while still
> reusing that field to store information that cannot go elsewhere for
> now.
>
> Fear of running out of bits for storing the actual type? Actually, we
> don't need one bit per type; we could store a single value instead.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---

The following would likely be better if we want to go down that path:



From 39f198589039451d94166f3893dc939a919f74c7 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <[email protected]>
Date: Wed, 22 May 2024 21:57:43 +0200
Subject: [PATCH] mm: allow reuse of the lower 18 bit of the page type with an
actual type

As long as the owner sets a page type first, we can allow reuse of the
lower 18 bit: sufficient to store an offset into a 256 KiB page, which
is the maximum base page size we support. Restrict it to the head page.

We'll use that for zsmalloc next, to set a proper type while still
reusing that field to store information (offset into a base page) that
cannot go elsewhere for now.

Fear of running out of bits for storing the actual type? Actually, we
don't need one bit per type; we could store a single value instead.
Further, we could likely limit PAGE_TYPE_BASE to a single (highest) bit.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm_types.h | 5 +++++
include/linux/page-flags.h | 20 ++++++++++++--------
2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 532a3030405d..437a62bed277 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -168,6 +168,11 @@ struct page {
*
* See page-flags.h for a list of page types which are currently
* stored here.
+ *
+ * Owners of typed folios may reuse the lower 18 bit of the
+ * head page page_type field after setting the page type,
+ * but must reset these 18 bit to -1 before clearing the
+ * page type.
*/
unsigned int page_type;
};
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 104078afe0b1..2f49c8b2f411 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -945,14 +945,18 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
*/

#define PAGE_TYPE_BASE 0xf0000000
-/* Reserve 0x0000007f to catch underflows of _mapcount */
-#define PAGE_MAPCOUNT_RESERVE -128
-#define PG_buddy 0x00000080
-#define PG_offline 0x00000100
-#define PG_table 0x00000200
-#define PG_guard 0x00000400
-#define PG_hugetlb 0x00000800
-#define PG_slab 0x00001000
+/*
+ * Reserve 0x0003ffff to catch underflows of _mapcount and
+ * allow owners that set a type to reuse the lower 18 bit for their own
+ * purposes.
+ */
+#define PAGE_MAPCOUNT_RESERVE -262144
+#define PG_buddy 0x00040000
+#define PG_offline 0x00080000
+#define PG_table 0x00100000
+#define PG_guard 0x00200000
+#define PG_hugetlb 0x00400000
+#define PG_slab 0x00800000

#define PageType(page, flag) \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
--
2.45.0


--
Cheers,

David / dhildenb