2024-05-29 11:19:21

by David Hildenbrand

Subject: [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset()

Wanting to remove the remaining abuser of _mapcount/page_type along with
page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be
converted away from "struct page" [1].

Unfortunately, we cannot stop using the page_type field in zsmalloc code
completely for its own purposes. All other fields in "struct page" are
used one way or the other. Could we simply store a 2-byte offset value
at the beginning of each page? Likely, but that will require a bit more
work; and once we have memdesc we might want to move the offset in there
(struct zsalloc?) again.

... but we can limit the abuse to 16 bit, glue it to a page type that
must be set, and document it. page_has_type() will always successfully
indicate such zsmalloc pages, and such zsmalloc pages only.

We lose zsmalloc support for PAGE_SIZE > 64KB, which should be tolerable.
We could use more bits from the page type, but 16 bit sounds like a good
idea for now.
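
(For context: 16 bit can encode offsets 0 .. 2^16 - 1 = 65535, that is,
any byte offset within a 64 KiB page; this is why PAGE_SIZE > 64 KiB
cannot be supported with this scheme.)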

So clarify the _mapcount/page_type documentation, use a proper page_type
for zsmalloc, and remove page_mapcount_reset().

Briefly tested with zram on x86-64.

[1] https://lore.kernel.org/all/[email protected]/

Cc: Andrew Morton <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>

v1 -> v2:
* Rebased to mm/mm-unstable
* "mm: update _mapcount and page_type documentation"
-> Minor comment change
* "mm: allow reuse of the lower 16 bit of the page type with an actual
type"
-> Fixup 18 vs 16 in description
-> Reduce PAGE_TYPE_BASE to a single bit and hand-out bits from highest
to lowest
-> Adjust description

RFC -> v1:
* Rebased to v6.10-rc1
* "mm: update _mapcount and page_type documentation"
-> Exchange members and fixup doc as suggested by Mike
* "mm: allow reuse of the lower 16bit of the page type with an actual
type"
-> Remove "highest bit" comment, fixup PG_buddy, extend description
* "mm/zsmalloc: use a proper page type"
-> Add and use HAVE_ZSMALLOC to fixup compilation
-> Fixup BUILD_BUG_ON
-> Add some VM_WARN_ON_ONCE(!PageZsmalloc(page));
* "mm/mm_init: initialize page->_mapcount directly
in __init_single_page()"
-> Fixup patch subject

David Hildenbrand (6):
mm: update _mapcount and page_type documentation
mm: allow reuse of the lower 16 bit of the page type with an actual
type
mm/zsmalloc: use a proper page type
mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages
mm/filemap: reinitialize folio->_mapcount directly
mm/mm_init: initialize page->_mapcount directly in
__init_single_page()

drivers/block/zram/Kconfig | 1 +
include/linux/mm.h | 10 ----------
include/linux/mm_types.h | 33 ++++++++++++++++++++++-----------
include/linux/page-flags.h | 25 ++++++++++++++++---------
mm/Kconfig | 10 ++++++++--
mm/filemap.c | 2 +-
mm/mm_init.c | 2 +-
mm/page_alloc.c | 6 ++++--
mm/zsmalloc.c | 29 +++++++++++++++++++++++++----
9 files changed, 78 insertions(+), 40 deletions(-)

--
2.45.1



2024-05-29 11:19:32

by David Hildenbrand

Subject: [PATCH v2 1/6] mm: update _mapcount and page_type documentation

Let's make it clearer that _mapcount must no longer be used for own
purposes, and how _mapcount and page_type behave nowadays (also in the
context of hugetlb folios, which are typed folios that will be mapped
to user space).

Move the documentation regarding "-1" over from page_mapcount_reset(),
which we will remove next. Move "page_type" before "mapcount", to make
it clearer what typed folios are.
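
For readers less familiar with that convention, a minimal sketch (not
kernel code; the two handlers are made up for illustration) of what
initializing _mapcount to -1 buys us:

  atomic_t mapcount = ATOMIC_INIT(-1);

  /* mapping the page: -1 -> 0, true only for the very first mapping */
  if (atomic_inc_and_test(&mapcount))
          first_mapping_added();

  /* unmapping the page: 0 -> -1, true only once the last mapping is gone */
  if (atomic_add_negative(-1, &mapcount))
          last_mapping_removed();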

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 5 -----
include/linux/mm_types.h | 28 +++++++++++++++++-----------
2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3aa1b6889bccf..eebfce8f58bca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
return atomic_read(&folio->_entire_mapcount) + 1;
}

-/*
- * The atomic page->_mapcount, starts from -1: so that transitions
- * both from it and to it can be tracked, using atomic_inc_and_test
- * and atomic_add_negative(-1).
- */
static inline void page_mapcount_reset(struct page *page)
{
atomic_set(&(page)->_mapcount, -1);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 24323c7d0bd48..dd2ce1b3ec80e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -46,9 +46,7 @@ struct mem_cgroup;
* which is guaranteed to be aligned. If you use the same storage as
* page->mapping, you must restore it to NULL before freeing the page.
*
- * If your page will not be mapped to userspace, you can also use the four
- * bytes in the mapcount union, but you must call page_mapcount_reset()
- * before freeing it.
+ * The mapcount field must not be used for own purposes.
*
* If you want to use the refcount field, it must be used in such a way
* that other CPUs temporarily incrementing and then decrementing the
@@ -152,18 +150,26 @@ struct page {

union { /* This union is 4 bytes in size. */
/*
- * If the page can be mapped to userspace, encodes the number
- * of times this page is referenced by a page table.
+ * For head pages of typed folios, the value stored here
+ * allows for determining what this page is used for. The
+ * tail pages of typed folios will not store a type
+ * (page_type == _mapcount == -1).
+ *
+ * See page-flags.h for a list of page types which are currently
+ * stored here.
*/
- atomic_t _mapcount;
+ unsigned int page_type;

/*
- * If the page is neither PageSlab nor mappable to userspace,
- * the value stored here may help determine what this page
- * is used for. See page-flags.h for a list of page types
- * which are currently stored here.
+ * For pages that are part of non-typed folios for which mappings
+ * are tracked via the RMAP, encodes the number of times this page
+ * is directly referenced by a page table.
+ *
+ * Note that the mapcount is always initialized to -1, so that
+ * transitions both from it and to it can be tracked, using
+ * atomic_inc_and_test() and atomic_add_negative(-1).
*/
- unsigned int page_type;
+ atomic_t _mapcount;
};

/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
--
2.45.1


2024-05-29 11:19:39

by David Hildenbrand

Subject: [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type

As long as the owner sets a page type first, we can allow reuse of the
lower 16 bit: sufficient to store an offset into a 64 KiB page, which
is the maximum base page size in *common* configurations (ignoring the
256 KiB variant). Restrict it to the head page.

We'll use that for zsmalloc next, to set a proper type while still
reusing that field to store information (offset into a base page) that
cannot go elsewhere for now.

Let's reserve the lower 16 bit for that purpose and for catching
mapcount underflows, and let's reduce PAGE_TYPE_BASE to a single bit.

Note that the mapcount would still have to underflow quite a lot before
we would actually indicate a valid page type.

Start handing out the type bits from highest to lowest, to make it
clearer how many bits for types we have left. Out of 15 bit we can use
for types, we currently use 6. If we run out of bits before we have
better typing (e.g., memdesc), we can always investigate storing a value
instead [1].

[1] https://lore.kernel.org/all/[email protected]/
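
As a rough sketch of how the new layout is meant to be used (PG_foo ==
0x40000000 and the 0x0123 offset are made-up values for illustration;
bit 31 is the base bit, the bits below it are handed out to types, and
the lower 16 bit become reusable by the owner):

  /*
   *   no type set:            page_type == 0xffffffff  (_mapcount == -1)
   *   set a type:             page_type &= ~PG_foo;       -> 0xbfffffff
   *   owner stores an offset: page_type &= ~0xffff;
   *                           page_type |= 0x0123;        -> 0xbfff0123
   *
   * PageType(page, PG_foo) then checks
   *   (page_type & (PAGE_TYPE_BASE | PG_foo)) == PAGE_TYPE_BASE,
   * i.e., the base bit must be set and the type bit must be clear; the
   * lower 16 bit never influence the result.
   */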

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm_types.h | 5 +++++
include/linux/page-flags.h | 22 +++++++++++++---------
2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index dd2ce1b3ec80e..791afaf1b1ec3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -157,6 +157,11 @@ struct page {
*
* See page-flags.h for a list of page types which are currently
* stored here.
+ *
+ * Owners of typed folios may reuse the lower 16 bit of the
+ * head page page_type field after setting the page type,
+ * but must reset these 16 bit to -1 before clearing the
+ * page type.
*/
unsigned int page_type;

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d1bdbaaccc964..f060db808102c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -945,15 +945,19 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
* mistaken for a page type value.
*/

-#define PAGE_TYPE_BASE 0xf0000000
-/* Reserve 0x0000007f to catch underflows of _mapcount */
-#define PAGE_MAPCOUNT_RESERVE -128
-#define PG_buddy 0x00000080
-#define PG_offline 0x00000100
-#define PG_table 0x00000200
-#define PG_guard 0x00000400
-#define PG_hugetlb 0x00000800
-#define PG_slab 0x00001000
+#define PAGE_TYPE_BASE 0x80000000
+/*
+ * Reserve 0xffff0000 - 0xfffffffe to catch _mapcount underflows and
+ * allow owners that set a type to reuse the lower 16 bit for their own
+ * purposes.
+ */
+#define PG_buddy 0x40000000
+#define PG_offline 0x20000000
+#define PG_table 0x10000000
+#define PG_guard 0x08000000
+#define PG_hugetlb 0x04008000
+#define PG_slab 0x02000000
+#define PAGE_MAPCOUNT_RESERVE (~0x0000ffff)

#define PageType(page, flag) \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
--
2.45.1


2024-05-29 11:19:58

by David Hildenbrand

Subject: [PATCH v2 3/6] mm/zsmalloc: use a proper page type

Let's clean it up: use a proper page type and store our data (offset
into a page) in the lower 16 bit as documented.

We won't be able to support 256 KiB base pages, which is acceptable.
Teach Kconfig to handle that cleanly using a new CONFIG_HAVE_ZSMALLOC.

Based on this, we should do a proper "struct zsdesc" conversion, as
proposed in [1].

This removes the last _mapcount/page_type offender.

[1] https://lore.kernel.org/all/[email protected]/
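
As a worked sketch of the resulting page_type values (the 0x0123 offset
is made up; PG_zsmalloc == 0x01000000 as added below):

  /*
   *   freshly allocated page:              page_type == 0xffffffff
   *   __SetPageZsmalloc(page):             page_type == 0xfeffffff
   *   set_first_obj_offset(page, 0x0123):  page_type == 0xfeff0123
   *   get_first_obj_offset(page):          returns 0x0123
   *   reset_first_obj_offset(page):        page_type == 0xfeffffff
   *   __ClearPageZsmalloc(page):           page_type == 0xffffffff
   */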

Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/block/zram/Kconfig | 1 +
include/linux/page-flags.h | 3 +++
mm/Kconfig | 10 ++++++++--
mm/zsmalloc.c | 29 +++++++++++++++++++++++++----
4 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 6aea609b795c2..40e035468de22 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -2,6 +2,7 @@
config ZRAM
tristate "Compressed RAM block device support"
depends on BLOCK && SYSFS && MMU
+ depends on HAVE_ZSMALLOC
select ZSMALLOC
help
Creates virtual block devices called /dev/zramX (X = 0, 1, ...).
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f060db808102c..3afcbfbb379ea 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -957,6 +957,7 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
#define PG_guard 0x08000000
#define PG_hugetlb 0x04008000
#define PG_slab 0x02000000
+#define PG_zsmalloc 0x01000000
#define PAGE_MAPCOUNT_RESERVE (~0x0000ffff)

#define PageType(page, flag) \
@@ -1072,6 +1073,8 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
FOLIO_TEST_FLAG_FALSE(hugetlb)
#endif

+PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
+
/**
* PageHuge - Determine if the page belongs to hugetlbfs
* @page: The page to test.
diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a541..67dc18c94448d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -128,7 +128,7 @@ config ZSWAP_COMPRESSOR_DEFAULT
choice
prompt "Default allocator"
depends on ZSWAP
- default ZSWAP_ZPOOL_DEFAULT_ZSMALLOC if MMU
+ default ZSWAP_ZPOOL_DEFAULT_ZSMALLOC if HAVE_ZSMALLOC
default ZSWAP_ZPOOL_DEFAULT_ZBUD
help
Selects the default allocator for the compressed cache for
@@ -154,6 +154,7 @@ config ZSWAP_ZPOOL_DEFAULT_Z3FOLD

config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
bool "zsmalloc"
+ depends on HAVE_ZSMALLOC
select ZSMALLOC
help
Use the zsmalloc allocator as the default allocator.
@@ -186,10 +187,15 @@ config Z3FOLD
page. It is a ZBUD derivative so the simplicity and determinism are
still there.

+config HAVE_ZSMALLOC
+ def_bool y
+ depends on MMU
+ depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
+
config ZSMALLOC
tristate
prompt "N:1 compression allocator (zsmalloc)" if ZSWAP
- depends on MMU
+ depends on HAVE_ZSMALLOC
help
zsmalloc is a slab-based memory allocator designed to store
pages of various compression levels efficiently. It achieves
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a2a5866473bb8..44e0171d60036 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -20,7 +20,8 @@
* page->index: links together all component pages of a zspage
* For the huge page, this is always 0, so we use this field
* to store handle.
- * page->page_type: first object offset in a subpage of zspage
+ * page->page_type: PG_zsmalloc, lower 16 bit locate the first object
+ * offset in a subpage of a zspage
*
* Usage of struct page flags:
* PG_private: identifies the first component page
@@ -450,14 +451,28 @@ static inline struct page *get_first_page(struct zspage *zspage)
return first_page;
}

+#define FIRST_OBJ_PAGE_TYPE_MASK 0xffff
+
+static inline void reset_first_obj_offset(struct page *page)
+{
+ VM_WARN_ON_ONCE(!PageZsmalloc(page));
+ page->page_type |= FIRST_OBJ_PAGE_TYPE_MASK;
+}
+
static inline unsigned int get_first_obj_offset(struct page *page)
{
- return page->page_type;
+ VM_WARN_ON_ONCE(!PageZsmalloc(page));
+ return page->page_type & FIRST_OBJ_PAGE_TYPE_MASK;
}

static inline void set_first_obj_offset(struct page *page, unsigned int offset)
{
- page->page_type = offset;
+ /* With 16 bit available, we can support offsets into 64 KiB pages. */
+ BUILD_BUG_ON(PAGE_SIZE > SZ_64K);
+ VM_WARN_ON_ONCE(!PageZsmalloc(page));
+ VM_WARN_ON_ONCE(offset & ~FIRST_OBJ_PAGE_TYPE_MASK);
+ page->page_type &= ~FIRST_OBJ_PAGE_TYPE_MASK;
+ page->page_type |= offset & FIRST_OBJ_PAGE_TYPE_MASK;
}

static inline unsigned int get_freeobj(struct zspage *zspage)
@@ -791,8 +806,9 @@ static void reset_page(struct page *page)
__ClearPageMovable(page);
ClearPagePrivate(page);
set_page_private(page, 0);
- page_mapcount_reset(page);
page->index = 0;
+ reset_first_obj_offset(page);
+ __ClearPageZsmalloc(page);
}

static int trylock_zspage(struct zspage *zspage)
@@ -965,11 +981,13 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
if (!page) {
while (--i >= 0) {
dec_zone_page_state(pages[i], NR_ZSPAGES);
+ __ClearPageZsmalloc(pages[i]);
__free_page(pages[i]);
}
cache_free_zspage(pool, zspage);
return NULL;
}
+ __SetPageZsmalloc(page);

inc_zone_page_state(page, NR_ZSPAGES);
pages[i] = page;
@@ -1754,6 +1772,9 @@ static int zs_page_migrate(struct page *newpage, struct page *page,

VM_BUG_ON_PAGE(!PageIsolated(page), page);

+ /* We're committed, tell the world that this is a Zsmalloc page. */
+ __SetPageZsmalloc(newpage);
+
/* The page is locked, so this pointer must remain valid */
zspage = get_zspage(page);
pool = zspage->pool;
--
2.45.1


2024-05-29 11:20:06

by David Hildenbrand

Subject: [PATCH v2 4/6] mm/page_alloc: clear PageBuddy using __ClearPageBuddy() for bad pages

Let's stop using page_mapcount_reset() and clear PageBuddy using
__ClearPageBuddy() instead.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/page_alloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b1e3eb5787de1..591d28b9f3e48 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -498,7 +498,8 @@ static void bad_page(struct page *page, const char *reason)
dump_stack();
out:
/* Leave bad fields for debug, except PageBuddy could make trouble */
- page_mapcount_reset(page); /* remove PageBuddy */
+ if (PageBuddy(page))
+ __ClearPageBuddy(page);
add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
}

@@ -1346,7 +1347,8 @@ static void check_new_page_bad(struct page *page)
{
if (unlikely(page->flags & __PG_HWPOISON)) {
/* Don't complain about hwpoisoned pages */
- page_mapcount_reset(page); /* remove PageBuddy */
+ if (PageBuddy(page))
+ __ClearPageBuddy(page);
return;
}

--
2.45.1


2024-05-29 11:20:15

by David Hildenbrand

Subject: [PATCH v2 5/6] mm/filemap: reinitialize folio->_mapcount directly

Let's get rid of the page_mapcount_reset() call and simply reinitialize
folio->_mapcount directly.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/filemap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index ba06237b942d6..9fe5c02ae92e7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -177,7 +177,7 @@ static void filemap_unaccount_folio(struct address_space *mapping,
* and we'd rather not leak it: if we're wrong,
* another bad page check should catch it later.
*/
- page_mapcount_reset(&folio->page);
+ atomic_set(&folio->_mapcount, -1);
folio_ref_sub(folio, mapcount);
}
}
--
2.45.1


2024-05-29 11:20:33

by David Hildenbrand

Subject: [PATCH v2 6/6] mm/mm_init: initialize page->_mapcount directly in __init_single_page()

Let's simply reinitialize the page->_mapcount directly. We can now
get rid of page_mapcount_reset().

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 5 -----
mm/mm_init.c | 2 +-
2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index eebfce8f58bca..c41c82bcbec2f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1206,11 +1206,6 @@ static inline int folio_entire_mapcount(const struct folio *folio)
return atomic_read(&folio->_entire_mapcount) + 1;
}

-static inline void page_mapcount_reset(struct page *page)
-{
- atomic_set(&(page)->_mapcount, -1);
-}
-
/**
* page_mapcount() - Number of times this precise page is mapped.
* @page: The page.
diff --git a/mm/mm_init.c b/mm/mm_init.c
index e0023aa685556..426314eeecec3 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -568,7 +568,7 @@ void __meminit __init_single_page(struct page *page, unsigned long pfn,
mm_zero_struct_page(page);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
- page_mapcount_reset(page);
+ atomic_set(&page->_mapcount, -1);
page_cpupid_reset_last(page);
page_kasan_tag_reset(page);

--
2.45.1


2024-05-29 16:20:34

by David Hildenbrand

Subject: Re: [PATCH v2 2/6] mm: allow reuse of the lower 16 bit of the page type with an actual type

>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d1bdbaaccc964..f060db808102c 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -945,15 +945,19 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
> * mistaken for a page type value.
> */
>
> -#define PAGE_TYPE_BASE 0xf0000000
> -/* Reserve 0x0000007f to catch underflows of _mapcount */
> -#define PAGE_MAPCOUNT_RESERVE -128
> -#define PG_buddy 0x00000080
> -#define PG_offline 0x00000100
> -#define PG_table 0x00000200
> -#define PG_guard 0x00000400
> -#define PG_hugetlb 0x00000800
> -#define PG_slab 0x00001000
> +#define PAGE_TYPE_BASE 0x80000000
> +/*
> + * Reserve 0xffff0000 - 0xfffffffe to catch _mapcount underflows and
> + * allow owners that set a type to reuse the lower 16 bit for their own
> + * purposes.
> + */
> +#define PG_buddy 0x40000000
> +#define PG_offline 0x20000000
> +#define PG_table 0x10000000
> +#define PG_guard 0x08000000
> +#define PG_hugetlb 0x04008000

As Wang Wei points out, the 0 and 8 look too similar on my screen ;)

This should be

#define PG_hugetlb 0x04000000

--
Cheers,

David / dhildenb


2024-05-30 05:02:34

by Sergey Senozhatsky

Subject: Re: [PATCH v2 0/6] mm: page_type, zsmalloc and page_mapcount_reset()

On (24/05/29 13:18), David Hildenbrand wrote:
> Wanting to remove the remaining abuser of _mapcount/page_type along with
> page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be
> converted away from "struct page" [1].
>
> Unfortunately, we cannot stop using the page_type field in zsmalloc code
> completely for its own purposes. All other fields in "struct page" are
> used one way or the other. Could we simply store a 2-byte offset value
> at the beginning of each page? Likely, but that will require a bit more
> work; and once we have memdesc we might want to move the offset in there
> (struct zsalloc?) again.
>
> ... but we can limit the abuse to 16 bit, glue it to a page type that
> must be set, and document it. page_has_type() will always successfully
> indicate such zsmalloc pages, and such zsmalloc pages only.
>
> We lose zsmalloc support for PAGE_SIZE > 64KB, which should be tolerable.
> We could use more bits from the page type, but 16 bit sounds like a good
> idea for now.
>
> So clarify the _mapcount/page_type documentation, use a proper page_type
> for zsmalloc, and remove page_mapcount_reset().
>
> Briefly tested with zram on x86-64.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> Cc: Andrew Morton <[email protected]>
> Cc: "Matthew Wilcox (Oracle)" <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Sergey Senozhatsky <[email protected]>
> Cc: Hyeonggon Yoo <[email protected]>

Tested-by: Sergey Senozhatsky <[email protected]> # zram/zsmalloc workloads

2024-05-30 05:02:48

by Sergey Senozhatsky

Subject: Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type

On (24/05/29 13:19), David Hildenbrand wrote:
> We won't be able to support 256 KiB base pages, which is acceptable.
[..]
> +config HAVE_ZSMALLOC
> + def_bool y
> + depends on MMU
> + depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB

Can't really say that I'm happy with this, but if mm-folks are
fine then okay.

FWIW
Reviewed-by: Sergey Senozhatsky <[email protected]>

2024-05-31 15:07:44

by David Hildenbrand

Subject: Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type

On 31.05.24 16:27, Matthew Wilcox wrote:
> On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
>> On (24/05/29 13:19), David Hildenbrand wrote:
>>> We won't be able to support 256 KiB base pages, which is acceptable.
>> [..]
>>> +config HAVE_ZSMALLOC
>>> + def_bool y
>>> + depends on MMU
>>> + depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
>>
>> Can't really say that I'm happy with this, but if mm-folks are
>> fine then okay.
>
> I have an idea ...
>
> We use 6 of the bits in the top byte of the page_type to enumerate
> a type (ie value 0x80-0xbf) and then the remaining 24 bits are
> available. It's actually more efficient:
>
> $ ./scripts/bloat-o-meter prev.o .build-debian/mm/filemap.o
> add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
> Function old new delta
> __filemap_add_folio 1102 1098 -4
> filemap_unaccount_folio 455 446 -9
> replace_page_cache_folio 474 447 -27
> Total: Before=41258, After=41218, chg -0.10%
>
> (that's all from PG_hugetlb)
>
> before:
> 1406: 8b 46 30 mov 0x30(%rsi),%eax
> mapcount = atomic_read(&folio->_mapcount) + 1;
> 1409: 83 c0 01 add $0x1,%eax
> if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
> 140c: 83 f8 81 cmp $0xffffff81,%eax
> 140f: 7d 6c jge 147d <filemap_unaccount_folio+0x8d>
> 1411: 8b 43 30 mov 0x30(%rbx),%eax
> 1414: 25 00 08 00 f0 and $0xf0000800,%eax
> 1419: 3d 00 00 00 f0 cmp $0xf0000000,%eax
> 141e: 74 4e je 146e <filemap_unaccount_folio+0x7e>
>
> after:
> 1406: 8b 46 30 mov 0x30(%rsi),%eax
> mapcount = atomic_read(&folio->_mapcount) + 1;
> 1409: 83 c0 01 add $0x1,%eax
> if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
> 140c: 83 f8 81 cmp $0xffffff81,%eax
> 140f: 7d 63 jge 1474 <filemap_unaccount_folio+0x84>
> if (folio_test_hugetlb(folio))
> 1411: 80 7b 33 84 cmpb $0x84,0x33(%rbx)
> 1415: 74 4e je 1465 <filemap_unaccount_folio+0x75>
>
> so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
> be faster to execute as well as being more compact in the I$ (6 bytes vs 15).
>
> Anyway, not tested but this is the patch I used to generate the above.
> More for comment than application.

Right, it's likely very similar to my previous proposal to use 8 bit
(uint8_t) for the type.

https://lore.kernel.org/all/[email protected]/

I would prefer doing that separately, unless someone can explain why we
care about zram + 256KiB that much right now. (claim: we don't)

>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5265b3434b9e..4129d04ac812 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -942,24 +942,24 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
> * mistaken for a page type value.
> */
>
> -#define PAGE_TYPE_BASE 0xf0000000
> -/* Reserve 0x0000007f to catch underflows of _mapcount */
> -#define PAGE_MAPCOUNT_RESERVE -128
> -#define PG_buddy 0x00000080
> -#define PG_offline 0x00000100
> -#define PG_table 0x00000200
> -#define PG_guard 0x00000400
> -#define PG_hugetlb 0x00000800
> -#define PG_slab 0x00001000
> -
> -#define PageType(page, flag) \
> - ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> -#define folio_test_type(folio, flag) \
> - ((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> +/* Reserve 0x0000007f to catch underflows of _mapcount */
> +#define PAGE_MAPCOUNT_RESERVE -128
> +
> +#define PG_buddy 0x80
> +#define PG_offline 0x81
> +#define PG_table 0x82
> +#define PG_guard 0x83
> +#define PG_hugetlb 0x84
> +#define PG_slab 0x85

Hoping we can stop calling that PG_ ...




--
Cheers,

David / dhildenb


2024-05-31 18:57:56

by Matthew Wilcox

Subject: Re: [PATCH v2 3/6] mm/zsmalloc: use a proper page type

On Thu, May 30, 2024 at 02:01:23PM +0900, Sergey Senozhatsky wrote:
> On (24/05/29 13:19), David Hildenbrand wrote:
> > We won't be able to support 256 KiB base pages, which is acceptable.
> [..]
> > +config HAVE_ZSMALLOC
> > + def_bool y
> > + depends on MMU
> > + depends on PAGE_SIZE_LESS_THAN_256KB # we want <= 64 KiB
>
> Can't really say that I'm happy with this, but if mm-folks are
> fine then okay.

I have an idea ...

We use 6 of the bits in the top byte of the page_type to enumerate
a type (ie value 0x80-0xbf) and then the remaining 24 bits are
available. It's actually more efficient:

$ ./scripts/bloat-o-meter prev.o .build-debian/mm/filemap.o
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
Function old new delta
__filemap_add_folio 1102 1098 -4
filemap_unaccount_folio 455 446 -9
replace_page_cache_folio 474 447 -27
Total: Before=41258, After=41218, chg -0.10%

(that's all from PG_hugetlb)

before:
1406: 8b 46 30 mov 0x30(%rsi),%eax
mapcount = atomic_read(&folio->_mapcount) + 1;
1409: 83 c0 01 add $0x1,%eax
if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
140c: 83 f8 81 cmp $0xffffff81,%eax
140f: 7d 6c jge 147d <filemap_unaccount_folio+0x8d>
1411: 8b 43 30 mov 0x30(%rbx),%eax
1414: 25 00 08 00 f0 and $0xf0000800,%eax
1419: 3d 00 00 00 f0 cmp $0xf0000000,%eax
141e: 74 4e je 146e <filemap_unaccount_folio+0x7e>

after:
1406: 8b 46 30 mov 0x30(%rsi),%eax
mapcount = atomic_read(&folio->_mapcount) + 1;
1409: 83 c0 01 add $0x1,%eax
if (mapcount < PAGE_MAPCOUNT_RESERVE + 1)
140c: 83 f8 81 cmp $0xffffff81,%eax
140f: 7d 63 jge 1474 <filemap_unaccount_folio+0x84>
if (folio_test_hugetlb(folio))
1411: 80 7b 33 84 cmpb $0x84,0x33(%rbx)
1415: 74 4e je 1465 <filemap_unaccount_folio+0x75>

so we go from "mov, and, cmp, je" to just "cmpb, je", which must surely
be faster to execute as well as being more compact in the I$ (6 bytes vs 15).

Anyway, not tested but this is the patch I used to generate the above.
More for comment than application.

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5265b3434b9e..4129d04ac812 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -942,24 +942,24 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned)
* mistaken for a page type value.
*/

-#define PAGE_TYPE_BASE 0xf0000000
-/* Reserve 0x0000007f to catch underflows of _mapcount */
-#define PAGE_MAPCOUNT_RESERVE -128
-#define PG_buddy 0x00000080
-#define PG_offline 0x00000100
-#define PG_table 0x00000200
-#define PG_guard 0x00000400
-#define PG_hugetlb 0x00000800
-#define PG_slab 0x00001000
-
-#define PageType(page, flag) \
- ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
-#define folio_test_type(folio, flag) \
- ((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
+/* Reserve 0x0000007f to catch underflows of _mapcount */
+#define PAGE_MAPCOUNT_RESERVE -128
+
+#define PG_buddy 0x80
+#define PG_offline 0x81
+#define PG_table 0x82
+#define PG_guard 0x83
+#define PG_hugetlb 0x84
+#define PG_slab 0x85
+
+#define PageType(page, type) \
+ (((page)->page_type >> 24) == type)
+#define folio_test_type(folio, type) \
+ (((folio)->page.page_type >> 24) == type)

static inline int page_type_has_type(unsigned int page_type)
{
- return (int)page_type < PAGE_MAPCOUNT_RESERVE;
+ return ((int)page_type < 0) && (page_type < 0xc0000000);
}

static inline int page_has_type(const struct page *page)
@@ -975,12 +975,12 @@ static __always_inline bool folio_test_##fname(const struct folio *folio)\
static __always_inline void __folio_set_##fname(struct folio *folio) \
{ \
VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio); \
- folio->page.page_type &= ~PG_##lname; \
+ folio->page.page_type = PG_##lname << 24; \
} \
static __always_inline void __folio_clear_##fname(struct folio *folio) \
{ \
VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \
- folio->page.page_type |= PG_##lname; \
+ folio->page.page_type = 0xffffffff; \
}

#define PAGE_TYPE_OPS(uname, lname, fname) \
@@ -992,12 +992,12 @@ static __always_inline int Page##uname(const struct page *page) \
static __always_inline void __SetPage##uname(struct page *page) \
{ \
VM_BUG_ON_PAGE(!PageType(page, 0), page); \
- page->page_type &= ~PG_##lname; \
+ page->page_type = PG_##lname << 24; \
} \
static __always_inline void __ClearPage##uname(struct page *page) \
{ \
VM_BUG_ON_PAGE(!Page##uname(page), page); \
- page->page_type |= PG_##lname; \
+ page->page_type = 0xffffffff; \
}

/*