2022-09-19 13:26:11

by Hyeonggon Yoo

[permalink] [raw]
Subject: [PATCH] mm: move PG_slab flag to page_type

For now, only SLAB uses _mapcount field as a number of active objects in
a slab, and other slab allocators do not use it. As 16 bits are enough
for that, use remaining 16 bits of _mapcount as page_type even when
SLAB is used. And then move PG_slab flag to page_type!

Note that page_type is always placed in upper 16 bits of _mapcount to
avoid confusing normal _mapcount as page_type. As underflow (actually
I mean, yeah, overflow) is not a concern anymore, use more lower bits
except bit zero.

Add more folio helpers for PAGE_TYPE_OPS() not to break existing
slab implementations.

Remove PG_slab check from PAGE_FLAGS_CHECK_AT_FREE. buddy will still
check if _mapcount is properly set at free.

Exclude PG_slab from hwpoison and show_page_flags() for now.

Note that with this patch, page_mapped() and folio_mapped() always return
false for slab page.

Cc: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Marco Elver <[email protected]>
Signed-off-by: Hyeonggon Yoo <[email protected]>
---

I think this gives us two benefits:
- it frees a bit in flags field
- it makes it simpler to distinguish user-mapped pages and
not-user-mapped pages.

Plus I'm writing a bit more of code including:
0) a few cleanup for code that checks
!PageSlab() && page_mapped() or that does similar thing
1) provide human-readale string of page_type in dump_page
2) add show_page_types() for tracepoint
3) fix hwpoison ...etc.

Anyway This is an early RFC, I will very appreciate feedbacks!

include/linux/mm_types.h | 22 +++++++--
include/linux/page-flags.h | 83 ++++++++++++++++++++++++++--------
include/trace/events/mmflags.h | 1 -
mm/memory-failure.c | 8 ----
mm/slab.h | 11 ++++-
5 files changed, 92 insertions(+), 33 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cf97f3884fda..4b217c6fbe1f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -193,12 +193,24 @@ struct page {
atomic_t _mapcount;

/*
- * If the page is neither PageSlab nor mappable to userspace,
- * the value stored here may help determine what this page
- * is used for. See page-flags.h for a list of page types
- * which are currently stored here.
+ * If the page is not mappable to userspace, the value
+ * stored here may help determine what this page is used for.
+ * See page-flags.h for a list of page types which are currently
+ * stored here.
*/
- unsigned int page_type;
+ struct {
+ /*
+ * Always place page_type in
+ * upper 16 bits of _mapcount
+ */
+#ifdef CPU_BIG_ENDIAN
+ __u16 page_type;
+ __u16 active;
+#else
+ __u16 active;
+ __u16 page_type;
+#endif
+ };
};

/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 465ff35a8c00..b414d0996639 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -107,7 +107,6 @@ enum pageflags {
PG_workingset,
PG_waiters, /* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */
PG_error,
- PG_slab,
PG_owner_priv_1, /* Owner use. If pagecache, fs may use*/
PG_arch_1,
PG_reserved,
@@ -484,7 +483,6 @@ PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
TESTCLEARFLAG(Active, active, PF_HEAD)
PAGEFLAG(Workingset, workingset, PF_HEAD)
TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
-__PAGEFLAG(Slab, slab, PF_NO_TAIL)
__PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
PAGEFLAG(Checked, checked, PF_NO_COMPOUND) /* Used by some filesystems */

@@ -926,44 +924,90 @@ static inline bool is_page_hwpoison(struct page *page)
}

/*
- * For pages that are never mapped to userspace (and aren't PageSlab),
- * page_type may be used. Because it is initialised to -1, we invert the
- * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and
- * __ClearPageFoo *sets* the bit used for PageFoo. We reserve a few high and
- * low bits so that an underflow or overflow of page_mapcount() won't be
- * mistaken for a page type value.
+ * For pages that are never mapped to userspace, page_type may be used.
+ * Because it is initialised to -1, we invert the sense of the bit,
+ * so __SetPageFoo *clears* the bit used for PageFoo, and __ClearPageFoo
+ * *sets* the bit used for PageFoo. We reserve a few high and low bits
+ * so that an underflow or overflow of page_mapcount() won't be mistaken
+ * for a page type value.
*/

-#define PAGE_TYPE_BASE 0xf0000000
-/* Reserve 0x0000007f to catch underflows of page_mapcount */
-#define PAGE_MAPCOUNT_RESERVE -128
-#define PG_buddy 0x00000080
-#define PG_offline 0x00000100
-#define PG_table 0x00000200
-#define PG_guard 0x00000400
+#define PAGE_TYPE_BASE 0xf000
+#define PAGE_MAPCOUNT_RESERVE -1
+#define PG_buddy 0x0002
+#define PG_offline 0x0004
+#define PG_table 0x0008
+#define PG_guard 0x0010
+#define PG_slab 0x0020

#define PageType(page, flag) \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)

-static inline int page_has_type(struct page *page)
+#define PAGE_TYPE_MASK ((1UL << (BITS_PER_BYTE * sizeof(__u16))) - 1)
+
+static inline bool page_has_type(struct page *page)
{
return (int)page->page_type < PAGE_MAPCOUNT_RESERVE;
}

+static inline bool page_type_valid(__u16 page_type)
+{
+ return (page_type & PAGE_TYPE_BASE) == PAGE_TYPE_BASE;
+}
+
#define PAGE_TYPE_OPS(uname, lname) \
static __always_inline int Page##uname(struct page *page) \
{ \
return PageType(page, PG_##lname); \
} \
+static __always_inline int folio_test_##lname(struct folio *folio) \
+{ \
+ struct page *page = &folio->page; \
+ \
+ VM_BUG_ON_PAGE(PageTail(page), page); \
+ return PageType(page, PG_##lname); \
+} \
+static __always_inline int __folio_test_##lname(struct folio *folio) \
+{ \
+ struct page *page = &folio->page; \
+ \
+ return PageType(page, PG_##lname); \
+} \
static __always_inline void __SetPage##uname(struct page *page) \
{ \
VM_BUG_ON_PAGE(!PageType(page, 0), page); \
page->page_type &= ~PG_##lname; \
} \
+static __always_inline void folio_set_##lname(struct folio *folio) \
+{ \
+ struct page *page = &folio->page; \
+ \
+ VM_BUG_ON_PAGE(PageTail(page), page); \
+ __SetPage##uname(page); \
+} \
+static __always_inline void __folio_set_##lname(struct folio *folio) \
+{ \
+ struct page *page = &folio->page; \
+ \
+ __SetPage##uname(page); \
+} \
static __always_inline void __ClearPage##uname(struct page *page) \
{ \
VM_BUG_ON_PAGE(!Page##uname(page), page); \
page->page_type |= PG_##lname; \
+} \
+static __always_inline void folio_clear_##lname(struct folio *folio) \
+{ \
+ struct page *page = &folio->page; \
+ \
+ VM_BUG_ON_PAGE(PageTail(page), page); \
+ __ClearPage##uname(page); \
+} \
+static __always_inline void __folio_clear_##lname(struct folio *folio) \
+{ \
+ struct page *page = &folio->page; \
+ \
+ __ClearPage##uname(page); \
}

/*
@@ -996,6 +1040,9 @@ PAGE_TYPE_OPS(Buddy, buddy)
*/
PAGE_TYPE_OPS(Offline, offline)

+/* PageSlab() indicates that the page is used by slab subsystem. */
+PAGE_TYPE_OPS(Slab, slab)
+
extern void page_offline_freeze(void);
extern void page_offline_thaw(void);
extern void page_offline_begin(void);
@@ -1057,8 +1104,8 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
(1UL << PG_lru | 1UL << PG_locked | \
1UL << PG_private | 1UL << PG_private_2 | \
1UL << PG_writeback | 1UL << PG_reserved | \
- 1UL << PG_slab | 1UL << PG_active | \
- 1UL << PG_unevictable | __PG_MLOCKED)
+ 1UL << PG_active | 1UL << PG_unevictable | \
+ __PG_MLOCKED)

/*
* Flags checked when a page is prepped for return by the page allocator.
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index e87cb2b80ed3..fa5aa9e983ec 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -113,7 +113,6 @@
{1UL << PG_lru, "lru" }, \
{1UL << PG_active, "active" }, \
{1UL << PG_workingset, "workingset" }, \
- {1UL << PG_slab, "slab" }, \
{1UL << PG_owner_priv_1, "owner_priv_1" }, \
{1UL << PG_arch_1, "arch_1" }, \
{1UL << PG_reserved, "reserved" }, \
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 14439806b5ef..9a25d10d7391 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1123,7 +1123,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
#define mlock (1UL << PG_mlocked)
#define lru (1UL << PG_lru)
#define head (1UL << PG_head)
-#define slab (1UL << PG_slab)
#define reserved (1UL << PG_reserved)

static struct page_state error_states[] = {
@@ -1133,13 +1132,6 @@ static struct page_state error_states[] = {
* PG_buddy pages only make a small fraction of all free pages.
*/

- /*
- * Could in theory check if slab page is free or if we can drop
- * currently unused objects without touching them. But just
- * treat it as standard kernel for now.
- */
- { slab, slab, MF_MSG_SLAB, me_kernel },
-
{ head, head, MF_MSG_HUGE, me_huge_page },

{ sc|dirty, sc|dirty, MF_MSG_DIRTY_SWAPCACHE, me_swapcache_dirty },
diff --git a/mm/slab.h b/mm/slab.h
index 985820b9069b..a5273e189265 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -20,7 +20,16 @@ struct slab {
};
struct rcu_head rcu_head;
};
- unsigned int active;
+ struct {
+ /* always place page_type in upper 16 bits of _mapcount */
+#ifdef CPU_BIG_ENDIAN
+ __u16 page_type;
+ __u16 active;
+#else
+ __u16 active;
+ __u16 page_type;
+#endif
+ };

#elif defined(CONFIG_SLUB)

--
2.32.0


2022-09-19 14:06:10

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: move PG_slab flag to page_type

On Mon, Sep 19, 2022 at 09:57:08PM +0900, Hyeonggon Yoo wrote:
> For now, only SLAB uses _mapcount field as a number of active objects in
> a slab, and other slab allocators do not use it. As 16 bits are enough
> for that, use remaining 16 bits of _mapcount as page_type even when
> SLAB is used. And then move PG_slab flag to page_type!
>
> Note that page_type is always placed in upper 16 bits of _mapcount to
> avoid confusing normal _mapcount as page_type. As underflow (actually
> I mean, yeah, overflow) is not a concern anymore, use more lower bits
> except bit zero.
>
> Add more folio helpers for PAGE_TYPE_OPS() not to break existing
> slab implementations.
>
> Remove PG_slab check from PAGE_FLAGS_CHECK_AT_FREE. buddy will still
> check if _mapcount is properly set at free.
>
> Exclude PG_slab from hwpoison and show_page_flags() for now.
>
> Note that with this patch, page_mapped() and folio_mapped() always return
> false for slab page.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Pekka Enberg <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Roman Gushchin <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Miaohe Lin <[email protected]>
> Cc: "Matthew Wilcox (Oracle)" <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Muchun Song <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Andrey Konovalov <[email protected]>
> Cc: Marco Elver <[email protected]>
> Signed-off-by: Hyeonggon Yoo <[email protected]>
> ---
>
> I think this gives us two benefits:
> - it frees a bit in flags field
> - it makes it simpler to distinguish user-mapped pages and
> not-user-mapped pages.
>
> Plus I'm writing a bit more of code including:
> 0) a few cleanup for code that checks
> !PageSlab() && page_mapped() or that does similar thing
> 1) provide human-readale string of page_type in dump_page
> 2) add show_page_types() for tracepoint
> 3) fix hwpoison ...etc.
>
> Anyway This is an early RFC, I will very appreciate feedbacks!

Looks like I forgot to add RFC in subject :)

2022-09-19 14:08:56

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH] mm: move PG_slab flag to page_type

On Mon, Sep 19, 2022 at 09:57:08PM +0900, Hyeonggon Yoo wrote:
> For now, only SLAB uses _mapcount field as a number of active objects in
> a slab, and other slab allocators do not use it. As 16 bits are enough
> for that, use remaining 16 bits of _mapcount as page_type even when
> SLAB is used. And then move PG_slab flag to page_type!
>
> Note that page_type is always placed in upper 16 bits of _mapcount to
> avoid confusing normal _mapcount as page_type. As underflow (actually
> I mean, yeah, overflow) is not a concern anymore, use more lower bits
> except bit zero.
>
> Add more folio helpers for PAGE_TYPE_OPS() not to break existing
> slab implementations.
>
> Remove PG_slab check from PAGE_FLAGS_CHECK_AT_FREE. buddy will still
> check if _mapcount is properly set at free.
>
> Exclude PG_slab from hwpoison and show_page_flags() for now.
>
> Note that with this patch, page_mapped() and folio_mapped() always return
> false for slab page.
>

[...]

Hi. a silly mistake:

>
> include/linux/mm_types.h | 22 +++++++--
> include/linux/page-flags.h | 83 ++++++++++++++++++++++++++--------
> include/trace/events/mmflags.h | 1 -
> mm/memory-failure.c | 8 ----
> mm/slab.h | 11 ++++-
> 5 files changed, 92 insertions(+), 33 deletions(-)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index cf97f3884fda..4b217c6fbe1f 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -193,12 +193,24 @@ struct page {
> atomic_t _mapcount;
>
> /*
> - * If the page is neither PageSlab nor mappable to userspace,
> - * the value stored here may help determine what this page
> - * is used for. See page-flags.h for a list of page types
> - * which are currently stored here.
> + * If the page is not mappable to userspace, the value
> + * stored here may help determine what this page is used for.
> + * See page-flags.h for a list of page types which are currently
> + * stored here.
> */
> - unsigned int page_type;
> + struct {
> + /*
> + * Always place page_type in
> + * upper 16 bits of _mapcount
> + */
> +#ifdef CPU_BIG_ENDIAN

s/CPU_BIG_ENDIAN/CONFIG_CPU_BIG_ENDIAN/g

> + __u16 page_type;
> + __u16 active;
> +#else
> + __u16 active;
> + __u16 page_type;
> +#endif
> + };
> };
>
> /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */

[...]

> diff --git a/mm/slab.h b/mm/slab.h
> index 985820b9069b..a5273e189265 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -20,7 +20,16 @@ struct slab {
> };
> struct rcu_head rcu_head;
> };
> - unsigned int active;
> + struct {
> + /* always place page_type in upper 16 bits of _mapcount */
> +#ifdef CPU_BIG_ENDIAN

same here.

> + __u16 page_type;
> + __u16 active;
> +#else
> + __u16 active;
> + __u16 page_type;
> +#endif
> + };
>
> #elif defined(CONFIG_SLUB)
>
> --
> 2.32.0
>

--
Thanks,
Hyeonggon

2022-09-25 00:27:07

by Matthew Wilcox (Oracle)

[permalink] [raw]
Subject: Re: [PATCH] mm: move PG_slab flag to page_type

On Mon, Sep 19, 2022 at 09:57:08PM +0900, Hyeonggon Yoo wrote:
> For now, only SLAB uses _mapcount field as a number of active objects in
> a slab, and other slab allocators do not use it. As 16 bits are enough
> for that, use remaining 16 bits of _mapcount as page_type even when
> SLAB is used. And then move PG_slab flag to page_type!
>
> Note that page_type is always placed in upper 16 bits of _mapcount to
> avoid confusing normal _mapcount as page_type. As underflow (actually
> I mean, yeah, overflow) is not a concern anymore, use more lower bits
> except bit zero.
>
> Add more folio helpers for PAGE_TYPE_OPS() not to break existing
> slab implementations.
>
> Remove PG_slab check from PAGE_FLAGS_CHECK_AT_FREE. buddy will still
> check if _mapcount is properly set at free.
>
> Exclude PG_slab from hwpoison and show_page_flags() for now.
>
> Note that with this patch, page_mapped() and folio_mapped() always return
> false for slab page.

This is an interesting approach. It raises some questions.

First, you say that folio_mapped() returns false for slab pages. That's
only true for order-0 slab pages. For larger pages,

if (!folio_test_large(folio))
return atomic_read(&folio->_mapcount) >= 0;
if (atomic_read(folio_mapcount_ptr(folio)) >= 0)
return true;

so that's going to depend what folio_mapcount_ptr() aliases with.

Second, this patch changes the behaviour of PageSlab() when applied to
tail pages. Which raises the further question of what PageBuddy(),
PageTable(), PageGuard() and PageIsolated() should do for multi-page
folios, if that is even possible.

Third, can we do this without that awkward __u16 thing? Perhaps

-#define PG_buddy 0x00000080
-#define PG_offline 0x00000100
-#define PG_table 0x00000200
-#define PG_guard 0x00000400
+#define PG_buddy 0x00010000
+#define PG_offline 0x00020000
+#define PG_table 0x00040000
+#define PG_guard 0x00080000
+#define PG_slab 0x00100000

... and then use wrappers in slab.c to access the bottom 16 bits?

2022-09-26 08:54:55

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH] mm: move PG_slab flag to page_type

On 25.09.22 01:04, Matthew Wilcox wrote:
> On Mon, Sep 19, 2022 at 09:57:08PM +0900, Hyeonggon Yoo wrote:
>> For now, only SLAB uses _mapcount field as a number of active objects in
>> a slab, and other slab allocators do not use it. As 16 bits are enough
>> for that, use remaining 16 bits of _mapcount as page_type even when
>> SLAB is used. And then move PG_slab flag to page_type!
>>
>> Note that page_type is always placed in upper 16 bits of _mapcount to
>> avoid confusing normal _mapcount as page_type. As underflow (actually
>> I mean, yeah, overflow) is not a concern anymore, use more lower bits
>> except bit zero.
>>
>> Add more folio helpers for PAGE_TYPE_OPS() not to break existing
>> slab implementations.
>>
>> Remove PG_slab check from PAGE_FLAGS_CHECK_AT_FREE. buddy will still
>> check if _mapcount is properly set at free.
>>
>> Exclude PG_slab from hwpoison and show_page_flags() for now.
>>
>> Note that with this patch, page_mapped() and folio_mapped() always return
>> false for slab page.
>
> This is an interesting approach. It raises some questions.
>
> First, you say that folio_mapped() returns false for slab pages. That's
> only true for order-0 slab pages. For larger pages,
>
> if (!folio_test_large(folio))
> return atomic_read(&folio->_mapcount) >= 0;
> if (atomic_read(folio_mapcount_ptr(folio)) >= 0)
> return true;
>
> so that's going to depend what folio_mapcount_ptr() aliases with.
>
> Second, this patch changes the behaviour of PageSlab() when applied to
> tail pages. Which raises the further question of what PageBuddy(),
> PageTable(), PageGuard() and PageIsolated() should do for multi-page
> folios, if that is even possible.

IIRC, these flags never apply on real compound pages so far. For
example, PageBuddy() is only set on the first page of a (budy-aligned)
free chunk of memory, and all "remaining" (tail) pages have a refcount
of zero and don't have the flag set.

There are page tables on some systems (e.g., s390x) that span multiple
pages (pmd_alloc_one()). I think the behavior is similar -- no compound
page, and only the first page has the flag set.

--
Thanks,

David / dhildenb

2022-10-07 14:08:34

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH] mm: move PG_slab flag to page_type

On Sun, Sep 25, 2022 at 12:04:40AM +0100, Matthew Wilcox wrote:
> On Mon, Sep 19, 2022 at 09:57:08PM +0900, Hyeonggon Yoo wrote:
> > For now, only SLAB uses _mapcount field as a number of active objects in
> > a slab, and other slab allocators do not use it. As 16 bits are enough
> > for that, use remaining 16 bits of _mapcount as page_type even when
> > SLAB is used. And then move PG_slab flag to page_type!
> >
> > Note that page_type is always placed in upper 16 bits of _mapcount to
> > avoid confusing normal _mapcount as page_type. As underflow (actually
> > I mean, yeah, overflow) is not a concern anymore, use more lower bits
> > except bit zero.
> >
> > Add more folio helpers for PAGE_TYPE_OPS() not to break existing
> > slab implementations.
> >
> > Remove PG_slab check from PAGE_FLAGS_CHECK_AT_FREE. buddy will still
> > check if _mapcount is properly set at free.
> >
> > Exclude PG_slab from hwpoison and show_page_flags() for now.
> >
> > Note that with this patch, page_mapped() and folio_mapped() always return
> > false for slab page.
>
> This is an interesting approach. It raises some questions.

Hello Matthew, sorry for late reply and I didn't mean to ignore your
feedback. I realized compound pages and folio stuffs are my weak side and
needed some time to learn :)

> First, you say that folio_mapped() returns false for slab pages. That's
> only true for order-0 slab pages. For larger pages,
>
> if (!folio_test_large(folio))
> return atomic_read(&folio->_mapcount) >= 0;
> if (atomic_read(folio_mapcount_ptr(folio)) >= 0)
> return true;
>
> so that's going to depend what folio_mapcount_ptr() aliases with.

IIUC it's true for order > 0 slab too.

As slab pages are not mapped to userspace at all,
entire compound page nor base pages are not mapped to userspace.

AFAIK followings are true for order > 0 slab:
- (first tail page)->compound_mapcount is -1
- _mapcount of base pages are -1

So:
folio_mapped() and page_mapped() (if applied to head page)
returns false for larger pages with this patch.

I wrote simple testcase and did check that folio_mapped() and page_mapped()
returns false for both order-0 page and larger pages. (and SLAB
returned true for them before)

> Second, this patch changes the behaviour of PageSlab() when applied to
> tail pages.

Altough it changes the way it checks the flag,

it does not change behavior when applied to tail pages - PageSlab() on tail
page returns false with or without this patch.

If PageSlab() need to return true for tail pages too,
we may make it check page_type at head page.

But I'm not sure when it the behavior is needed.
Can you please share your insight on this?

> Which raises the further question of what PageBuddy(),
> PageTable(), PageGuard() and PageIsolated() should do for multi-page
> folios, if that is even possible.

For users that uses real compound page like slab, we can make it check
page_type of head page. (if needed)

But for cases David described, there isn't much thing we can do
except making them to use real compound pages.

> Third, can we do this without that awkward __u16 thing? Perhaps
>
> -#define PG_buddy 0x00000080
> -#define PG_offline 0x00000100
> -#define PG_table 0x00000200
> -#define PG_guard 0x00000400
> +#define PG_buddy 0x00010000
> +#define PG_offline 0x00020000
> +#define PG_table 0x00040000
> +#define PG_guard 0x00080000
> +#define PG_slab 0x00100000
>
> ... and then use wrappers in slab.c to access the bottom 16 bits?

Definitely! I prefer that way and will adjust in RFC v2.

Thank you for precious feedback.

--
Hyeonggon

2022-10-07 18:46:28

by Matthew Wilcox (Oracle)

[permalink] [raw]
Subject: Re: [PATCH] mm: move PG_slab flag to page_type

On Fri, Oct 07, 2022 at 10:36:56PM +0900, Hyeonggon Yoo wrote:
> > First, you say that folio_mapped() returns false for slab pages. That's
> > only true for order-0 slab pages. For larger pages,
> >
> > if (!folio_test_large(folio))
> > return atomic_read(&folio->_mapcount) >= 0;
> > if (atomic_read(folio_mapcount_ptr(folio)) >= 0)
> > return true;
> >
> > so that's going to depend what folio_mapcount_ptr() aliases with.
>
> IIUC it's true for order > 0 slab too.
>
> As slab pages are not mapped to userspace at all,
> entire compound page nor base pages are not mapped to userspace.
>
> AFAIK followings are true for order > 0 slab:
> - (first tail page)->compound_mapcount is -1

That's the part I wasn't sure of. I think we do, in
prep_compound_head().

> - _mapcount of base pages are -1
>
> So:
> folio_mapped() and page_mapped() (if applied to head page)
> returns false for larger pages with this patch.
>
> I wrote simple testcase and did check that folio_mapped() and page_mapped()
> returns false for both order-0 page and larger pages. (and SLAB
> returned true for them before)
>
> > Second, this patch changes the behaviour of PageSlab() when applied to
> > tail pages.
>
> Altough it changes the way it checks the flag,
>
> it does not change behavior when applied to tail pages - PageSlab() on tail
> page returns false with or without this patch.

Really? It seems to me that it returns true at the moment. Look:

__PAGEFLAG(Slab, slab, PF_NO_TAIL)

#define PF_NO_TAIL(page, enforce) ({ \
VM_BUG_ON_PGFLAGS(enforce && PageTail(page), page); \
PF_POISONED_CHECK(compound_head(page)); })

so AFAICS, PageSlab checks the Slab bit on the head page, not the
tail page.

> If PageSlab() need to return true for tail pages too,
> we may make it check page_type at head page.
>
> But I'm not sure when it the behavior is needed.
> Can you please share your insight on this?

There are tools like tools/vm/page-types.c which expect PageSlab to
return true for tail pages.

> > Which raises the further question of what PageBuddy(),
> > PageTable(), PageGuard() and PageIsolated() should do for multi-page
> > folios, if that is even possible.
>
> For users that uses real compound page like slab, we can make it check
> page_type of head page. (if needed)
>
> But for cases David described, there isn't much thing we can do
> except making them to use real compound pages.
>
> > Third, can we do this without that awkward __u16 thing? Perhaps
> >
> > -#define PG_buddy 0x00000080
> > -#define PG_offline 0x00000100
> > -#define PG_table 0x00000200
> > -#define PG_guard 0x00000400
> > +#define PG_buddy 0x00010000
> > +#define PG_offline 0x00020000
> > +#define PG_table 0x00040000
> > +#define PG_guard 0x00080000
> > +#define PG_slab 0x00100000
> >
> > ... and then use wrappers in slab.c to access the bottom 16 bits?
>
> Definitely! I prefer that way and will adjust in RFC v2.
>
> Thank you for precious feedback.

No problem. I suggested (in an off-list email) that you consider counting
'active' by subtraction rather than addition because I have a feeling that

int active(struct slab *slab)
{
return ~(slab->page_type | PG_slab);
}

would be better than

int active(struct slab *slab)
{
return slab->page_type & 0xffff;
}

at least in part because you don't have to clear the bottom 16 bits of
page_type when you clear PG_slab, and you don't have to re-set them
when you set PG_slab.

2022-10-08 05:41:28

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH] mm: move PG_slab flag to page_type

On Fri, Oct 07, 2022 at 07:02:35PM +0100, Matthew Wilcox wrote:
> On Fri, Oct 07, 2022 at 10:36:56PM +0900, Hyeonggon Yoo wrote:
> > > First, you say that folio_mapped() returns false for slab pages. That's
> > > only true for order-0 slab pages. For larger pages,
> > >
> > > if (!folio_test_large(folio))
> > > return atomic_read(&folio->_mapcount) >= 0;
> > > if (atomic_read(folio_mapcount_ptr(folio)) >= 0)
> > > return true;
> > >
> > > so that's going to depend what folio_mapcount_ptr() aliases with.
> >
> > IIUC it's true for order > 0 slab too.
> >
> > As slab pages are not mapped to userspace at all,
> > entire compound page nor base pages are not mapped to userspace.
> >
> > AFAIK followings are true for order > 0 slab:
> > - (first tail page)->compound_mapcount is -1
>
> That's the part I wasn't sure of. I think we do, in
> prep_compound_head().

Right, exactly!

>
> > - _mapcount of base pages are -1
> >
> > So:
> > folio_mapped() and page_mapped() (if applied to head page)
> > returns false for larger pages with this patch.
> >
> > I wrote simple testcase and did check that folio_mapped() and page_mapped()
> > returns false for both order-0 page and larger pages. (and SLAB
> > returned true for them before)

FYI, This is still true even after fixing my mistaken test case (see below)

> >
> > > Second, this patch changes the behaviour of PageSlab() when applied to
> > > tail pages.
> >
> > Altough it changes the way it checks the flag,
> >
> > it does not change behavior when applied to tail pages - PageSlab() on tail
> > page returns false with or without this patch.
>
> Really? It seems to me that it returns true at the moment. Look:
>
> __PAGEFLAG(Slab, slab, PF_NO_TAIL)
>
> #define PF_NO_TAIL(page, enforce) ({ \
> VM_BUG_ON_PGFLAGS(enforce && PageTail(page), page); \
> PF_POISONED_CHECK(compound_head(page)); })
>
> so AFAICS, PageSlab checks the Slab bit on the head page, not the
> tail page.

You are right. I misunderstood it due to my mistakenly written test case
(without passing __GFP_COMP... how silly of me :D)

Hmm okay, then I will implement PF_NO_TAIL policy that works on page_type.

>
> > If PageSlab() need to return true for tail pages too,
> > we may make it check page_type at head page.
> >
> > But I'm not sure when it the behavior is needed.
> > Can you please share your insight on this?
>
> There are tools like tools/vm/page-types.c which expect PageSlab to
> return true for tail pages.
>
> > > Which raises the further question of what PageBuddy(),
> > > PageTable(), PageGuard() and PageIsolated() should do for multi-page
> > > folios, if that is even possible.
> >
> > For users that uses real compound page like slab, we can make it check
> > page_type of head page. (if needed)
> >
> > But for cases David described, there isn't much thing we can do
> > except making them to use real compound pages.
> >
> > > Third, can we do this without that awkward __u16 thing? Perhaps
> > >
> > > -#define PG_buddy 0x00000080
> > > -#define PG_offline 0x00000100
> > > -#define PG_table 0x00000200
> > > -#define PG_guard 0x00000400
> > > +#define PG_buddy 0x00010000
> > > +#define PG_offline 0x00020000
> > > +#define PG_table 0x00040000
> > > +#define PG_guard 0x00080000
> > > +#define PG_slab 0x00100000
> > >
> > > ... and then use wrappers in slab.c to access the bottom 16 bits?
> >
> > Definitely! I prefer that way and will adjust in RFC v2.
> >
> > Thank you for precious feedback.
>
> No problem. I suggested (in an off-list email) that you consider counting
> 'active' by subtraction rather than addition because I have a feeling that
>
> int active(struct slab *slab)
> {
> return ~(slab->page_type | PG_slab);
> }
>
> would be better than
>
> int active(struct slab *slab)
> {
> return slab->page_type & 0xffff;
> }
>
> at least in part because you don't have to clear the bottom 16 bits of
> page_type when you clear PG_slab, and you don't have to re-set them
> when you set PG_slab.

Yeah, I was wondering what is the benefit of the that approach.
After implementing both approach, your suggestion seems better to me too.

Many thanks, Matthew!

--
Hyeonggon