2023-07-13 04:32:37

by Hyeonggon Yoo

Subject: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page

v1: https://lore.kernel.org/linux-mm/[email protected]

v1 -> v2:
- rebased to the latest mm-unstable, which resulted in some patches being dropped
- addressed comments from Mike Rapoport by defining helpers when
converting their users

The purpose of this series is to define zsmalloc's own memory descriptor,
instead of re-using various fields of struct page. This is part of the
effort to reduce the size of struct page to a single unsigned long and to
enable dynamic allocation of memory descriptors.

While [1] outlines this ultimate objective, the current use of struct page
is highly dependent on its definition, making it challenging to separately
allocate memory descriptors.

Therefore, this series introduces a new descriptor for zsmalloc, called
zsdesc. It overlays struct page for now, but will eventually be allocated
independently. Apart from enabling dynamic allocation of descriptors,
this is also a nice cleanup in its own right.

This work is also available at:
https://gitlab.com/hyeyoo/linux/-/tree/separate_zsdesc_rfc-v2

[1] State Of The Page, August 2022
https://lore.kernel.org/lkml/[email protected]

Hyeonggon Yoo (21):
mm/zsmalloc: create new struct zsdesc
mm/zsmalloc: add utility functions for zsdesc
mm/zsmalloc: replace first_page to first_zsdesc in struct zspage
mm/zsmalloc: add alternatives of frequently used helper functions
mm/zsmalloc: convert {try,}lock_zspage() to use zsdesc
mm/zsmalloc: convert __zs_{map,unmap}_object() to use zsdesc
mm/zsmalloc: convert obj_to_location() and its users to use zsdesc
mm/zsmalloc: convert obj_malloc() to use zsdesc
mm/zsmalloc: convert create_page_chain() and its user to use zsdesc
mm/zsmalloc: convert obj_allocated() and related helpers to use zsdesc
mm/zsmalloc: convert init_zspage() to use zsdesc
mm/zsmalloc: convert obj_to_page() and zs_free() to use zsdesc
mm/zsmalloc: convert reset_page() to reset_zsdesc()
mm/zsmalloc: convert zs_page_{isolate,migrate,putback} to use zsdesc
mm/zsmalloc: convert __free_zspage() to use zsdesc
mm/zsmalloc: convert location_to_obj() to use zsdesc
mm/zsmalloc: convert migrate_zspage() to use zsdesc
mm/zsmalloc: convert get_zspage() to take zsdesc
mm/zsmalloc: convert SetZsPageMovable() to use zsdesc
mm/zsmalloc: remove now unused helper functions
mm/zsmalloc: convert {get,set}_first_obj_offset() to use zsdesc

mm/zsmalloc.c | 574 +++++++++++++++++++++++++++++++-------------------
1 file changed, 360 insertions(+), 214 deletions(-)

--
2.41.0



2023-07-13 04:33:41

by Hyeonggon Yoo

Subject: [RFC PATCH v2 01/21] mm/zsmalloc: create new struct zsdesc

Currently zsmalloc reuses fields of struct page. As part of simplifying
struct page, create zsmalloc's own type, called zsdesc.

Remove the comments describing how zsmalloc reuses struct page fields,
because zsdesc uses more intuitive names.

Note that zsmalloc does not use PG_owner_priv_1 after commit a41ec880aa7b
("zsmalloc: move huge compressed obj from page to zspage"). Thus only
document how zsmalloc uses the PG_private flag.

It is very tempting to rearrange zsdesc, but the three words after the
flags field are not available to zsmalloc. Add comments about that.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 63 +++++++++++++++++++++++++++++++++++++--------------
1 file changed, 46 insertions(+), 17 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 32f5bc4074df..2204bea4f289 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -11,23 +11,6 @@
* Released under the terms of GNU General Public License Version 2.0
*/

-/*
- * Following is how we use various fields and flags of underlying
- * struct page(s) to form a zspage.
- *
- * Usage of struct page fields:
- * page->private: points to zspage
- * page->index: links together all component pages of a zspage
- * For the huge page, this is always 0, so we use this field
- * to store handle.
- * page->page_type: first object offset in a subpage of zspage
- *
- * Usage of struct page flags:
- * PG_private: identifies the first component page
- * PG_owner_priv_1: identifies the huge component page
- *
- */
-
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
@@ -264,6 +247,52 @@ struct mapping_area {
enum zs_mapmode vm_mm; /* mapping mode */
};

+/*
+ * struct zsdesc - memory descriptor for zsmalloc memory
+ *
+ * This struct overlays struct page for now. Do not modify without a
+ * good understanding of the issues.
+ *
+ * Usage of struct page flags on zsdesc:
+ * PG_private: identifies the first component zsdesc
+ */
+struct zsdesc {
+ unsigned long flags;
+
+ /*
+ * Although not used by zsmalloc, this field is used by non-LRU page migration
+ * code. Leave it unused.
+ */
+ struct list_head lru;
+
+ /* Always points to zsmalloc_mops with PAGE_MAPPING_MOVABLE set */
+ struct movable_operations *mops;
+
+ union {
+ /* linked list of all zsdescs in a zspage */
+ struct zsdesc *next;
+ /* for huge zspages */
+ unsigned long handle;
+ };
+ struct zspage *zspage;
+ unsigned int first_obj_offset;
+ unsigned int _refcount;
+};
+
+#define ZSDESC_MATCH(pg, zs) \
+ static_assert(offsetof(struct page, pg) == offsetof(struct zsdesc, zs))
+
+ZSDESC_MATCH(flags, flags);
+ZSDESC_MATCH(lru, lru);
+ZSDESC_MATCH(mapping, mops);
+ZSDESC_MATCH(index, next);
+ZSDESC_MATCH(index, handle);
+ZSDESC_MATCH(private, zspage);
+ZSDESC_MATCH(page_type, first_obj_offset);
+ZSDESC_MATCH(_refcount, _refcount);
+#undef ZSDESC_MATCH
+static_assert(sizeof(struct zsdesc) <= sizeof(struct page));
+
/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
static void SetZsHugePage(struct zspage *zspage)
{
--
2.41.0


2023-07-13 04:33:55

by Hyeonggon Yoo

Subject: [RFC PATCH v2 21/21] mm/zsmalloc: convert {get,set}_first_obj_offset() to use zsdesc

Now that all users of {get,set}_first_obj_offset() are converted
to use zsdesc, convert these functions themselves to take zsdesc.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 3933c023c3c9..7ac5d63e10a5 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -606,14 +606,14 @@ static struct zsdesc *get_first_zsdesc(struct zspage *zspage)
return first_zsdesc;
}

-static inline unsigned int get_first_obj_offset(struct page *page)
+static inline unsigned int get_first_obj_offset(struct zsdesc *zsdesc)
{
- return page->page_type;
+ return zsdesc->first_obj_offset;
}

-static inline void set_first_obj_offset(struct page *page, unsigned int offset)
+static inline void set_first_obj_offset(struct zsdesc *zsdesc, unsigned int offset)
{
- page->page_type = offset;
+ zsdesc->first_obj_offset = offset;
}

static inline unsigned int get_freeobj(struct zspage *zspage)
@@ -1049,7 +1049,7 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
struct link_free *link;
void *vaddr;

- set_first_obj_offset(zsdesc_page(zsdesc), off);
+ set_first_obj_offset(zsdesc, off);

vaddr = zsdesc_kmap_atomic(zsdesc);
link = (struct link_free *)vaddr + off / sizeof(*link);
@@ -1699,7 +1699,7 @@ static unsigned long find_alloced_obj(struct size_class *class,
unsigned long handle = 0;
void *addr = zsdesc_kmap_atomic(zsdesc);

- offset = get_first_obj_offset(zsdesc_page(zsdesc));
+ offset = get_first_obj_offset(zsdesc);
offset += class->size * index;

while (offset < PAGE_SIZE) {
@@ -1910,8 +1910,8 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
} while ((zsdesc = get_next_zsdesc(zsdesc)) != NULL);

create_page_chain(class, zspage, zsdescs);
- first_obj_offset = get_first_obj_offset(zsdesc_page(old_zsdesc));
- set_first_obj_offset(zsdesc_page(new_zsdesc), first_obj_offset);
+ first_obj_offset = get_first_obj_offset(old_zsdesc);
+ set_first_obj_offset(new_zsdesc, first_obj_offset);
if (unlikely(ZsHugePage(zspage)))
new_zsdesc->handle = old_zsdesc->handle;
zsdesc_set_movable(new_zsdesc);
@@ -1975,7 +1975,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
/* the migrate_write_lock protects zpage access via zs_map_object */
migrate_write_lock(zspage);

- offset = get_first_obj_offset(zsdesc_page(zsdesc));
+ offset = get_first_obj_offset(zsdesc);
s_addr = zsdesc_kmap_atomic(zsdesc);

/*
--
2.41.0


2023-07-13 04:34:08

by Hyeonggon Yoo

Subject: [RFC PATCH v2 09/21] mm/zsmalloc: convert create_page_chain() and its user to use zsdesc

Introduce a few helper functions for conversion.
Convert create_page_chain() and its user replace_sub_page() to use zsdesc.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 120 ++++++++++++++++++++++++++++++++++----------------
1 file changed, 81 insertions(+), 39 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 06227da86afc..48bfdbbe3b1e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -346,6 +346,48 @@ static inline void *zsdesc_kmap_atomic(struct zsdesc *zsdesc)
return kmap_atomic(zsdesc_page(zsdesc));
}

+static inline void zsdesc_set_zspage(struct zsdesc *zsdesc,
+ struct zspage *zspage)
+{
+ zsdesc->zspage = zspage;
+}
+
+static inline void zsdesc_set_first(struct zsdesc *zsdesc)
+{
+ SetPagePrivate(zsdesc_page(zsdesc));
+}
+
+static const struct movable_operations zsmalloc_mops;
+
+static inline void zsdesc_set_movable(struct zsdesc *zsdesc)
+{
+ __SetPageMovable(zsdesc_page(zsdesc), &zsmalloc_mops);
+}
+
+static inline void zsdesc_inc_zone_page_state(struct zsdesc *zsdesc)
+{
+ inc_zone_page_state(zsdesc_page(zsdesc), NR_ZSPAGES);
+}
+
+static inline void zsdesc_dec_zone_page_state(struct zsdesc *zsdesc)
+{
+ dec_zone_page_state(zsdesc_page(zsdesc), NR_ZSPAGES);
+}
+
+static inline struct zsdesc *alloc_zsdesc(gfp_t gfp)
+{
+ struct page *page = alloc_page(gfp);
+
+ return page_zsdesc(page);
+}
+
+static inline void free_zsdesc(struct zsdesc *zsdesc)
+{
+ struct page *page = zsdesc_page(zsdesc);
+
+ __free_page(page);
+}
+
/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
static void SetZsHugePage(struct zspage *zspage)
{
@@ -1047,35 +1089,35 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
}

static void create_page_chain(struct size_class *class, struct zspage *zspage,
- struct page *pages[])
+ struct zsdesc *zsdescs[])
{
int i;
- struct page *page;
- struct page *prev_page = NULL;
- int nr_pages = class->pages_per_zspage;
+ struct zsdesc *zsdesc;
+ struct zsdesc *prev_zsdesc = NULL;
+ int nr_zsdescs = class->pages_per_zspage;

/*
* Allocate individual pages and link them together as:
- * 1. all pages are linked together using page->index
- * 2. each sub-page point to zspage using page->private
+ * 1. all pages are linked together using zsdesc->next
+ * 2. each sub-page point to zspage using zsdesc->zspage
*
- * we set PG_private to identify the first page (i.e. no other sub-page
+ * we set PG_private to identify the first zsdesc (i.e. no other zsdesc
* has this flag set).
*/
- for (i = 0; i < nr_pages; i++) {
- page = pages[i];
- set_page_private(page, (unsigned long)zspage);
- page->index = 0;
+ for (i = 0; i < nr_zsdescs; i++) {
+ zsdesc = zsdescs[i];
+ zsdesc_set_zspage(zsdesc, zspage);
+ zsdesc->next = NULL;
if (i == 0) {
- zspage->first_zsdesc = page_zsdesc(page);
- SetPagePrivate(page);
+ zspage->first_zsdesc = zsdesc;
+ zsdesc_set_first(zsdesc);
if (unlikely(class->objs_per_zspage == 1 &&
class->pages_per_zspage == 1))
SetZsHugePage(zspage);
} else {
- prev_page->index = (unsigned long)page;
+ prev_zsdesc->next = zsdesc;
}
- prev_page = page;
+ prev_zsdesc = zsdesc;
}
}

@@ -1087,7 +1129,7 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
gfp_t gfp)
{
int i;
- struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
+ struct zsdesc *zsdescs[ZS_MAX_PAGES_PER_ZSPAGE];
struct zspage *zspage = cache_alloc_zspage(pool, gfp);

if (!zspage)
@@ -1097,23 +1139,23 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
migrate_lock_init(zspage);

for (i = 0; i < class->pages_per_zspage; i++) {
- struct page *page;
+ struct zsdesc *zsdesc;

- page = alloc_page(gfp);
- if (!page) {
+ zsdesc = alloc_zsdesc(gfp);
+ if (!zsdesc) {
while (--i >= 0) {
- dec_zone_page_state(pages[i], NR_ZSPAGES);
- __free_page(pages[i]);
+ zsdesc_dec_zone_page_state(zsdescs[i]);
+ free_zsdesc(zsdescs[i]);
}
cache_free_zspage(pool, zspage);
return NULL;
}

- inc_zone_page_state(page, NR_ZSPAGES);
- pages[i] = page;
+ zsdesc_inc_zone_page_state(zsdesc);
+ zsdescs[i] = zsdesc;
}

- create_page_chain(class, zspage, pages);
+ create_page_chain(class, zspage, zsdescs);
init_zspage(class, zspage);
zspage->pool = pool;

@@ -1856,29 +1898,29 @@ static void dec_zspage_isolation(struct zspage *zspage)
zspage->isolated--;
}

-static const struct movable_operations zsmalloc_mops;
-
static void replace_sub_page(struct size_class *class, struct zspage *zspage,
- struct page *newpage, struct page *oldpage)
+ struct zsdesc *new_zsdesc, struct zsdesc *old_zsdesc)
{
- struct page *page;
- struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE] = {NULL, };
+ struct zsdesc *zsdesc;
+ struct zsdesc *zsdescs[ZS_MAX_PAGES_PER_ZSPAGE] = {NULL, };
+ unsigned int first_obj_offset;
int idx = 0;

- page = get_first_page(zspage);
+ zsdesc = get_first_zsdesc(zspage);
do {
- if (page == oldpage)
- pages[idx] = newpage;
+ if (zsdesc == old_zsdesc)
+ zsdescs[idx] = new_zsdesc;
else
- pages[idx] = page;
+ zsdescs[idx] = zsdesc;
idx++;
- } while ((page = get_next_page(page)) != NULL);
+ } while ((zsdesc = get_next_zsdesc(zsdesc)) != NULL);

- create_page_chain(class, zspage, pages);
- set_first_obj_offset(newpage, get_first_obj_offset(oldpage));
+ create_page_chain(class, zspage, zsdescs);
+ first_obj_offset = get_first_obj_offset(zsdesc_page(old_zsdesc));
+ set_first_obj_offset(zsdesc_page(new_zsdesc), first_obj_offset);
if (unlikely(ZsHugePage(zspage)))
- newpage->index = oldpage->index;
- __SetPageMovable(newpage, &zsmalloc_mops);
+ new_zsdesc->handle = old_zsdesc->handle;
+ zsdesc_set_movable(new_zsdesc);
}

static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
@@ -1959,7 +2001,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
}
kunmap_atomic(s_addr);

- replace_sub_page(class, zspage, newpage, page);
+ replace_sub_page(class, zspage, page_zsdesc(newpage), page_zsdesc(page));
/*
* Since we complete the data copy and set up new zspage structure,
* it's okay to release the pool's lock.
--
2.41.0


2023-07-13 04:34:16

by Hyeonggon Yoo

Subject: [RFC PATCH v2 08/21] mm/zsmalloc: convert obj_malloc() to use zsdesc

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index be9762a49237..06227da86afc 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1412,12 +1412,12 @@ EXPORT_SYMBOL_GPL(zs_huge_class_size);
static unsigned long obj_malloc(struct zs_pool *pool,
struct zspage *zspage, unsigned long handle)
{
- int i, nr_page, offset;
+ int i, nr_zsdesc, offset;
unsigned long obj;
struct link_free *link;
struct size_class *class;

- struct page *m_page;
+ struct zsdesc *m_zsdesc;
unsigned long m_offset;
void *vaddr;

@@ -1426,14 +1426,14 @@ static unsigned long obj_malloc(struct zs_pool *pool,
obj = get_freeobj(zspage);

offset = obj * class->size;
- nr_page = offset >> PAGE_SHIFT;
+ nr_zsdesc = offset >> PAGE_SHIFT;
m_offset = offset_in_page(offset);
- m_page = get_first_page(zspage);
+ m_zsdesc = get_first_zsdesc(zspage);

- for (i = 0; i < nr_page; i++)
- m_page = get_next_page(m_page);
+ for (i = 0; i < nr_zsdesc; i++)
+ m_zsdesc = get_next_zsdesc(m_zsdesc);

- vaddr = kmap_atomic(m_page);
+ vaddr = zsdesc_kmap_atomic(m_zsdesc);
link = (struct link_free *)vaddr + m_offset / sizeof(*link);
set_freeobj(zspage, link->next >> OBJ_TAG_BITS);
if (likely(!ZsHugePage(zspage)))
@@ -1446,7 +1446,7 @@ static unsigned long obj_malloc(struct zs_pool *pool,
kunmap_atomic(vaddr);
mod_zspage_inuse(zspage, 1);

- obj = location_to_obj(m_page, obj);
+ obj = location_to_obj(zsdesc_page(m_zsdesc), obj);

return obj;
}
--
2.41.0


2023-07-13 04:36:10

by Hyeonggon Yoo

Subject: [RFC PATCH v2 02/21] mm/zsmalloc: add utility functions for zsdesc

Introduce basic utility functions for zsdesc to avoid directly accessing
fields of struct page. More helpers will be defined later.

zsdesc_page() is defined with _Generic to preserve constness.
page_zsdesc() does not call compound_head() because zsdesc is always
a base page.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2204bea4f289..11c203e79c39 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -293,6 +293,39 @@ ZSDESC_MATCH(_refcount, _refcount);
#undef ZSDESC_MATCH
static_assert(sizeof(struct zsdesc) <= sizeof(struct page));

+#define zsdesc_page(zdesc) (_Generic((zdesc), \
+ const struct zsdesc *: (const struct page *)zdesc, \
+ struct zsdesc *: (struct page *)zdesc))
+
+static inline struct zsdesc *page_zsdesc(struct page *page)
+{
+ return (struct zsdesc *)page;
+}
+
+static inline unsigned long zsdesc_pfn(const struct zsdesc *zsdesc)
+{
+ return page_to_pfn(zsdesc_page(zsdesc));
+}
+
+static inline struct zsdesc *pfn_zsdesc(unsigned long pfn)
+{
+ return page_zsdesc(pfn_to_page(pfn));
+}
+
+static inline void zsdesc_get(struct zsdesc *zsdesc)
+{
+ struct folio *folio = (struct folio *)zsdesc;
+
+ folio_get(folio);
+}
+
+static inline void zsdesc_put(struct zsdesc *zsdesc)
+{
+ struct folio *folio = (struct folio *)zsdesc;
+
+ folio_put(folio);
+}
+
/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
static void SetZsHugePage(struct zspage *zspage)
{
--
2.41.0
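
A quick standalone sketch of the _Generic constness behaviour described
above (userspace stand-ins, not the kernel definitions; only the macro
pattern mirrors the zsdesc_page() introduced in this patch):

/*
 * Standalone userspace sketch: struct page and struct zsdesc below are
 * minimal stand-ins. A const zsdesc pointer yields a const page pointer,
 * a non-const one yields a non-const page pointer.
 */
#include <stdio.h>

struct page { unsigned long flags; };
struct zsdesc { unsigned long flags; };

#define zsdesc_page(zdesc) (_Generic((zdesc),			\
	const struct zsdesc *: (const struct page *)(zdesc),	\
	struct zsdesc *: (struct page *)(zdesc)))

int main(void)
{
	struct zsdesc zd = { .flags = 1 };
	const struct zsdesc *czd = &zd;

	struct page *p = zsdesc_page(&zd);        /* non-const in, non-const out */
	const struct page *cp = zsdesc_page(czd); /* const in, const out */

	/* p->flags = 2; compiles, cp->flags = 2; would not */
	printf("%lu %lu\n", p->flags, cp->flags);
	return 0;
}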


2023-07-13 04:37:38

by Hyeonggon Yoo

Subject: [RFC PATCH v2 20/21] mm/zsmalloc: remove now unused helper functions

All users of is_first_page(), get_first_page() and get_next_page()
are now converted to use the new helper functions that take zsdesc.

Remove the now-unused helper functions.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 29 +++--------------------------
1 file changed, 3 insertions(+), 26 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5f07e3d92a99..3933c023c3c9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -581,12 +581,7 @@ static DEFINE_PER_CPU(struct mapping_area, zs_map_area) = {
.lock = INIT_LOCAL_LOCK(lock),
};

-static __maybe_unused int is_first_page(struct page *page)
-{
- return PagePrivate(page);
-}
-
-static __maybe_unused int is_first_zsdesc(struct zsdesc *zsdesc)
+static int is_first_zsdesc(struct zsdesc *zsdesc)
{
return PagePrivate(zsdesc_page(zsdesc));
}
@@ -603,15 +598,7 @@ static inline void mod_zspage_inuse(struct zspage *zspage, int val)
zspage->inuse += val;
}

-static __maybe_unused inline struct page *get_first_page(struct zspage *zspage)
-{
- struct page *first_page = zsdesc_page(zspage->first_zsdesc);
-
- VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- return first_page;
-}
-
-static __maybe_unused struct zsdesc *get_first_zsdesc(struct zspage *zspage)
+static struct zsdesc *get_first_zsdesc(struct zspage *zspage)
{
struct zsdesc *first_zsdesc = zspage->first_zsdesc;

@@ -907,17 +894,7 @@ static struct zspage *get_zspage(struct zsdesc *zsdesc)
return zspage;
}

-static __maybe_unused struct page *get_next_page(struct page *page)
-{
- struct zspage *zspage = get_zspage(page_zsdesc(page));
-
- if (unlikely(ZsHugePage(zspage)))
- return NULL;
-
- return (struct page *)page->index;
-}
-
-static __maybe_unused struct zsdesc *get_next_zsdesc(struct zsdesc *zsdesc)
+static struct zsdesc *get_next_zsdesc(struct zsdesc *zsdesc)
{
struct zspage *zspage = get_zspage(zsdesc);

--
2.41.0


2023-07-13 04:38:20

by Hyeonggon Yoo

Subject: [RFC PATCH v2 18/21] mm/zsmalloc: convert get_zspage() to take zsdesc

Now that all users except get_next_page() (which will be removed in a
later patch) use zsdesc, convert get_zspage() to take zsdesc instead
of page.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index db43a5d05233..6cb216b8564a 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -899,9 +899,9 @@ static int fix_fullness_group(struct size_class *class, struct zspage *zspage)
return newfg;
}

-static struct zspage *get_zspage(struct page *page)
+static struct zspage *get_zspage(struct zsdesc *zsdesc)
{
- struct zspage *zspage = (struct zspage *)page_private(page);
+ struct zspage *zspage = zsdesc->zspage;

BUG_ON(zspage->magic != ZSPAGE_MAGIC);
return zspage;
@@ -909,7 +909,7 @@ static struct zspage *get_zspage(struct page *page)

static __maybe_unused struct page *get_next_page(struct page *page)
{
- struct zspage *zspage = get_zspage(page);
+ struct zspage *zspage = get_zspage(page_zsdesc(page));

if (unlikely(ZsHugePage(zspage)))
return NULL;
@@ -919,7 +919,7 @@ static __maybe_unused struct page *get_next_page(struct page *page)

static __maybe_unused struct zsdesc *get_next_zsdesc(struct zsdesc *zsdesc)
{
- struct zspage *zspage = get_zspage(zsdesc_page(zsdesc));
+ struct zspage *zspage = get_zspage(zsdesc);

if (unlikely(ZsHugePage(zspage)))
return NULL;
@@ -972,7 +972,7 @@ static inline bool obj_allocated(struct zsdesc *zsdesc, void *obj,
unsigned long *phandle)
{
unsigned long handle;
- struct zspage *zspage = get_zspage(zsdesc_page(zsdesc));
+ struct zspage *zspage = get_zspage(zsdesc);

if (unlikely(ZsHugePage(zspage))) {
VM_BUG_ON_PAGE(!is_first_zsdesc(zsdesc), zsdesc_page(zsdesc));
@@ -1377,7 +1377,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
spin_lock(&pool->lock);
obj = handle_to_obj(handle);
obj_to_location(obj, &zsdesc, &obj_idx);
- zspage = get_zspage(zsdesc_page(zsdesc));
+ zspage = get_zspage(zsdesc);

/*
* migration cannot move any zpages in this zspage. Here, pool->lock
@@ -1427,7 +1427,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)

obj = handle_to_obj(handle);
obj_to_location(obj, &zsdesc, &obj_idx);
- zspage = get_zspage(zsdesc_page(zsdesc));
+ zspage = get_zspage(zsdesc);
class = zspage_class(pool, zspage);
off = offset_in_page(class->size * obj_idx);

@@ -1591,7 +1591,7 @@ static void obj_free(int class_size, unsigned long obj)

obj_to_location(obj, &f_zsdesc, &f_objidx);
f_offset = offset_in_page(class_size * f_objidx);
- zspage = get_zspage(zsdesc_page(f_zsdesc));
+ zspage = get_zspage(f_zsdesc);

vaddr = zsdesc_kmap_atomic(f_zsdesc);
link = (struct link_free *)(vaddr + f_offset);
@@ -1625,7 +1625,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
spin_lock(&pool->lock);
obj = handle_to_obj(handle);
obj_to_zsdesc(obj, &f_zsdesc);
- zspage = get_zspage(zsdesc_page(f_zsdesc));
+ zspage = get_zspage(f_zsdesc);
class = zspage_class(pool, zspage);

class_stat_dec(class, ZS_OBJS_INUSE, 1);
@@ -1951,7 +1951,7 @@ static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
*/
VM_BUG_ON_PAGE(zsdesc_is_isolated(zsdesc), zsdesc_page(zsdesc));

- zspage = get_zspage(zsdesc_page(zsdesc));
+ zspage = get_zspage(zsdesc);
migrate_write_lock(zspage);
inc_zspage_isolation(zspage);
migrate_write_unlock(zspage);
@@ -1985,7 +1985,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
VM_BUG_ON_PAGE(!zsdesc_is_isolated(zsdesc), zsdesc_page(zsdesc));

/* The page is locked, so this pointer must remain valid */
- zspage = get_zspage(zsdesc_page(zsdesc));
+ zspage = get_zspage(zsdesc);
pool = zspage->pool;

/*
@@ -2049,7 +2049,7 @@ static void zs_page_putback(struct page *page)

VM_BUG_ON_PAGE(!zsdesc_is_isolated(zsdesc), zsdesc_page(zsdesc));

- zspage = get_zspage(zsdesc_page(zsdesc));
+ zspage = get_zspage(zsdesc);
migrate_write_lock(zspage);
dec_zspage_isolation(zspage);
migrate_write_unlock(zspage);
--
2.41.0


2023-07-13 04:38:53

by Hyeonggon Yoo

Subject: [RFC PATCH v2 12/21] mm/zsmalloc: convert obj_to_page() and zs_free() to use zsdesc

Convert obj_to_page() to obj_to_zsdesc() and also convert its user
zs_free() to use zsdesc.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index c8c3039a751a..5141a0c72c61 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -915,10 +915,10 @@ static void obj_to_location(unsigned long obj, struct zsdesc **zsdesc,
*obj_idx = (obj & OBJ_INDEX_MASK);
}

-static void obj_to_page(unsigned long obj, struct page **page)
+static void obj_to_zsdesc(unsigned long obj, struct zsdesc **zsdesc)
{
obj >>= OBJ_TAG_BITS;
- *page = pfn_to_page(obj >> OBJ_INDEX_BITS);
+ *zsdesc = pfn_zsdesc(obj >> OBJ_INDEX_BITS);
}

/**
@@ -1593,7 +1593,7 @@ static void obj_free(int class_size, unsigned long obj)
void zs_free(struct zs_pool *pool, unsigned long handle)
{
struct zspage *zspage;
- struct page *f_page;
+ struct zsdesc *f_zsdesc;
unsigned long obj;
struct size_class *class;
int fullness;
@@ -1607,8 +1607,8 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
*/
spin_lock(&pool->lock);
obj = handle_to_obj(handle);
- obj_to_page(obj, &f_page);
- zspage = get_zspage(f_page);
+ obj_to_zsdesc(obj, &f_zsdesc);
+ zspage = get_zspage(zsdesc_page(f_zsdesc));
class = zspage_class(pool, zspage);

class_stat_dec(class, ZS_OBJS_INUSE, 1);
--
2.41.0


2023-07-13 04:42:00

by Hyeonggon Yoo

Subject: [RFC PATCH v2 15/21] mm/zsmalloc: convert __free_zspage() to use zsdesc

Introduce zsdesc_is_locked() and convert __free_zspage() to use zsdesc.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9e4ced14e1eb..69bd497de35e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -336,6 +336,11 @@ static inline void unlock_zsdesc(struct zsdesc *zsdesc)
unlock_page(zsdesc_page(zsdesc));
}

+static inline bool zsdesc_is_locked(struct zsdesc *zsdesc)
+{
+ return PageLocked(zsdesc_page(zsdesc));
+}
+
static inline void wait_on_zsdesc_locked(struct zsdesc *zsdesc)
{
wait_on_page_locked(zsdesc_page(zsdesc));
@@ -1007,7 +1012,7 @@ static int trylock_zspage(struct zspage *zspage)
static void __free_zspage(struct zs_pool *pool, struct size_class *class,
struct zspage *zspage)
{
- struct page *page, *next;
+ struct zsdesc *zsdesc, *next;
int fg;
unsigned int class_idx;

@@ -1018,16 +1023,16 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
VM_BUG_ON(get_zspage_inuse(zspage));
VM_BUG_ON(fg != ZS_INUSE_RATIO_0);

- next = page = get_first_page(zspage);
+ next = zsdesc = get_first_zsdesc(zspage);
do {
- VM_BUG_ON_PAGE(!PageLocked(page), page);
- next = get_next_page(page);
- reset_zsdesc(page_zsdesc(page));
- unlock_page(page);
- dec_zone_page_state(page, NR_ZSPAGES);
- put_page(page);
- page = next;
- } while (page != NULL);
+ VM_BUG_ON_PAGE(!zsdesc_is_locked(zsdesc), zsdesc_page(zsdesc));
+ next = get_next_zsdesc(zsdesc);
+ reset_zsdesc(zsdesc);
+ unlock_zsdesc(zsdesc);
+ zsdesc_dec_zone_page_state(zsdesc);
+ zsdesc_put(zsdesc);
+ zsdesc = next;
+ } while (zsdesc != NULL);

cache_free_zspage(pool, zspage);

--
2.41.0


2023-07-13 04:42:02

by Hyeonggon Yoo

Subject: [RFC PATCH v2 10/21] mm/zsmalloc: convert obj_allocated() and related helpers to use zsdesc

Convert obj_allocated() and related helpers to take zsdesc. Also make
their callers cast (struct page *) to (struct zsdesc *) when calling them.
The callers will be converted gradually, as there are many of them.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 48bfdbbe3b1e..efd7a0f78962 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -942,15 +942,15 @@ static unsigned long handle_to_obj(unsigned long handle)
return *(unsigned long *)handle;
}

-static inline bool obj_allocated(struct page *page, void *obj,
+static inline bool obj_allocated(struct zsdesc *zsdesc, void *obj,
unsigned long *phandle)
{
unsigned long handle;
- struct zspage *zspage = get_zspage(page);
+ struct zspage *zspage = get_zspage(zsdesc_page(zsdesc));

if (unlikely(ZsHugePage(zspage))) {
- VM_BUG_ON_PAGE(!is_first_page(page), page);
- handle = page->index;
+ VM_BUG_ON_PAGE(!is_first_zsdesc(zsdesc), zsdesc_page(zsdesc));
+ handle = zsdesc->handle;
} else
handle = *(unsigned long *)obj;

@@ -1698,18 +1698,18 @@ static void zs_object_copy(struct size_class *class, unsigned long dst,
* return handle.
*/
static unsigned long find_alloced_obj(struct size_class *class,
- struct page *page, int *obj_idx)
+ struct zsdesc *zsdesc, int *obj_idx)
{
unsigned int offset;
int index = *obj_idx;
unsigned long handle = 0;
- void *addr = kmap_atomic(page);
+ void *addr = zsdesc_kmap_atomic(zsdesc);

- offset = get_first_obj_offset(page);
+ offset = get_first_obj_offset(zsdesc_page(zsdesc));
offset += class->size * index;

while (offset < PAGE_SIZE) {
- if (obj_allocated(page, addr + offset, &handle))
+ if (obj_allocated(zsdesc, addr + offset, &handle))
break;

offset += class->size;
@@ -1733,7 +1733,7 @@ static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage,
struct size_class *class = pool->size_class[src_zspage->class];

while (1) {
- handle = find_alloced_obj(class, s_page, &obj_idx);
+ handle = find_alloced_obj(class, page_zsdesc(s_page), &obj_idx);
if (!handle) {
s_page = get_next_page(s_page);
if (!s_page)
@@ -1990,7 +1990,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,

for (addr = s_addr + offset; addr < s_addr + PAGE_SIZE;
addr += class->size) {
- if (obj_allocated(page, addr, &handle)) {
+ if (obj_allocated(page_zsdesc(page), addr, &handle)) {

old_obj = handle_to_obj(handle);
obj_to_location(old_obj, &dummy, &obj_idx);
--
2.41.0


2023-07-13 04:42:40

by Hyeonggon Yoo

Subject: [RFC PATCH v2 17/21] mm/zsmalloc: convert migrate_zspage() to use zsdesc

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index fd920b659b1d..db43a5d05233 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1746,14 +1746,14 @@ static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage,
unsigned long used_obj, free_obj;
unsigned long handle;
int obj_idx = 0;
- struct page *s_page = get_first_page(src_zspage);
+ struct zsdesc *s_zsdesc = get_first_zsdesc(src_zspage);
struct size_class *class = pool->size_class[src_zspage->class];

while (1) {
- handle = find_alloced_obj(class, page_zsdesc(s_page), &obj_idx);
+ handle = find_alloced_obj(class, s_zsdesc, &obj_idx);
if (!handle) {
- s_page = get_next_page(s_page);
- if (!s_page)
+ s_zsdesc = get_next_zsdesc(s_zsdesc);
+ if (!s_zsdesc)
break;
obj_idx = 0;
continue;
--
2.41.0


2023-07-13 04:42:52

by Hyeonggon Yoo

Subject: [RFC PATCH v2 06/21] mm/zsmalloc: convert __zs_{map,unmap}_object() to use zsdesc

These two functions take an array of pointers to struct page. Introduce
zsdesc_kmap_atomic() and make them take an array of pointers to zsdesc
instead of page.

Add silly type casts when calling them for now; the casts will be removed
in the next patch.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2cce76a19a1e..4c0563fce3d0 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -341,6 +341,11 @@ static inline void wait_on_zsdesc_locked(struct zsdesc *zsdesc)
wait_on_page_locked(zsdesc_page(zsdesc));
}

+static inline void *zsdesc_kmap_atomic(struct zsdesc *zsdesc)
+{
+ return kmap_atomic(zsdesc_page(zsdesc));
+}
+
/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
static void SetZsHugePage(struct zspage *zspage)
{
@@ -1151,7 +1156,7 @@ static inline void __zs_cpu_down(struct mapping_area *area)
}

static void *__zs_map_object(struct mapping_area *area,
- struct page *pages[2], int off, int size)
+ struct zsdesc *zsdescs[2], int off, int size)
{
int sizes[2];
void *addr;
@@ -1168,10 +1173,10 @@ static void *__zs_map_object(struct mapping_area *area,
sizes[1] = size - sizes[0];

/* copy object to per-cpu buffer */
- addr = kmap_atomic(pages[0]);
+ addr = zsdesc_kmap_atomic(zsdescs[0]);
memcpy(buf, addr + off, sizes[0]);
kunmap_atomic(addr);
- addr = kmap_atomic(pages[1]);
+ addr = zsdesc_kmap_atomic(zsdescs[1]);
memcpy(buf + sizes[0], addr, sizes[1]);
kunmap_atomic(addr);
out:
@@ -1179,7 +1184,7 @@ static void *__zs_map_object(struct mapping_area *area,
}

static void __zs_unmap_object(struct mapping_area *area,
- struct page *pages[2], int off, int size)
+ struct zsdesc *zsdescs[2], int off, int size)
{
int sizes[2];
void *addr;
@@ -1198,10 +1203,10 @@ static void __zs_unmap_object(struct mapping_area *area,
sizes[1] = size - sizes[0];

/* copy per-cpu buffer to object */
- addr = kmap_atomic(pages[0]);
+ addr = zsdesc_kmap_atomic(zsdescs[0]);
memcpy(addr + off, buf, sizes[0]);
kunmap_atomic(addr);
- addr = kmap_atomic(pages[1]);
+ addr = zsdesc_kmap_atomic(zsdescs[1]);
memcpy(addr, buf + sizes[0], sizes[1]);
kunmap_atomic(addr);

@@ -1342,7 +1347,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
pages[1] = get_next_page(page);
BUG_ON(!pages[1]);

- ret = __zs_map_object(area, pages, off, class->size);
+ ret = __zs_map_object(area, (struct zsdesc **)pages, off, class->size);
out:
if (likely(!ZsHugePage(zspage)))
ret += ZS_HANDLE_SIZE;
@@ -1377,7 +1382,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
pages[1] = get_next_page(page);
BUG_ON(!pages[1]);

- __zs_unmap_object(area, pages, off, class->size);
+ __zs_unmap_object(area, (struct zsdesc **)pages, off, class->size);
}
local_unlock(&zs_map_area.lock);

--
2.41.0


2023-07-13 04:43:25

by Hyeonggon Yoo

Subject: [RFC PATCH v2 16/21] mm/zsmalloc: convert location_to_obj() to use zsdesc

As all users of location_to_obj() now use zsdesc, convert
location_to_obj() itself to take zsdesc.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 69bd497de35e..fd920b659b1d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -948,15 +948,15 @@ static void obj_to_zsdesc(unsigned long obj, struct zsdesc **zsdesc)
}

/**
- * location_to_obj - get obj value encoded from (<page>, <obj_idx>)
- * @page: page object resides in zspage
+ * location_to_obj - get obj value encoded from (<zsdesc>, <obj_idx>)
+ * @zsdesc object resides in zspage
* @obj_idx: object index
*/
-static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
+static unsigned long location_to_obj(struct zsdesc *zsdesc, unsigned int obj_idx)
{
unsigned long obj;

- obj = page_to_pfn(page) << OBJ_INDEX_BITS;
+ obj = zsdesc_pfn(zsdesc) << OBJ_INDEX_BITS;
obj |= obj_idx & OBJ_INDEX_MASK;
obj <<= OBJ_TAG_BITS;

@@ -1505,7 +1505,7 @@ static unsigned long obj_malloc(struct zs_pool *pool,
kunmap_atomic(vaddr);
mod_zspage_inuse(zspage, 1);

- obj = location_to_obj(zsdesc_page(m_zsdesc), obj);
+ obj = location_to_obj(m_zsdesc, obj);

return obj;
}
@@ -2014,7 +2014,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,

old_obj = handle_to_obj(handle);
obj_to_location(old_obj, &dummy, &obj_idx);
- new_obj = (unsigned long)location_to_obj(zsdesc_page(new_zsdesc),
+ new_obj = (unsigned long)location_to_obj(new_zsdesc,
obj_idx);
record_obj(handle, new_obj);
}
--
2.41.0


2023-07-13 04:43:53

by Hyeonggon Yoo

Subject: [RFC PATCH v2 14/21] mm/zsmalloc: convert zs_page_{isolate,migrate,putback} to use zsdesc

Convert the functions for movable operations of zsmalloc to use zsdesc.

Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/zsmalloc.c | 50 ++++++++++++++++++++++++++++++++------------------
1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 085f5c791a03..9e4ced14e1eb 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -399,6 +399,16 @@ static void reset_zsdesc(struct zsdesc *zsdesc)
page->index = 0;
}

+static inline bool zsdesc_is_isolated(struct zsdesc *zsdesc)
+{
+ return PageIsolated(zsdesc_page(zsdesc));
+}
+
+struct zone *zsdesc_zone(struct zsdesc *zsdesc)
+{
+ return page_zone(zsdesc_page(zsdesc));
+}
+
/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
static void SetZsHugePage(struct zspage *zspage)
{
@@ -1928,14 +1938,15 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
{
struct zspage *zspage;
+ struct zsdesc *zsdesc = page_zsdesc(page);

/*
* Page is locked so zspage couldn't be destroyed. For detail, look at
* lock_zspage in free_zspage.
*/
- VM_BUG_ON_PAGE(PageIsolated(page), page);
+ VM_BUG_ON_PAGE(zsdesc_is_isolated(zsdesc), zsdesc_page(zsdesc));

- zspage = get_zspage(page);
+ zspage = get_zspage(zsdesc_page(zsdesc));
migrate_write_lock(zspage);
inc_zspage_isolation(zspage);
migrate_write_unlock(zspage);
@@ -1950,6 +1961,8 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
struct size_class *class;
struct zspage *zspage;
struct zsdesc *dummy;
+ struct zsdesc *new_zsdesc = page_zsdesc(newpage);
+ struct zsdesc *zsdesc = page_zsdesc(page);
void *s_addr, *d_addr, *addr;
unsigned int offset;
unsigned long handle;
@@ -1964,10 +1977,10 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
if (mode == MIGRATE_SYNC_NO_COPY)
return -EINVAL;

- VM_BUG_ON_PAGE(!PageIsolated(page), page);
+ VM_BUG_ON_PAGE(!zsdesc_is_isolated(zsdesc), zsdesc_page(zsdesc));

/* The page is locked, so this pointer must remain valid */
- zspage = get_zspage(page);
+ zspage = get_zspage(zsdesc_page(zsdesc));
pool = zspage->pool;

/*
@@ -1980,30 +1993,30 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
/* the migrate_write_lock protects zpage access via zs_map_object */
migrate_write_lock(zspage);

- offset = get_first_obj_offset(page);
- s_addr = kmap_atomic(page);
+ offset = get_first_obj_offset(zsdesc_page(zsdesc));
+ s_addr = zsdesc_kmap_atomic(zsdesc);

/*
* Here, any user cannot access all objects in the zspage so let's move.
*/
- d_addr = kmap_atomic(newpage);
+ d_addr = zsdesc_kmap_atomic(new_zsdesc);
memcpy(d_addr, s_addr, PAGE_SIZE);
kunmap_atomic(d_addr);

for (addr = s_addr + offset; addr < s_addr + PAGE_SIZE;
addr += class->size) {
- if (obj_allocated(page_zsdesc(page), addr, &handle)) {
+ if (obj_allocated(zsdesc, addr, &handle)) {

old_obj = handle_to_obj(handle);
obj_to_location(old_obj, &dummy, &obj_idx);
- new_obj = (unsigned long)location_to_obj(newpage,
+ new_obj = (unsigned long)location_to_obj(zsdesc_page(new_zsdesc),
obj_idx);
record_obj(handle, new_obj);
}
}
kunmap_atomic(s_addr);

- replace_sub_page(class, zspage, page_zsdesc(newpage), page_zsdesc(page));
+ replace_sub_page(class, zspage, new_zsdesc, zsdesc);
/*
* Since we complete the data copy and set up new zspage structure,
* it's okay to release the pool's lock.
@@ -2012,14 +2025,14 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
dec_zspage_isolation(zspage);
migrate_write_unlock(zspage);

- get_page(newpage);
- if (page_zone(newpage) != page_zone(page)) {
- dec_zone_page_state(page, NR_ZSPAGES);
- inc_zone_page_state(newpage, NR_ZSPAGES);
+ zsdesc_get(new_zsdesc);
+ if (zsdesc_zone(new_zsdesc) != zsdesc_zone(zsdesc)) {
+ zsdesc_dec_zone_page_state(zsdesc);
+ zsdesc_inc_zone_page_state(new_zsdesc);
}

- reset_zsdesc(page_zsdesc(page));
- put_page(page);
+ reset_zsdesc(zsdesc);
+ zsdesc_put(zsdesc);

return MIGRATEPAGE_SUCCESS;
}
@@ -2027,10 +2040,11 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
static void zs_page_putback(struct page *page)
{
struct zspage *zspage;
+ struct zsdesc *zsdesc = page_zsdesc(page);

- VM_BUG_ON_PAGE(!PageIsolated(page), page);
+ VM_BUG_ON_PAGE(!zsdesc_is_isolated(zsdesc), zsdesc_page(zsdesc));

- zspage = get_zspage(page);
+ zspage = get_zspage(zsdesc_page(zsdesc));
migrate_write_lock(zspage);
dec_zspage_isolation(zspage);
migrate_write_unlock(zspage);
--
2.41.0


2023-07-20 07:31:41

by Sergey Senozhatsky

Subject: Re: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page

On (23/07/13 13:20), Hyeonggon Yoo wrote:
> The purpose of this series is to define own memory descriptor for zsmalloc,
> instead of re-using various fields of struct page. This is a part of the
> effort to reduce the size of struct page to unsigned long and enable
> dynamic allocation of memory descriptors.
>
> While [1] outlines this ultimate objective, the current use of struct page
> is highly dependent on its definition, making it challenging to separately
> allocate memory descriptors.

I glanced through the series and it all looks pretty straight forward to
me. I'll have a closer look. And we definitely need Minchan to ACK it.

> Therefore, this series introduces new descriptor for zsmalloc, called
> zsdesc. It overlays struct page for now, but will eventually be allocated
> independently in the future.

So I don't expect zsmalloc memory usage increase. On one hand for each
physical page that zspage consists of we will allocate zsdesc (extra bytes),
but at the same time struct page gets slimmer. So we should be even, or
am I wrong?

2023-07-20 08:03:49

by Sergey Senozhatsky

Subject: Re: [RFC PATCH v2 16/21] mm/zsmalloc: convert location_to_obj() to use zsdesc

On (23/07/13 13:20), Hyeonggon Yoo wrote:
> /**
> - * location_to_obj - get obj value encoded from (<page>, <obj_idx>)
> - * @page: page object resides in zspage
> + * location_to_obj - get obj value encoded from (<zsdesc>, <obj_idx>)
> + * @zsdesc object resides in zspage

@zsdesc:

> * @obj_idx: object index
> */

2023-07-20 08:12:28

by Sergey Senozhatsky

Subject: Re: [RFC PATCH v2 01/21] mm/zsmalloc: create new struct zsdesc

On (23/07/13 13:20), Hyeonggon Yoo wrote:
> /*
> @@ -264,6 +247,52 @@ struct mapping_area {
> enum zs_mapmode vm_mm; /* mapping mode */
> };
>

struct zspage already has a zsdesc member at this point, so I'd prefer
to move struct zsdesc definition before struct zspage.

> +/*
> + * struct zsdesc - memory descriptor for zsmalloc memory
> + *
> + * This struct overlays struct page for now. Do not modify without a
> + * good understanding of the issues.
> + *
> + * Usage of struct page flags on zsdesc:
> + * PG_private: identifies the first component zsdesc
> + */
> +struct zsdesc {
> + unsigned long flags;
> +
> + /*
> + * Although not used by zsmalloc, this field is used by non-LRU page migration
> + * code. Leave it unused.
> + */
> + struct list_head lru;
> +
> + /* Always points to zsmalloc_mops with PAGE_MAPPING_MOVABLE set */
> + struct movable_operations *mops;
> +
> + union {
> + /* linked list of all zsdescs in a zspage */
> + struct zsdesc *next;
> + /* for huge zspages */
> + unsigned long handle;
> + };
> + struct zspage *zspage;
> + unsigned int first_obj_offset;
> + unsigned int _refcount;
> +};
> +
> +#define ZSDESC_MATCH(pg, zs) \
> + static_assert(offsetof(struct page, pg) == offsetof(struct zsdesc, zs))
> +
> +ZSDESC_MATCH(flags, flags);
> +ZSDESC_MATCH(lru, lru);
> +ZSDESC_MATCH(mapping, mops);
> +ZSDESC_MATCH(index, next);
> +ZSDESC_MATCH(index, handle);
> +ZSDESC_MATCH(private, zspage);
> +ZSDESC_MATCH(page_type, first_obj_offset);
> +ZSDESC_MATCH(_refcount, _refcount);
> +#undef ZSDESC_MATCH
> +static_assert(sizeof(struct zsdesc) <= sizeof(struct page));
> +
> /* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
> static void SetZsHugePage(struct zspage *zspage)
> {
> --
> 2.41.0
>

2023-07-20 08:58:02

by Yosry Ahmed

Subject: Re: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page

On Thu, Jul 20, 2023 at 12:18 AM Sergey Senozhatsky
<[email protected]> wrote:
>
> On (23/07/13 13:20), Hyeonggon Yoo wrote:
> > The purpose of this series is to define own memory descriptor for zsmalloc,
> > instead of re-using various fields of struct page. This is a part of the
> > effort to reduce the size of struct page to unsigned long and enable
> > dynamic allocation of memory descriptors.
> >
> > While [1] outlines this ultimate objective, the current use of struct page
> > is highly dependent on its definition, making it challenging to separately
> > allocate memory descriptors.
>
> I glanced through the series and it all looks pretty straight forward to
> me. I'll have a closer look. And we definitely need Minchan to ACK it.
>
> > Therefore, this series introduces new descriptor for zsmalloc, called
> > zsdesc. It overlays struct page for now, but will eventually be allocated
> > independently in the future.
>
> So I don't expect zsmalloc memory usage increase. On one hand for each
> physical page that zspage consists of we will allocate zsdesc (extra bytes),
> but at the same time struct page gets slimmer. So we should be even, or
> am I wrong?

Well, it depends. Here is my understanding (which may be completely wrong):

The end goal would be to have an 8-byte memdesc for each order-0 page,
and then allocate a specialized struct per-folio according to the use
case. In this case, we would have a memdesc and a zsdesc for each
order-0 page. If sizeof(zsdesc) is 64 bytes (on 64-bit), then it's a
net loss. The savings only start kicking in with higher order folios.
As of now, zsmalloc only uses order-0 pages as far as I can tell, so
the usage would increase if I understand correctly.

It seems to me though the sizeof(zsdesc) is actually 56 bytes (on
64-bit), so sizeof(zsdesc) + sizeof(memdesc) would be equal to the
current size of struct page. If that's true, then there is no loss,
and there's potential gain if we start using higher order folios in
zsmalloc in the future.

(That is of course unless we want to maintain cache line alignment for
the zsdescs, then we might end up using 64 bytes anyway).
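
A quick userspace sketch reproducing the 56-byte figure above; the fields
mirror struct zsdesc from patch 01 under LP64 assumptions (stand-in types,
not kernel code):

/* Userspace stand-in for struct zsdesc; prints the size on a 64-bit build */
#include <stdio.h>

struct list_head_stub { void *next, *prev; };	/* 16 bytes */

struct zsdesc_stub {
	unsigned long flags;			/*  8 bytes */
	struct list_head_stub lru;		/* 16 bytes */
	void *mops;				/*  8 bytes */
	union {
		void *next;			/*  8 bytes */
		unsigned long handle;
	};
	void *zspage;				/*  8 bytes */
	unsigned int first_obj_offset;		/*  4 bytes */
	unsigned int _refcount;			/*  4 bytes */
};

int main(void)
{
	/* prints 56 on LP64; struct page is 64 bytes, so 8 bytes are spare */
	printf("%zu\n", sizeof(struct zsdesc_stub));
	return 0;
}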

2023-07-20 12:10:04

by Hyeonggon Yoo

Subject: Re: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page

On Thu, Jul 20, 2023 at 4:55 PM Yosry Ahmed <[email protected]> wrote:
>
> On Thu, Jul 20, 2023 at 12:18 AM Sergey Senozhatsky
> <[email protected]> wrote:
> >
> > On (23/07/13 13:20), Hyeonggon Yoo wrote:
> > > The purpose of this series is to define own memory descriptor for zsmalloc,
> > > instead of re-using various fields of struct page. This is a part of the
> > > effort to reduce the size of struct page to unsigned long and enable
> > > dynamic allocation of memory descriptors.
> > >
> > > While [1] outlines this ultimate objective, the current use of struct page
> > > is highly dependent on its definition, making it challenging to separately
> > > allocate memory descriptors.
> >
> > I glanced through the series and it all looks pretty straight forward to
> > me. I'll have a closer look. And we definitely need Minchan to ACK it.
> >
> > > Therefore, this series introduces new descriptor for zsmalloc, called
> > > zsdesc. It overlays struct page for now, but will eventually be allocated
> > > independently in the future.
> >
> > So I don't expect zsmalloc memory usage increase. On one hand for each
> > physical page that zspage consists of we will allocate zsdesc (extra bytes),
> > but at the same time struct page gets slimmer. So we should be even, or
> > am I wrong?
>
> Well, it depends. Here is my understanding (which may be completely wrong):
>
> The end goal would be to have an 8-byte memdesc for each order-0 page,
> and then allocate a specialized struct per-folio according to the use
> case. In this case, we would have a memdesc and a zsdesc for each
> order-0 page. If sizeof(zsdesc) is 64 bytes (on 64-bit), then it's a
> net loss. The savings only start kicking in with higher order folios.
> As of now, zsmalloc only uses order-0 pages as far as I can tell, so
> the usage would increase if I understand correctly.

I partially agree with you that the point of the memdesc stuff is
allocating a use-case-specific descriptor per folio, but I thought the
primary gain from memdesc was from anon and file pages (where high-order
pages are more usable), rather than from zsmalloc.

And I believe enabling a memory descriptor per folio would be impossible
(or inefficient) if zsmalloc and other subsystems are using struct page
in the current way (or please tell me I'm wrong?)

So I expect the primary gain would be from high-order anon/file folios,
while this series is a prerequisite for them to work sanely.

> It seems to me though the sizeof(zsdesc) is actually 56 bytes (on
> 64-bit), so sizeof(zsdesc) + sizeof(memdesc) would be equal to the
> current size of struct page. If that's true, then there is no loss,

Yeah, zsdesc would be 56 bytes on 64-bit CPUs, as the memcg_data field is
not used in zsmalloc. More fields in the current struct page might not be
needed in the future, although it's hard to say at the moment. But it's
not a loss.

> and there's potential gain if we start using higher order folios in
> zsmalloc in the future.

AFAICS zsmalloc should work even when the system memory is fragmented,
so we may implement fallback allocation (as currently discussed in
large anon folios thread).

It might work, but IMHO the purpose of this series is to enable memdesc
for large anon/file folios, rather than seeing a large gain in zsmalloc itself.
(But even in zsmalloc, it's not a loss)

> (That is of course unless we want to maintain cache line alignment for
> the zsdescs, then we might end up using 64 bytes anyway).

we already don't require cache line alignment for struct page. the current
alignment requirement is due to SLUB's cmpxchg128 operation, not cache
line alignment.

I might be wrong in some aspects, so please tell me if I am.
And thank you and Sergey for taking a look at this!
--
Hyeonggon

2023-07-20 19:10:10

by Yosry Ahmed

Subject: Re: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page

On Thu, Jul 20, 2023 at 4:34 AM Hyeonggon Yoo <[email protected]> wrote:
>
> On Thu, Jul 20, 2023 at 4:55 PM Yosry Ahmed <[email protected]> wrote:
> >
> > On Thu, Jul 20, 2023 at 12:18 AM Sergey Senozhatsky
> > <[email protected]> wrote:
> > >
> > > On (23/07/13 13:20), Hyeonggon Yoo wrote:
> > > > The purpose of this series is to define own memory descriptor for zsmalloc,
> > > > instead of re-using various fields of struct page. This is a part of the
> > > > effort to reduce the size of struct page to unsigned long and enable
> > > > dynamic allocation of memory descriptors.
> > > >
> > > > While [1] outlines this ultimate objective, the current use of struct page
> > > > is highly dependent on its definition, making it challenging to separately
> > > > allocate memory descriptors.
> > >
> > > I glanced through the series and it all looks pretty straight forward to
> > > me. I'll have a closer look. And we definitely need Minchan to ACK it.
> > >
> > > > Therefore, this series introduces new descriptor for zsmalloc, called
> > > > zsdesc. It overlays struct page for now, but will eventually be allocated
> > > > independently in the future.
> > >
> > > So I don't expect zsmalloc memory usage increase. On one hand for each
> > > physical page that zspage consists of we will allocate zsdesc (extra bytes),
> > > but at the same time struct page gets slimmer. So we should be even, or
> > > am I wrong?
> >
> > Well, it depends. Here is my understanding (which may be completely wrong):
> >
> > The end goal would be to have an 8-byte memdesc for each order-0 page,
> > and then allocate a specialized struct per-folio according to the use
> > case. In this case, we would have a memdesc and a zsdesc for each
> > order-0 page. If sizeof(zsdesc) is 64 bytes (on 64-bit), then it's a
> > net loss. The savings only start kicking in with higher order folios.
> > As of now, zsmalloc only uses order-0 pages as far as I can tell, so
> > the usage would increase if I understand correctly.
>
> I partially agree with you that the point of memdesc stuff is
> allocating a use-case specific
> descriptor per folio. but I thought the primary gain from memdesc was
> from anon and file pages
> (where high order pages are more usable), rather than zsmalloc.
>
> And I believe enabling a memory descriptor per folio would be
> impossible (or inefficient)
> if zsmalloc and other subsystems are using struct page in the current
> way (or please tell me I'm wrong?)
>
> So I expect the primary gain would be from high-order anon/file folios,
> while this series is a prerequisite for them to work sanely.

Right, I agree with that, sorry if I wasn't clear. I meant that
generally speaking, we see gains from memdesc from higher order
folios, so for zsmalloc specifically we probably won't be seeing any
savings, and *might* see some extra usage (which I might be wrong
about, see below).

>
> > It seems to me though the sizeof(zsdesc) is actually 56 bytes (on
> > 64-bit), so sizeof(zsdesc) + sizeof(memdesc) would be equal to the
> > current size of struct page. If that's true, then there is no loss,
>
> Yeah, zsdesc would be 56 bytes on 64 bit CPUs as memcg_data field is
> not used in zsmalloc.
> More fields in the current struct page might not be needed in the
> future, although it's hard to say at the moment.
> but it's not a loss.

Is page->memcg_data something that we can drop? Aren't there code
paths that will check page->memcg_data even for kernel pages (e.g.
__folio_put() -> __folio_put_small() -> mem_cgroup_uncharge() ) ?

>
> > and there's potential gain if we start using higher order folios in
> > zsmalloc in the future.
>
> AFAICS zsmalloc should work even when the system memory is fragmented,
> so we may implement fallback allocation (as currently discussed in
> large anon folios thread).

Of course, any usage of higher order folios in zsmalloc must have a
fallback logic, although it might be simpler for zsmalloc than anon
folios. I agree that's off topic here.

>
> It might work, but IMHO the purpose of this series is to enable memdesc
> for large anon/file folios, rather than seeing a large gain in zsmalloc itself.
> (But even in zsmalloc, it's not a loss)
>
> > (That is of course unless we want to maintain cache line alignment for
> > the zsdescs, then we might end up using 64 bytes anyway).
>
> we already don't require cache line alignment for struct page. the current
> alignment requirement is due to SLUB's cmpxchg128 operation, not cache
> line alignment.

I thought we want struct page to be cache line aligned (to avoid
having to fetch two cache lines for one struct page), but I can easily
be wrong.

>
> I might be wrong in some aspects, so please tell me if I am.
> And thank you and Sergey for taking a look at this!

Thanks to you for doing the work!

> --
> Hyeonggon

2023-07-20 21:51:21

by Yosry Ahmed

Subject: Re: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page

<snip>
>
> > > > It seems to me though the sizeof(zsdesc) is actually 56 bytes (on
> > > > 64-bit), so sizeof(zsdesc) + sizeof(memdesc) would be equal to the
> > > > current size of struct page. If that's true, then there is no loss,
> > >
> > > Yeah, zsdesc would be 56 bytes on 64 bit CPUs as memcg_data field is
> > > not used in zsmalloc.
> > > More fields in the current struct page might not be needed in the
> > > future, although it's hard to say at the moment.
> > > but it's not a loss.
> >
> > Is page->memcg_data something that we can drop? Aren't there code
> > paths that will check page->memcg_data even for kernel pages (e.g.
> > __folio_put() -> __folio_put_small() -> mem_cgroup_uncharge() ) ?
>
> zsmalloc pages are not accounted for via __GFP_ACCOUNT,

Yeah, but the code in the free path above will check page->memcg_data
nonetheless to check if it is charged. I think to drop memcg_data we
need to enlighten the code that some pages do not even have memcg_data
at all, no?

> and IIUC the current implementation of zswap memcg charging does not
> use memcg_data
> either - so I think it can be dropped.

My question is more about the generic mm code expecting to see
page->memcg_data in every page, even if it is not actually used
(zero).

>
> I think we don't want to increase memdesc to 16 bytes by adding memcg_data.
> It should be in use-case specific descriptors if it can be charged to memcg?
>
<snip>

2023-07-20 22:18:36

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page

On Thu, Jul 20, 2023 at 2:52 PM Hyeonggon Yoo <[email protected]> wrote:
>
> On Fri, Jul 21, 2023 at 6:39 AM Yosry Ahmed <[email protected]> wrote:
> >
> > <snip>
> > >
> > > > > > It seems to me though the sizeof(zsdesc) is actually 56 bytes (on
> > > > > > 64-bit), so sizeof(zsdesc) + sizeof(memdesc) would be equal to the
> > > > > > current size of struct page. If that's true, then there is no loss,
> > > > >
> > > > > Yeah, zsdesc would be 56 bytes on 64 bit CPUs as memcg_data field is
> > > > > not used in zsmalloc.
> > > > > More fields in the current struct page might not be needed in the
> > > > > future, although it's hard to say at the moment.
> > > > > but it's not a loss.
> > > >
> > > > Is page->memcg_data something that we can drop? Aren't there code
> > > > paths that will check page->memcg_data even for kernel pages (e.g.
> > > > __folio_put() -> __folio_put_small() -> mem_cgroup_uncharge() ) ?
> > >
> > > zsmalloc pages are not accounted for via __GFP_ACCOUNT,
> >
> > Yeah, but the code in the free path above will check page->memcg_data
> > nonetheless to check if it is charged.
>
> Right.
>
> > I think to drop memcg_data we need to enlighten the code that some pages
> > do not even have memcg_data at all
>
> I agree with you. It should be one of the milestones for all of this to work.
> It won't be complicated for the code to be aware of it, because there will be
> a freeing (and uncharging if need) routine per type of descriptors.

Right.

For this patch series, do we need to maintain memcg_data in zsdesc to
avoid any subtle problems?

2023-07-20 22:19:44

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page

On Fri, Jul 21, 2023 at 3:31 AM Yosry Ahmed <[email protected]> wrote:
>
> On Thu, Jul 20, 2023 at 4:34 AM Hyeonggon Yoo <[email protected]> wrote:
> >
> > On Thu, Jul 20, 2023 at 4:55 PM Yosry Ahmed <[email protected]> wrote:
> > >
> > > On Thu, Jul 20, 2023 at 12:18 AM Sergey Senozhatsky
> > > <[email protected]> wrote:
> > > >
> > > > On (23/07/13 13:20), Hyeonggon Yoo wrote:
> > > > > The purpose of this series is to define own memory descriptor for zsmalloc,
> > > > > instead of re-using various fields of struct page. This is a part of the
> > > > > effort to reduce the size of struct page to unsigned long and enable
> > > > > dynamic allocation of memory descriptors.
> > > > >
> > > > > While [1] outlines this ultimate objective, the current use of struct page
> > > > > is highly dependent on its definition, making it challenging to separately
> > > > > allocate memory descriptors.
> > > >
> > > > I glanced through the series and it all looks pretty straight forward to
> > > > me. I'll have a closer look. And we definitely need Minchan to ACK it.
> > > >
> > > > > Therefore, this series introduces new descriptor for zsmalloc, called
> > > > > zsdesc. It overlays struct page for now, but will eventually be allocated
> > > > > independently in the future.
> > > >
> > > > So I don't expect zsmalloc memory usage increase. On one hand for each
> > > > physical page that zspage consists of we will allocate zsdesc (extra bytes),
> > > > but at the same time struct page gets slimmer. So we should be even, or
> > > > am I wrong?
> > >
> > > Well, it depends. Here is my understanding (which may be completely wrong):
> > >
> > > The end goal would be to have an 8-byte memdesc for each order-0 page,
> > > and then allocate a specialized struct per-folio according to the use
> > > case. In this case, we would have a memdesc and a zsdesc for each
> > > order-0 page. If sizeof(zsdesc) is 64 bytes (on 64-bit), then it's a
> > > net loss. The savings only start kicking in with higher order folios.
> > > As of now, zsmalloc only uses order-0 pages as far as I can tell, so
> > > the usage would increase if I understand correctly.
> >
> > I partially agree with you that the point of memdesc stuff is
> > allocating a use-case specific
> > descriptor per folio. but I thought the primary gain from memdesc was
> > from anon and file pages
> > (where high order pages are more usable), rather than zsmalloc.
> >
> > And I believe enabling a memory descriptor per folio would be
> > impossible (or inefficient)
> > if zsmalloc and other subsystems are using struct page in the current
> > way (or please tell me I'm wrong?)
> >
> > So I expect the primary gain would be from high-order anon/file folios,
> > while this series is a prerequisite for them to work sanely.
>
> Right, I agree with that, sorry if I wasn't clear. I meant that
> generally speaking, we see gains from memdesc from higher order
> folios, so for zsmalloc specifically we probably won't see seeing any
> savings, and *might* see some extra usage (which I might be wrong
> about, see below).

Yeah, even though I said "oh, we don't necessarily need to use extra
memory for zsdesc" below, a slight increase wouldn't hurt too much from
that perspective, because there will be savings from other users of
memdesc.

> > > It seems to me though the sizeof(zsdesc) is actually 56 bytes (on
> > > 64-bit), so sizeof(zsdesc) + sizeof(memdesc) would be equal to the
> > > current size of struct page. If that's true, then there is no loss,
> >
> > Yeah, zsdesc would be 56 bytes on 64 bit CPUs as memcg_data field is
> > not used in zsmalloc.
> > More fields in the current struct page might not be needed in the
> > future, although it's hard to say at the moment.
> > but it's not a loss.
>
> Is page->memcg_data something that we can drop? Aren't there code
> paths that will check page->memcg_data even for kernel pages (e.g.
> __folio_put() -> __folio_put_small() -> mem_cgroup_uncharge() ) ?

zsmalloc pages are not accounted for via __GFP_ACCOUNT, and IIUC the
current implementation of zswap memcg charging does not use memcg_data
either - so I think it can be dropped.

I think we don't want to increase memdesc to 16 bytes by adding memcg_data.
It should be in use-case specific descriptors if it can be charged to memcg?
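
As a back-of-the-envelope check of the numbers discussed in this thread
(illustrative only -- the 8-byte memdesc and 56-byte zsdesc are assumed
sizes, not something this series defines):

	/* assumed per-page sizes on 64-bit */
	#define MEMDESC_SIZE	8	/* one typed word per order-0 page */
	#define ZSDESC_SIZE	56	/* struct page (64) minus memcg_data (8) */

	_Static_assert(MEMDESC_SIZE + ZSDESC_SIZE == 64,
		       "same per-page footprint as today's struct page");

i.e. splitting zsdesc out would not be a loss for zsmalloc even before
any use of higher order allocations.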

> > > and there's potential gain if we start using higher order folios in
> > > zsmalloc in the future.
> >
> > AFAICS zsmalloc should work even when the system memory is fragmented,
> > so we may implement fallback allocation (as currently discussed in
> > large anon folios thread).
>
> Of course, any usage of higher order folios in zsmalloc must have a
> fallback logic, although it might be simpler for zsmalloc than anon
> folios. I agree that's off topic here.
>
> > It might work, but IMHO the purpose of this series is to enable memdesc
> > for large anon/file folios, rather than seeing a large gain in zsmalloc itself.
> > (But even in zsmalloc, it's not a loss)
> >
> > > (That is of course unless we want to maintain cache line alignment for
> > > the zsdescs, then we might end up using 64 bytes anyway).
> >
> > we already don't require cache line alignment for struct page. the current
> > alignment requirement is due to SLUB's cmpxchg128 operation, not cache
> > line alignment.
>
> I thought we want struct page to be cache line aligned (to avoid
> having to fetch two cache lines for one struct page), but I can easily
> be wrong.

Right. I admit that even though it's not required to be cache line
aligned, it is 64 bytes in commonly used configurations, and changing
that could affect some workloads.

But I think it would be better not to align zsdesc to the cache line
size before observing degradations due to alignment. Even when zsmalloc
is intensively used, it shouldn't be a huge issue.
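
For context, the current rule looks roughly like this (paraphrased from
include/linux/mm_types.h, so details may differ by version):

	#ifdef CONFIG_HAVE_ALIGNED_STRUCT_PAGE
	/* selected for SLUB's double-word cmpxchg on the freelist */
	#define _struct_page_alignment	__aligned(2 * sizeof(unsigned long))
	#else
	/* otherwise only natural word alignment is requested */
	#define _struct_page_alignment	__aligned(sizeof(unsigned long))
	#endif

i.e. the forced alignment is two words for the cmpxchg users, not a
64-byte cache line.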

> > I might be wrong in some aspects, so please tell me if I am.
> > And thank you and Sergey for taking a look at this!
>
> Thanks to you for doing the work!

No problem! :)

2023-07-20 22:20:23

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page

On Fri, Jul 21, 2023 at 6:39 AM Yosry Ahmed <[email protected]> wrote:
>
> <snip>
> >
> > > > > It seems to me though the sizeof(zsdesc) is actually 56 bytes (on
> > > > > 64-bit), so sizeof(zsdesc) + sizeof(memdesc) would be equal to the
> > > > > current size of struct page. If that's true, then there is no loss,
> > > >
> > > > Yeah, zsdesc would be 56 bytes on 64 bit CPUs as memcg_data field is
> > > > not used in zsmalloc.
> > > > More fields in the current struct page might not be needed in the
> > > > future, although it's hard to say at the moment.
> > > > but it's not a loss.
> > >
> > > Is page->memcg_data something that we can drop? Aren't there code
> > > paths that will check page->memcg_data even for kernel pages (e.g.
> > > __folio_put() -> __folio_put_small() -> mem_cgroup_uncharge() ) ?
> >
> > zsmalloc pages are not accounted for via __GFP_ACCOUNT,
>
> Yeah, but the code in the free path above will check page->memcg_data
> nonetheless to check if it is charged.

Right.

> I think to drop memcg_data we need to enlighten the code that some pages
> do not even have memcg_data at all

I agree with you. It should be one of the milestones for all of this to work.
It won't be complicated for the code to be aware of it, because there will be
a freeing (and uncharging if needed) routine per descriptor type.
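
Purely as a hypothetical sketch of what "a freeing routine per descriptor
type" could look like (all names below are invented; nothing here is from
this series or the tree), the idea is that only descriptor types that can
be memcg-charged would ever carry or inspect memcg_data:

	enum memdesc_type { MEMDESC_FOLIO, MEMDESC_SLAB, MEMDESC_ZSDESC };

	struct memdesc {
		enum memdesc_type type;
		void *desc;	/* points to the folio/slab/zsdesc descriptor */
	};

	static void memdesc_free(struct memdesc *md)
	{
		switch (md->type) {
		case MEMDESC_FOLIO:
			/* anon/file folios may be charged: uncharge here */
			break;
		case MEMDESC_ZSDESC:
			/* zsdesc has no memcg_data at all: nothing to do */
			break;
		default:
			break;
		}
		/* ... then free the descriptor and the page itself ... */
	}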