2023-01-09 03:46:49

by Sergey Senozhatsky

Subject: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

Hi,

This turns the hard-coded limit on the maximum number of physical
pages per-zspage into a config option. It also increases the default
limit from 4 to 8.

Sergey Senozhatsky (4):
zsmalloc: rework zspage chain size selection
zsmalloc: skip chain size calculation for pow_of_2 classes
zsmalloc: make zspage chain size configurable
zsmalloc: set default zspage chain size to 8

Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
mm/Kconfig | 19 ++++
mm/zsmalloc.c | 72 +++++----------
3 files changed, 212 insertions(+), 47 deletions(-)

--
2.39.0.314.g84b9a713c41-goog


2023-01-09 03:48:01

by Sergey Senozhatsky

Subject: [PATCHv2 1/4] zsmalloc: rework zspage chain size selection

Computers are bad at division. We currently decide the best
zspage chain size (max number of physical pages per-zspage)
by looking at a `used percentage` value. This is not enough
as we lose precision during usage percentage calculations.
For example, let's look at size class 208:

pages per zspage wasted bytes used%
1 144 96
2 80 99
3 16 99
4 160 99

The current algorithm will select the 2 pages per zspage configuration,
as it's the first one to reach 99%. However, 3 pages per zspage
wastes less memory.

Change the algorithm to select the zspage configuration that has
the lowest number of wasted bytes.
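
For illustration, a minimal userspace sketch (not kernel code; 4KiB
pages and the current limit of 4 pages are assumed) that reproduces
the table above and the lowest-waste selection:

#include <stdio.h>
#include <limits.h>

#define PAGE_SIZE       4096
#define MAX_PAGES       4

int main(void)
{
        int best = 1, min_waste = INT_MAX;

        for (int i = 1; i <= MAX_PAGES; i++) {
                int waste = (i * PAGE_SIZE) % 208;
                int usedpc = (i * PAGE_SIZE - waste) * 100 / (i * PAGE_SIZE);

                printf("%d pages: wasted %d, used %d%%\n", i, waste, usedpc);
                if (waste < min_waste) {
                        min_waste = waste;
                        best = i;       /* lowest waste wins: 3 pages here */
                }
        }
        printf("selected chain size: %d\n", best);
        return 0;
}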

Signed-off-by: Sergey Senozhatsky <[email protected]>
---
mm/zsmalloc.c | 56 +++++++++++++++++----------------------------------
1 file changed, 19 insertions(+), 37 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 6aafacd664fc..effe10fe76e9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -802,42 +802,6 @@ static enum fullness_group fix_fullness_group(struct size_class *class,
return newfg;
}

-/*
- * We have to decide on how many pages to link together
- * to form a zspage for each size class. This is important
- * to reduce wastage due to unusable space left at end of
- * each zspage which is given as:
- * wastage = Zp % class_size
- * usage = Zp - wastage
- * where Zp = zspage size = k * PAGE_SIZE where k = 1, 2, ...
- *
- * For example, for size class of 3/8 * PAGE_SIZE, we should
- * link together 3 PAGE_SIZE sized pages to form a zspage
- * since then we can perfectly fit in 8 such objects.
- */
-static int get_pages_per_zspage(int class_size)
-{
- int i, max_usedpc = 0;
- /* zspage order which gives maximum used size per KB */
- int max_usedpc_order = 1;
-
- for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
- int zspage_size;
- int waste, usedpc;
-
- zspage_size = i * PAGE_SIZE;
- waste = zspage_size % class_size;
- usedpc = (zspage_size - waste) * 100 / zspage_size;
-
- if (usedpc > max_usedpc) {
- max_usedpc = usedpc;
- max_usedpc_order = i;
- }
- }
-
- return max_usedpc_order;
-}
-
static struct zspage *get_zspage(struct page *page)
{
struct zspage *zspage = (struct zspage *)page_private(page);
@@ -2318,6 +2282,24 @@ static int zs_register_shrinker(struct zs_pool *pool)
pool->name);
}

+static int calculate_zspage_chain_size(int class_size)
+{
+ int i, min_waste = INT_MAX;
+ int chain_size = 1;
+
+ for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
+ int waste;
+
+ waste = (i * PAGE_SIZE) % class_size;
+ if (waste < min_waste) {
+ min_waste = waste;
+ chain_size = i;
+ }
+ }
+
+ return chain_size;
+}
+
/**
* zs_create_pool - Creates an allocation pool to work from.
* @name: pool name to be created
@@ -2362,7 +2344,7 @@ struct zs_pool *zs_create_pool(const char *name)
size = ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA;
if (size > ZS_MAX_ALLOC_SIZE)
size = ZS_MAX_ALLOC_SIZE;
- pages_per_zspage = get_pages_per_zspage(size);
+ pages_per_zspage = calculate_zspage_chain_size(size);
objs_per_zspage = pages_per_zspage * PAGE_SIZE / size;

/*
--
2.39.0.314.g84b9a713c41-goog

2023-01-09 04:06:20

by Sergey Senozhatsky

Subject: [PATCHv2 2/4] zsmalloc: skip chain size calculation for pow_of_2 classes

If a class size is a power of 2 then it wastes no memory
and the best configuration is 1 physical page per-zspage.
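
PAGE_SIZE is itself a power of 2, so any power-of-2 class size up to
PAGE_SIZE divides a single page evenly. A minimal userspace sketch
(4KiB pages assumed; not kernel code) demonstrating the zero-waste
property:

#include <stdio.h>

#define PAGE_SIZE       4096

int main(void)
{
        /* Every power-of-2 class size up to PAGE_SIZE divides a single
         * page evenly, so one page per zspage wastes nothing. */
        for (int size = 32; size <= PAGE_SIZE; size <<= 1)
                printf("class size %4d: waste in one page = %d\n",
                       size, PAGE_SIZE % size);
        return 0;
}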

Signed-off-by: Sergey Senozhatsky <[email protected]>
---
mm/zsmalloc.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index effe10fe76e9..ee8431784998 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -2287,6 +2287,9 @@ static int calculate_zspage_chain_size(int class_size)
int i, min_waste = INT_MAX;
int chain_size = 1;

+ if (is_power_of_2(class_size))
+ return chain_size;
+
for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
int waste;

--
2.39.0.314.g84b9a713c41-goog

2023-01-09 04:06:53

by Sergey Senozhatsky

Subject: [PATCHv2 3/4] zsmalloc: make zspage chain size configurable

Remove the hard-coded limit on the maximum number of physical
pages per-zspage.

This will allow tuning of the zsmalloc pool, as the zspage chain
size changes the `pages per-zspage` and `objects per-zspage`
characteristics of size classes, which also affects size
class clustering (the way size classes are merged).
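
As a rough illustration (a userspace sketch, not kernel code; 4KiB
pages and class sizes from the documentation below are assumed),
classes merge only when they share both `pages per-zspage` and
`objects per-zspage`, so raising the limit changes the clustering:

#include <stdio.h>
#include <limits.h>

#define PAGE_SIZE       4096

/* Mirrors the logic of calculate_zspage_chain_size() */
static int chain_size(int class_size, int limit)
{
        int best = 1, min_waste = INT_MAX;

        for (int i = 1; i <= limit; i++) {
                int waste = (i * PAGE_SIZE) % class_size;

                if (waste < min_waste) {
                        min_waste = waste;
                        best = i;
                }
        }
        return best;
}

int main(void)
{
        /* Sizes 1536..1632 are classes #94..#100. With a limit of 4,
         * sizes 1552..1632 all come out as 2 pages / 5 objects and
         * merge; with a limit of 8, e.g. size 1568 (class #96)
         * becomes 5 pages / 13 objects. */
        for (int size = 1536; size <= 1632; size += 16) {
                int p4 = chain_size(size, 4);
                int p8 = chain_size(size, 8);

                printf("size %d: limit 4 -> %d/%d, limit 8 -> %d/%d\n",
                       size, p4, p4 * PAGE_SIZE / size,
                       p8, p8 * PAGE_SIZE / size);
        }
        return 0;
}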

Signed-off-by: Sergey Senozhatsky <[email protected]>
---
Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
mm/Kconfig | 19 ++++
mm/zsmalloc.c | 15 +--
3 files changed, 191 insertions(+), 11 deletions(-)

diff --git a/Documentation/mm/zsmalloc.rst b/Documentation/mm/zsmalloc.rst
index 6e79893d6132..40323c9b39d8 100644
--- a/Documentation/mm/zsmalloc.rst
+++ b/Documentation/mm/zsmalloc.rst
@@ -80,3 +80,171 @@ Similarly, we assign zspage to:
* ZS_ALMOST_FULL when n > N / f
* ZS_EMPTY when n == 0
* ZS_FULL when n == N
+
+
+Internals
+=========
+
+zsmalloc has 255 size classes, each of which can hold a number of zspages.
+Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
+The optimal zspage chain size for each size class is calculated during the
+creation of the zsmalloc pool (see calculate_zspage_chain_size()).
+
+As an optimization, zsmalloc merges size classes that have similar
+characteristics in terms of the number of pages per zspage and the number
+of objects that each zspage can store.
+
+For instance, consider the following size classes::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ 94 1536 0 0 0 0 0 3 0
+ 100 1632 0 0 0 0 0 2 0
+ ...
+
+
+Size classes #95-99 are merged with size class #100. This means that when we
+need to store an object of size, say, 1568 bytes, we end up using size class
+#100 instead of size class #96. Size class #100 is meant for objects of size
+1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.
+
+Size class #100 consists of zspages with 2 physical pages each, which can
+hold a total of 5 objects. If we need to store 13 objects of size 1568, we
+end up allocating three zspages, or 6 physical pages.
+
+However, if we take a closer look at size class #96 (which is meant for
+objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we
+find that the most optimal zspage configuration for this class is a chain
+of 5 physical pages::
+
+ pages per zspage wasted bytes used%
+ 1 960 76
+ 2 352 95
+ 3 1312 89
+ 4 704 95
+ 5 96 99
+
+This means that a class #96 configuration with 5 physical pages can store 13
+objects of size 1568 in a single zspage, using a total of 5 physical pages.
+This is more efficient than the class #100 configuration, which would use 6
+physical pages to store the same number of objects.
+
+As the zspage chain size for class #96 increases, its key characteristics
+such as pages per-zspage and objects per-zspage also change. This leads to
+fewer class mergers, resulting in a more compact grouping of classes, which
+reduces memory wastage.
+
+Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ 202 3264 0 0 0 0 0 4 0
+ 254 4096 0 0 0 0 0 1 0
+ ...
+
+Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
+per zspage. Any object larger than 3264 bytes is considered huge and belongs
+to size class #254, which stores each object in its own physical page (objects
+in huge classes do not share pages).
+
+Increasing the size of the chain of zspages also results in a higher watermark
+for the huge size class and fewer huge classes overall. This allows for more
+efficient storage of large objects.
+
+For a zspage chain size of 8, the huge class watermark becomes 3632 bytes::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ 202 3264 0 0 0 0 0 4 0
+ 211 3408 0 0 0 0 0 5 0
+ 217 3504 0 0 0 0 0 6 0
+ 222 3584 0 0 0 0 0 7 0
+ 225 3632 0 0 0 0 0 8 0
+ 254 4096 0 0 0 0 0 1 0
+ ...
+
+For a zspage chain size of 16, the huge class watermark becomes 3840 bytes::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ 202 3264 0 0 0 0 0 4 0
+ 206 3328 0 0 0 0 0 13 0
+ 207 3344 0 0 0 0 0 9 0
+ 208 3360 0 0 0 0 0 14 0
+ 211 3408 0 0 0 0 0 5 0
+ 212 3424 0 0 0 0 0 16 0
+ 214 3456 0 0 0 0 0 11 0
+ 217 3504 0 0 0 0 0 6 0
+ 219 3536 0 0 0 0 0 13 0
+ 222 3584 0 0 0 0 0 7 0
+ 223 3600 0 0 0 0 0 15 0
+ 225 3632 0 0 0 0 0 8 0
+ 228 3680 0 0 0 0 0 9 0
+ 230 3712 0 0 0 0 0 10 0
+ 232 3744 0 0 0 0 0 11 0
+ 234 3776 0 0 0 0 0 12 0
+ 235 3792 0 0 0 0 0 13 0
+ 236 3808 0 0 0 0 0 14 0
+ 238 3840 0 0 0 0 0 15 0
+ 254 4096 0 0 0 0 0 1 0
+ ...
+
+The overall effect of the zspage chain size on the zsmalloc pool configuration::
+
+ pages per zspage number of size classes (clusters) huge size class watermark
+ 4 69 3264
+ 5 86 3408
+ 6 93 3504
+ 7 112 3584
+ 8 123 3632
+ 9 140 3680
+ 10 143 3712
+ 11 159 3744
+ 12 164 3776
+ 13 180 3792
+ 14 183 3808
+ 15 188 3840
+ 16 191 3840
+
+
+A synthetic test
+----------------
+
+zram as a build artifacts storage (Linux kernel compilation).
+
+* `CONFIG_ZSMALLOC_CHAIN_SIZE=4`
+
+ zsmalloc classes stats::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ Total 13 51 413836 412973 159955 3
+
+ zram mm_stat::
+
+ 1691783168 628083717 655175680 0 655175680 60 0 34048 34049
+
+
+* `CONFIG_ZSMALLOC_CHAIN_SIZE=8`
+
+ zsmalloc classes stats::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ Total 18 87 414852 412978 156666 0
+
+ zram mm_stat::
+
+ 1691803648 627793930 641703936 0 641703936 60 0 33591 33591
+
+Using larger zspage chains may result in using fewer physical pages, as seen
+in the example where the number of physical pages used decreased from 159955
+to 156666; at the same time, the maximum zsmalloc pool memory usage went down
+from 655175680 to 641703936 bytes.
+
+However, this advantage may be offset by the potential for increased system
+memory pressure (as some zspages have larger chain sizes) in cases where there
+is heavy internal fragmentation and zspool compaction is unable to relocate
+objects and release zspages. In these cases, it is recommended to decrease
+the limit on the size of the zspage chains (as specified by the
+CONFIG_ZSMALLOC_CHAIN_SIZE option).
diff --git a/mm/Kconfig b/mm/Kconfig
index 4eb4afa53e6d..5b2863de4be5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -191,6 +191,25 @@ config ZSMALLOC_STAT
information to userspace via debugfs.
If unsure, say N.

+config ZSMALLOC_CHAIN_SIZE
+ int "Maximum number of physical pages per-zspage"
+ default 4
+ range 1 16
+ depends on ZSMALLOC
+ help
+ This option sets the upper limit on the number of physical pages
+ that a zsmalloc page (zspage) can consist of. The optimal zspage
+ chain size is calculated for each size class during the
+ initialization of the pool.
+
+ Changing this option can alter the characteristics of size classes,
+ such as the number of pages per zspage and the number of objects
+ per zspage. This can also result in different configurations of
+ the pool, as zsmalloc merges size classes with similar
+ characteristics.
+
+ For more information, see zsmalloc documentation.
+
menu "SLAB allocator options"

choice
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index ee8431784998..77a8746a453d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -73,13 +73,6 @@
*/
#define ZS_ALIGN 8

-/*
- * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
- * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
- */
-#define ZS_MAX_ZSPAGE_ORDER 2
-#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
-
#define ZS_HANDLE_SIZE (sizeof(unsigned long))

/*
@@ -126,7 +119,7 @@
#define MAX(a, b) ((a) >= (b) ? (a) : (b))
/* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
#define ZS_MIN_ALLOC_SIZE \
- MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+ MAX(32, (CONFIG_ZSMALLOC_CHAIN_SIZE << PAGE_SHIFT >> OBJ_INDEX_BITS))
/* each chunk includes extra space to keep handle */
#define ZS_MAX_ALLOC_SIZE PAGE_SIZE

@@ -1078,7 +1071,7 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
gfp_t gfp)
{
int i;
- struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
+ struct page *pages[CONFIG_ZSMALLOC_CHAIN_SIZE];
struct zspage *zspage = cache_alloc_zspage(pool, gfp);

if (!zspage)
@@ -1910,7 +1903,7 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
struct page *newpage, struct page *oldpage)
{
struct page *page;
- struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE] = {NULL, };
+ struct page *pages[CONFIG_ZSMALLOC_CHAIN_SIZE] = {NULL, };
int idx = 0;

page = get_first_page(zspage);
@@ -2290,7 +2283,7 @@ static int calculate_zspage_chain_size(int class_size)
if (is_power_of_2(class_size))
return chain_size;

- for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
+ for (i = 1; i <= CONFIG_ZSMALLOC_CHAIN_SIZE; i++) {
int waste;

waste = (i * PAGE_SIZE) % class_size;
--
2.39.0.314.g84b9a713c41-goog

2023-01-09 04:27:16

by Sergey Senozhatsky

Subject: [PATCHv2 4/4] zsmalloc: set default zspage chain size to 8

This changes the key characteristics (pages per-zspage and objects
per-zspage) of a number of size classes, which results in a
different pool configuration. With a zspage chain size of 8 we
have more size class clusters (123) and a higher huge size class
watermark (3632 bytes).

Please read zsmalloc documentation for more details.

Signed-off-by: Sergey Senozhatsky <[email protected]>
---
mm/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 5b2863de4be5..d854a421821b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -193,7 +193,7 @@ config ZSMALLOC_STAT

config ZSMALLOC_CHAIN_SIZE
int "Maximum number of physical pages per-zspage"
- default 4
+ default 8
range 1 16
depends on ZSMALLOC
help
--
2.39.0.314.g84b9a713c41-goog

2023-01-12 07:31:47

by Sergey Senozhatsky

Subject: Re: [PATCHv2 3/4] zsmalloc: make zspage chain size configurable

On (23/01/09 12:38), Sergey Senozhatsky wrote:
> Remove the hard-coded limit on the maximum number of physical
> pages per-zspage.
>
> This will allow tuning of the zsmalloc pool, as the zspage chain
> size changes the `pages per-zspage` and `objects per-zspage`
> characteristics of size classes, which also affects size
> class clustering (the way size classes are merged).

Andrew, I have a small fixup patch (0day build bot failure on
parisc64). How would you prefer to handle this?

2023-01-13 17:59:43

by Minchan Kim

Subject: Re: [PATCHv2 1/4] zsmalloc: rework zspage chain size selection

On Mon, Jan 09, 2023 at 12:38:35PM +0900, Sergey Senozhatsky wrote:
> Computers are bad at division. We currently decide the best
> zspage chain size (max number of physical pages per-zspage)
> by looking at a `used percentage` value. This is not enough
> as we lose precision during usage percentage calculations.
> For example, let's look at size class 208:
>
> pages per zspage wasted bytes used%
> 1 144 96
> 2 80 99
> 3 16 99
> 4 160 99
>
> The current algorithm will select the 2 pages per zspage configuration,
> as it's the first one to reach 99%. However, 3 pages per zspage
> wastes less memory.
>
> Change the algorithm to select the zspage configuration that has
> the lowest number of wasted bytes.
>
> Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>

2023-01-13 18:37:15

by Minchan Kim

Subject: Re: [PATCHv2 2/4] zsmalloc: skip chain size calculation for pow_of_2 classes

On Mon, Jan 09, 2023 at 12:38:36PM +0900, Sergey Senozhatsky wrote:
> If a class size is a power of 2 then it wastes no memory
> and the best configuration is 1 physical page per-zspage.
>
> Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>

2023-01-13 19:11:13

by Minchan Kim

Subject: Re: [PATCHv2 3/4] zsmalloc: make zspage chain size configurable

On Mon, Jan 09, 2023 at 12:38:37PM +0900, Sergey Senozhatsky wrote:
> Remove the hard-coded limit on the maximum number of physical
> pages per-zspage.
>
> This will allow tuning of the zsmalloc pool, as the zspage chain
> size changes the `pages per-zspage` and `objects per-zspage`
> characteristics of size classes, which also affects size
> class clustering (the way size classes are merged).
>
> Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>

with the additional patch in the thread that fixes the UL constant issue.

2023-01-13 19:11:26

by Minchan Kim

Subject: Re: [PATCHv2 4/4] zsmalloc: set default zspage chain size to 8

On Mon, Jan 09, 2023 at 12:38:38PM +0900, Sergey Senozhatsky wrote:
> This changes the key characteristics (pages per-zspage and objects
> per-zspage) of a number of size classes, which results in a
> different pool configuration. With a zspage chain size of 8 we
> have more size class clusters (123) and a higher huge size class
> watermark (3632 bytes).
>
> Please read zsmalloc documentation for more details.
>
> Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>

Thanks for the great work, Sergey!

2023-01-13 21:08:26

by Mike Kravetz

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On 01/09/23 12:38, Sergey Senozhatsky wrote:
> Hi,
>
> This turns the hard-coded limit on the maximum number of physical
> pages per-zspage into a config option. It also increases the default
> limit from 4 to 8.
>
> Sergey Senozhatsky (4):
> zsmalloc: rework zspage chain size selection
> zsmalloc: skip chain size calculation for pow_of_2 classes
> zsmalloc: make zspage chain size configurable
> zsmalloc: set default zspage chain size to 8
>
> Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
> mm/Kconfig | 19 ++++
> mm/zsmalloc.c | 72 +++++----------
> 3 files changed, 212 insertions(+), 47 deletions(-)

Hi Sergey,

The following BUG shows up after this series in linux-next. I can easily
recreate by doing the following:

# echo large_value > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
where 'large_value' is so big that there could never possibly be that
many 2MB huge pages in the system.

--
Mike Kravetz

[ 22.981684] ------------[ cut here ]------------
[ 22.982990] kernel BUG at mm/zsmalloc.c:1982!
[ 22.984204] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 22.985561] CPU: 0 PID: 41 Comm: kcompactd0 Not tainted 6.2.0-rc3+ #13
[ 22.987430] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
[ 22.989728] RIP: 0010:zs_page_migrate+0x43c/0x490
[ 22.991070] Code: c7 c6 c8 f6 21 82 e8 b3 73 f6 ff 0f 0b 0f 1f 44 00 00 e9 20 fd ff ff 0f 1f 44 00 00 e9 9e fd ff ff 48 83 ef 01 e9 6b fe ff ff <0f> 0b 48 8b 43 20 49 89 45 20 e9 ff fd ff ff 48 c7 c6 60 d3 1d 82
[ 22.995900] RSP: 0018:ffffc9000121fb20 EFLAGS: 00010246
[ 22.997364] RAX: 0000000000000002 RBX: ffffea0005b8b380 RCX: 0000000000000000
[ 22.999299] RDX: 0000000000000002 RSI: ffffffff81e28a62 RDI: 00000000ffffffff
[ 23.001236] RBP: ffff88816e2cf000 R08: ffffea0005b8b340 R09: 0000000000000008
[ 23.003181] R10: ffff88827fffafe0 R11: 0000000000280000 R12: ffff88816e2cf400
[ 23.005038] R13: ffffea0009e7f800 R14: ffff88817d783880 R15: ffff8881036a44d8
[ 23.006921] FS: 0000000000000000(0000) GS:ffff888277c00000(0000) knlGS:0000000000000000
[ 23.009116] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.010732] CR2: 00007f8b14e20550 CR3: 0000000103026004 CR4: 0000000000370ef0
[ 23.013978] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 23.015931] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 23.017892] Call Trace:
[ 23.018664] <TASK>
[ 23.019345] move_to_new_folio+0x14d/0x1f0
[ 23.020710] migrate_pages+0xe36/0x1240
[ 23.021895] ? __pfx_compaction_alloc+0x10/0x10
[ 23.023202] ? _raw_write_lock+0x13/0x30
[ 23.024335] ? __pfx_compaction_free+0x10/0x10
[ 23.025608] ? isolate_movable_page+0xff/0x250
[ 23.026880] compact_zone+0x9da/0xdf0
[ 23.027990] kcompactd_do_work+0x1d2/0x2c0
[ 23.029180] kcompactd+0x220/0x3e0
[ 23.030166] ? __pfx_autoremove_wake_function+0x10/0x10
[ 23.031612] ? __pfx_kcompactd+0x10/0x10
[ 23.032706] kthread+0xe6/0x110
[ 23.033648] ? __pfx_kthread+0x10/0x10
[ 23.034704] ret_from_fork+0x29/0x50
[ 23.035734] </TASK>
[ 23.036443] Modules linked in: rfkill ip6table_filter ip6_tables sunrpc snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device 9p netfs snd_pcm joydev 9pnet_virtio virtio_balloon snd_timer snd soundcore 9pnet virtio_blk virtio_net net_failover failover virtio_console crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring fuse
[ 23.049869] ---[ end trace 0000000000000000 ]---
[ 23.051154] RIP: 0010:zs_page_migrate+0x43c/0x490
[ 23.052466] Code: c7 c6 c8 f6 21 82 e8 b3 73 f6 ff 0f 0b 0f 1f 44 00 00 e9 20 fd ff ff 0f 1f 44 00 00 e9 9e fd ff ff 48 83 ef 01 e9 6b fe ff ff <0f> 0b 48 8b 43 20 49 89 45 20 e9 ff fd ff ff 48 c7 c6 60 d3 1d 82
[ 23.057413] RSP: 0018:ffffc9000121fb20 EFLAGS: 00010246
[ 23.058892] RAX: 0000000000000002 RBX: ffffea0005b8b380 RCX: 0000000000000000
[ 23.060867] RDX: 0000000000000002 RSI: ffffffff81e28a62 RDI: 00000000ffffffff
[ 23.062835] RBP: ffff88816e2cf000 R08: ffffea0005b8b340 R09: 0000000000000008
[ 23.064825] R10: ffff88827fffafe0 R11: 0000000000280000 R12: ffff88816e2cf400
[ 23.066806] R13: ffffea0009e7f800 R14: ffff88817d783880 R15: ffff8881036a44d8
[ 23.068738] FS: 0000000000000000(0000) GS:ffff888277c00000(0000) knlGS:0000000000000000
[ 23.071022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.072579] CR2: 00007f8b14e20550 CR3: 0000000103026004 CR4: 0000000000370ef0
[ 23.076152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 23.078172] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 23.080134] note: kcompactd0[41] exited with preempt_count 1

2023-01-14 05:42:46

by Sergey Senozhatsky

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On (23/01/13 11:57), Mike Kravetz wrote:
> > This turns the hard-coded limit on the maximum number of physical
> > pages per-zspage into a config option. It also increases the default
> > limit from 4 to 8.
> >
> > Sergey Senozhatsky (4):
> > zsmalloc: rework zspage chain size selection
> > zsmalloc: skip chain size calculation for pow_of_2 classes
> > zsmalloc: make zspage chain size configurable
> > zsmalloc: set default zspage chain size to 8
> >
> > Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
> > mm/Kconfig | 19 ++++
> > mm/zsmalloc.c | 72 +++++----------
> > 3 files changed, 212 insertions(+), 47 deletions(-)
>
> Hi Sergey,

Hi Mike,

> The following BUG shows up after this series in linux-next. I can easily
> recreate by doing the following:
>
> # echo large_value > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> where 'large_value' is so big that there could never possibly be that
> many 2MB huge pages in the system.

Hmm... Are we sure this is related? I really cannot see how the chain
size can have an effect on the zspage ->isolated counter. What chain-size
value do you use? You don't see problems with a chain size of 4?

2023-01-14 07:10:48

by Sergey Senozhatsky

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On (23/01/13 11:57), Mike Kravetz wrote:
> The following BUG shows up after this series in linux-next. I can easily
> recreate by doing the following:
>
> # echo large_value > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> where 'large_value' is so big that there could never possibly be that
> many 2MB huge pages in the system.

Just to make sure. Do you have this patch applied?
https://lore.kernel.org/lkml/[email protected]

2023-01-14 08:09:46

by Sergey Senozhatsky

Subject: Re: [PATCHv2 4/4] zsmalloc: set default zspage chain size to 8

On (23/01/13 11:02), Minchan Kim wrote:
> Acked-by: Minchan Kim <[email protected]>
>
> Thanks for the great work, Sergey!

Thank you!

2023-01-14 08:51:16

by Sergey Senozhatsky

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On (23/01/13 11:57), Mike Kravetz wrote:
> Hi Sergey,
>
> The following BUG shows up after this series in linux-next. I can easily
> recreate by doing the following:
>
> # echo large_value > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> where 'large_value' is so big that there could never possibly be that
> many 2MB huge pages in the system.

I get migration warnings with the zsmalloc series reverted.
I guess the problem is somewhere else. Can you double check
on your side?


[ 87.208255] ------------[ cut here ]------------
[ 87.209431] WARNING: CPU: 18 PID: 300 at mm/migrate.c:995 move_to_new_folio+0x1ef/0x260
[ 87.211993] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
[ 87.214287] CPU: 18 PID: 300 Comm: kcompactd0 Tainted: G N 6.2.0-rc3-next-20230113+ #385
[ 87.217529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 87.220131] RIP: 0010:move_to_new_folio+0x1ef/0x260
[ 87.221892] Code: 84 c0 74 78 48 8b 43 18 44 89 ea 48 89 de 4c 89 e7 ff 50 06 85 c0 0f 85 a9 fe ff ff 48 8b 03 a9 00 00 04 00 0f 85 7a fe ff ff <0f> 0b e9 73 fe ff ff 48 8b 03 f6 c4 20 74 2a be c0 0c 00 00 48 89
[ 87.226514] RSP: 0018:ffffc90000b9fb08 EFLAGS: 00010246
[ 87.227879] RAX: 4000000000000021 RBX: ffffea0000890500 RCX: 0000000000000000
[ 87.230948] RDX: 0000000000000000 RSI: ffffffff81e6f950 RDI: ffffea0000890500
[ 87.233026] RBP: ffffea0000890500 R08: 0000001e82ec3c3e R09: 0000000000000001
[ 87.235517] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffffea00015a26c0
[ 87.237807] R13: 0000000000000001 R14: ffffea00015a2680 R15: ffffea00008904c0
[ 87.239438] FS: 0000000000000000(0000) GS:ffff888624200000(0000) knlGS:0000000000000000
[ 87.241303] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 87.242627] CR2: 00007fe537ebbdb8 CR3: 0000000110a0a004 CR4: 0000000000770ee0
[ 87.244283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 87.245913] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 87.247559] PKRU: 55555554
[ 87.248269] Call Trace:
[ 87.248862] <TASK>
[ 87.249370] ? lock_is_held_type+0xd9/0x130
[ 87.250377] migrate_pages_batch+0x553/0xc80
[ 87.251513] ? move_freelist_tail+0xc0/0xc0
[ 87.252545] ? isolate_freepages+0x290/0x290
[ 87.253654] ? trace_mm_migrate_pages+0xf0/0xf0
[ 87.254901] migrate_pages+0x1ae/0x330
[ 87.255877] ? isolate_freepages+0x290/0x290
[ 87.257015] ? move_freelist_tail+0xc0/0xc0
[ 87.258213] compact_zone+0x528/0x6a0
[ 87.260911] proactive_compact_node+0x87/0xd0
[ 87.262090] kcompactd+0x1ca/0x360
[ 87.263018] ? swake_up_all+0xe0/0xe0
[ 87.264101] ? kcompactd_do_work+0x240/0x240
[ 87.265243] kthread+0xec/0x110
[ 87.266031] ? kthread_complete_and_exit+0x20/0x20
[ 87.267268] ret_from_fork+0x1f/0x30
[ 87.268243] </TASK>
[ 87.268984] irq event stamp: 311113
[ 87.269930] hardirqs last enabled at (311125): [<ffffffff810da6c2>] __up_console_sem+0x52/0x60
[ 87.272235] hardirqs last disabled at (311134): [<ffffffff810da6a7>] __up_console_sem+0x37/0x60
[ 87.275707] softirqs last enabled at (311088): [<ffffffff819d2b2c>] __do_softirq+0x21c/0x31f
[ 87.278450] softirqs last disabled at (311083): [<ffffffff81070b8d>] __irq_exit_rcu+0xad/0x120
[ 87.280555] ---[ end trace 0000000000000000 ]---

2023-01-14 22:12:55

by Mike Kravetz

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On 01/14/23 16:08, Sergey Senozhatsky wrote:
> On (23/01/13 11:57), Mike Kravetz wrote:
> > Hi Sergey,
> >
> > The following BUG shows up after this series in linux-next. I can easily
> > recreate by doing the following:
> >
> > # echo large_value > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> > where 'large_value' is so big that there could never possibly be that
> > many 2MB huge pages in the system.
>
> I get migration warnings with the zsmalloc series reverted.
> I guess the problem is somewhere else. Can you double check
> on your side?

I did the following:

- Start with clean v6.2-rc3
Perform echo, did not see issue

- Applied your 5 patches (includes the zsmalloc: turn chain size config option
into UL constant patch). Took default value for ZSMALLOC_CHAIN_SIZE of 8.
Performed echo, recreated issue.

- Changed ZSMALLOC_CHAIN_SIZE to 1.
Perform echo, did not see issue

I have not looked into the details of your patches or elsewhere. I just
thought it might be related to your series because of the above. And, since
your series is fresh in your mind, this may trigger some thought/explanation.

It is certainly possible that the root cause is elsewhere and your series is
just exposing it. I can take a closer look on Monday.

Thanks,
--
Mike Kravetz

2023-01-15 04:30:14

by Sergey Senozhatsky

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On (23/01/14 13:34), Mike Kravetz wrote:
> I did the following:
>
> - Start with clean v6.2-rc3
> Perform echo, did not see issue
>
> - Applied your 5 patches (includes the zsmalloc: turn chain size config option
> into UL constant patch). Took default value for ZSMALLOC_CHAIN_SIZE of 8.
> Performed echo, recreated issue.
>
> - Changed ZSMALLOC_CHAIN_SIZE to 1.
> Perform echo, did not see issue

The patch set basically just adjusts $NUM in calculate_zspage_chain_size():

for (i = 1; i <= $NUM; i++)

It changes the default from 4 to 8. I can't really see how this can cause problems.

2023-01-15 06:20:40

by Sergey Senozhatsky

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On (23/01/15 13:21), Sergey Senozhatsky wrote:
> On (23/01/14 13:34), Mike Kravetz wrote:
> > I did the following:
> >
> > - Start with clean v6.2-rc3
> > Perform echo, did not see issue
> >
> > - Applied your 5 patches (includes the zsmalloc: turn chain size config option
> > into UL constant patch). Took default value for ZSMALLOC_CHAIN_SIZE of 8.
> > Performed echo, recreated issue.
> >
> > - Changed ZSMALLOC_CHAIN_SIZE to 1.
> > Perform echo, did not see issue
>
> The patch set basically just adjusts $NUM in calculate_zspage_chain_size():
>
> for (i = 1; i <= $NUM; i++)
>
> It changes the default from 4 to 8. I can't really see how this can cause problems.

OK, I guess it overflows the zspage isolated counter, which is a 3-bit
integer, so the max chain size we can have is 0b111 == 7.

We probably need something like the change below (this should not
increase the size of struct zspage):

---

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 290053e648b0..86b742a613ee 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -129,7 +129,7 @@
#define HUGE_BITS 1
#define FULLNESS_BITS 2
#define CLASS_BITS 8
-#define ISOLATED_BITS 3
+#define ISOLATED_BITS 5
#define MAGIC_VAL_BITS 8

#define MAX(a, b) ((a) >= (b) ? (a) : (b))
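
For context, a userspace sketch of the zspage flags word (paraphrased
and simplified from mm/zsmalloc.c, not the kernel struct verbatim)
showing the 3-bit overflow with an 8-page chain, and why 5 bits still
fit in the same 32-bit word:

#include <assert.h>
#include <stdio.h>

#define HUGE_BITS       1
#define FULLNESS_BITS   2
#define CLASS_BITS      8
#define ISOLATED_BITS   3       /* the pre-fix width */
#define MAGIC_VAL_BITS  8

struct zspage_flags {
        unsigned int huge:HUGE_BITS;
        unsigned int fullness:FULLNESS_BITS;
        unsigned int class:CLASS_BITS + 1;
        unsigned int isolated:ISOLATED_BITS;
        unsigned int magic:MAGIC_VAL_BITS;
};

int main(void)
{
        struct zspage_flags f = { .isolated = 0 };

        /* Isolating all 8 pages of one zspage wraps a 3-bit counter. */
        for (int i = 0; i < 8; i++)
                f.isolated++;
        printf("isolated after 8 increments: %u\n", f.isolated); /* 0 */

        /* 1 + 2 + 9 + 3 + 8 = 23 bits; ISOLATED_BITS = 5 makes it 25,
         * still within one 32-bit word, so sizeof does not grow. */
        assert(sizeof(struct zspage_flags) == sizeof(unsigned int));
        return 0;
}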

2023-01-15 08:05:07

by Sergey Senozhatsky

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

Cc-ing Matthew,

On (23/01/14 16:08), Sergey Senozhatsky wrote:
> [ 87.208255] ------------[ cut here ]------------
> [ 87.209431] WARNING: CPU: 18 PID: 300 at mm/migrate.c:995 move_to_new_folio+0x1ef/0x260
> [ 87.211993] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
> [ 87.214287] CPU: 18 PID: 300 Comm: kcompactd0 Tainted: G N 6.2.0-rc3-next-20230113+ #385
> [ 87.217529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> [ 87.220131] RIP: 0010:move_to_new_folio+0x1ef/0x260
> [ 87.221892] Code: 84 c0 74 78 48 8b 43 18 44 89 ea 48 89 de 4c 89 e7 ff 50 06 85 c0 0f 85 a9 fe ff ff 48 8b 03 a9 00 00 04 00 0f 85 7a fe ff ff <0f> 0b e9 73 fe ff ff 48 8b 03 f6 c4 20 74 2a be c0 0c 00 00 48 89
> [ 87.226514] RSP: 0018:ffffc90000b9fb08 EFLAGS: 00010246
> [ 87.227879] RAX: 4000000000000021 RBX: ffffea0000890500 RCX: 0000000000000000
> [ 87.230948] RDX: 0000000000000000 RSI: ffffffff81e6f950 RDI: ffffea0000890500
> [ 87.233026] RBP: ffffea0000890500 R08: 0000001e82ec3c3e R09: 0000000000000001
> [ 87.235517] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffffea00015a26c0
> [ 87.237807] R13: 0000000000000001 R14: ffffea00015a2680 R15: ffffea00008904c0
> [ 87.239438] FS: 0000000000000000(0000) GS:ffff888624200000(0000) knlGS:0000000000000000
> [ 87.241303] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 87.242627] CR2: 00007fe537ebbdb8 CR3: 0000000110a0a004 CR4: 0000000000770ee0
> [ 87.244283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 87.245913] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 87.247559] PKRU: 55555554
> [ 87.248269] Call Trace:
> [ 87.248862] <TASK>
> [ 87.249370] ? lock_is_held_type+0xd9/0x130
> [ 87.250377] migrate_pages_batch+0x553/0xc80
> [ 87.251513] ? move_freelist_tail+0xc0/0xc0
> [ 87.252545] ? isolate_freepages+0x290/0x290
> [ 87.253654] ? trace_mm_migrate_pages+0xf0/0xf0
> [ 87.254901] migrate_pages+0x1ae/0x330
> [ 87.255877] ? isolate_freepages+0x290/0x290
> [ 87.257015] ? move_freelist_tail+0xc0/0xc0
> [ 87.258213] compact_zone+0x528/0x6a0
> [ 87.260911] proactive_compact_node+0x87/0xd0
> [ 87.262090] kcompactd+0x1ca/0x360
> [ 87.263018] ? swake_up_all+0xe0/0xe0
> [ 87.264101] ? kcompactd_do_work+0x240/0x240
> [ 87.265243] kthread+0xec/0x110
> [ 87.266031] ? kthread_complete_and_exit+0x20/0x20
> [ 87.267268] ret_from_fork+0x1f/0x30
> [ 87.268243] </TASK>
> [ 87.268984] irq event stamp: 311113
> [ 87.269930] hardirqs last enabled at (311125): [<ffffffff810da6c2>] __up_console_sem+0x52/0x60
> [ 87.272235] hardirqs last disabled at (311134): [<ffffffff810da6a7>] __up_console_sem+0x37/0x60
> [ 87.275707] softirqs last enabled at (311088): [<ffffffff819d2b2c>] __do_softirq+0x21c/0x31f
> [ 87.278450] softirqs last disabled at (311083): [<ffffffff81070b8d>] __irq_exit_rcu+0xad/0x120
> [ 87.280555] ---[ end trace 0000000000000000 ]---

So this warning is move_to_new_folio() being called on an un-isolated
src folio. I had DEBUG_VM disabled, so VM_BUG_ON_FOLIO(!folio_test_isolated(src))
did nothing; however, after mops->migrate_page() it would trigger the WARN_ON()
because it evaluates folio_test_isolated(src) one more time:

[ 59.500580] page:0000000097d97a42 refcount:2 mapcount:1665 mapping:0000000000000000 index:0xffffea00185ce940 pfn:0x113dc4
[ 59.503239] flags: 0x8000000000000001(locked|zone=2)
[ 59.505060] raw: 8000000000000001 ffffea00044f70c8 ffffc90000ba7c20 ffffffff81c22582
[ 59.507288] raw: ffffea00185ce940 ffff88809183fdb0 0000000200000680 0000000000000000
[ 59.509622] page dumped because: VM_BUG_ON_FOLIO(!folio_test_isolated(src))
[ 59.511845] ------------[ cut here ]------------
[ 59.513181] kernel BUG at mm/migrate.c:988!
[ 59.514821] invalid opcode: 0000 [#1] PREEMPT SMP PTI

[ 59.523018] RIP: 0010:move_to_new_folio+0x362/0x3b0
[ 59.524160] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
[ 59.528349] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
[ 59.529551] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
[ 59.531186] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
[ 59.532790] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
[ 59.534392] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
[ 59.536026] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
[ 59.537646] FS: 0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
[ 59.539484] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 59.540785] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
[ 59.542412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 59.544030] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 59.545637] PKRU: 55555554
[ 59.546261] Call Trace:
[ 59.546833] <TASK>
[ 59.547371] ? lock_is_held_type+0xd9/0x130
[ 59.548331] migrate_pages_batch+0x650/0xdc0
[ 59.549326] ? move_freelist_tail+0xc0/0xc0
[ 59.550281] ? isolate_freepages+0x290/0x290
[ 59.551289] ? folio_flags.constprop.0+0x50/0x50
[ 59.552348] migrate_pages+0x3fa/0x4d0
[ 59.553224] ? isolate_freepages+0x290/0x290
[ 59.554214] ? move_freelist_tail+0xc0/0xc0
[ 59.555173] compact_zone+0x51b/0x6a0
[ 59.556031] proactive_compact_node+0x8e/0xe0
[ 59.557033] kcompactd+0x1c3/0x350
[ 59.557842] ? swake_up_all+0xe0/0xe0
[ 59.558699] ? kcompactd_do_work+0x260/0x260
[ 59.559703] kthread+0xec/0x110
[ 59.560450] ? kthread_complete_and_exit+0x20/0x20
[ 59.561582] ret_from_fork+0x1f/0x30
[ 59.562427] </TASK>
[ 59.562966] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
[ 59.564591] ---[ end trace 0000000000000000 ]---
[ 59.565661] RIP: 0010:move_to_new_folio+0x362/0x3b0
[ 59.566802] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
[ 59.571048] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
[ 59.572257] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
[ 59.573906] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
[ 59.575544] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
[ 59.577236] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
[ 59.578893] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
[ 59.580593] FS: 0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
[ 59.582432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 59.583767] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
[ 59.585437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 59.587082] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 59.588738] PKRU: 55555554

2023-01-15 09:08:51

by Sergey Senozhatsky

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

+ Huang Ying,

> On (23/01/14 16:08), Sergey Senozhatsky wrote:
> > [ 87.208255] ------------[ cut here ]------------
> > [ 87.209431] WARNING: CPU: 18 PID: 300 at mm/migrate.c:995 move_to_new_folio+0x1ef/0x260
> > [ 87.211993] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
> > [ 87.214287] CPU: 18 PID: 300 Comm: kcompactd0 Tainted: G N 6.2.0-rc3-next-20230113+ #385
> > [ 87.217529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > [ 87.220131] RIP: 0010:move_to_new_folio+0x1ef/0x260
> > [ 87.221892] Code: 84 c0 74 78 48 8b 43 18 44 89 ea 48 89 de 4c 89 e7 ff 50 06 85 c0 0f 85 a9 fe ff ff 48 8b 03 a9 00 00 04 00 0f 85 7a fe ff ff <0f> 0b e9 73 fe ff ff 48 8b 03 f6 c4 20 74 2a be c0 0c 00 00 48 89
> > [ 87.226514] RSP: 0018:ffffc90000b9fb08 EFLAGS: 00010246
> > [ 87.227879] RAX: 4000000000000021 RBX: ffffea0000890500 RCX: 0000000000000000
> > [ 87.230948] RDX: 0000000000000000 RSI: ffffffff81e6f950 RDI: ffffea0000890500
> > [ 87.233026] RBP: ffffea0000890500 R08: 0000001e82ec3c3e R09: 0000000000000001
> > [ 87.235517] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffffea00015a26c0
> > [ 87.237807] R13: 0000000000000001 R14: ffffea00015a2680 R15: ffffea00008904c0
> > [ 87.239438] FS: 0000000000000000(0000) GS:ffff888624200000(0000) knlGS:0000000000000000
> > [ 87.241303] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 87.242627] CR2: 00007fe537ebbdb8 CR3: 0000000110a0a004 CR4: 0000000000770ee0
> > [ 87.244283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 87.245913] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 87.247559] PKRU: 55555554
> > [ 87.248269] Call Trace:
> > [ 87.248862] <TASK>
> > [ 87.249370] ? lock_is_held_type+0xd9/0x130
> > [ 87.250377] migrate_pages_batch+0x553/0xc80
> > [ 87.251513] ? move_freelist_tail+0xc0/0xc0
> > [ 87.252545] ? isolate_freepages+0x290/0x290
> > [ 87.253654] ? trace_mm_migrate_pages+0xf0/0xf0
> > [ 87.254901] migrate_pages+0x1ae/0x330
> > [ 87.255877] ? isolate_freepages+0x290/0x290
> > [ 87.257015] ? move_freelist_tail+0xc0/0xc0
> > [ 87.258213] compact_zone+0x528/0x6a0
> > [ 87.260911] proactive_compact_node+0x87/0xd0
> > [ 87.262090] kcompactd+0x1ca/0x360
> > [ 87.263018] ? swake_up_all+0xe0/0xe0
> > [ 87.264101] ? kcompactd_do_work+0x240/0x240
> > [ 87.265243] kthread+0xec/0x110
> > [ 87.266031] ? kthread_complete_and_exit+0x20/0x20
> > [ 87.267268] ret_from_fork+0x1f/0x30
> > [ 87.268243] </TASK>
> > [ 87.268984] irq event stamp: 311113
> > [ 87.269930] hardirqs last enabled at (311125): [<ffffffff810da6c2>] __up_console_sem+0x52/0x60
> > [ 87.272235] hardirqs last disabled at (311134): [<ffffffff810da6a7>] __up_console_sem+0x37/0x60
> > [ 87.275707] softirqs last enabled at (311088): [<ffffffff819d2b2c>] __do_softirq+0x21c/0x31f
> > [ 87.278450] softirqs last disabled at (311083): [<ffffffff81070b8d>] __irq_exit_rcu+0xad/0x120
> > [ 87.280555] ---[ end trace 0000000000000000 ]---
>
> So this warning is move_to_new_folio() being called on an un-isolated
> src folio. I had DEBUG_VM disabled, so VM_BUG_ON_FOLIO(!folio_test_isolated(src))
> did nothing; however, after mops->migrate_page() it would trigger the WARN_ON()
> because it evaluates folio_test_isolated(src) one more time:
>
> [ 59.500580] page:0000000097d97a42 refcount:2 mapcount:1665 mapping:0000000000000000 index:0xffffea00185ce940 pfn:0x113dc4
> [ 59.503239] flags: 0x8000000000000001(locked|zone=2)
> [ 59.505060] raw: 8000000000000001 ffffea00044f70c8 ffffc90000ba7c20 ffffffff81c22582
> [ 59.507288] raw: ffffea00185ce940 ffff88809183fdb0 0000000200000680 0000000000000000
> [ 59.509622] page dumped because: VM_BUG_ON_FOLIO(!folio_test_isolated(src))
> [ 59.511845] ------------[ cut here ]------------
> [ 59.513181] kernel BUG at mm/migrate.c:988!
> [ 59.514821] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>
> [ 59.523018] RIP: 0010:move_to_new_folio+0x362/0x3b0
> [ 59.524160] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
> [ 59.528349] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
> [ 59.529551] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
> [ 59.531186] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
> [ 59.532790] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
> [ 59.534392] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
> [ 59.536026] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
> [ 59.537646] FS: 0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
> [ 59.539484] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 59.540785] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
> [ 59.542412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 59.544030] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 59.545637] PKRU: 55555554
> [ 59.546261] Call Trace:
> [ 59.546833] <TASK>
> [ 59.547371] ? lock_is_held_type+0xd9/0x130
> [ 59.548331] migrate_pages_batch+0x650/0xdc0
> [ 59.549326] ? move_freelist_tail+0xc0/0xc0
> [ 59.550281] ? isolate_freepages+0x290/0x290
> [ 59.551289] ? folio_flags.constprop.0+0x50/0x50
> [ 59.552348] migrate_pages+0x3fa/0x4d0
> [ 59.553224] ? isolate_freepages+0x290/0x290
> [ 59.554214] ? move_freelist_tail+0xc0/0xc0
> [ 59.555173] compact_zone+0x51b/0x6a0
> [ 59.556031] proactive_compact_node+0x8e/0xe0
> [ 59.557033] kcompactd+0x1c3/0x350
> [ 59.557842] ? swake_up_all+0xe0/0xe0
> [ 59.558699] ? kcompactd_do_work+0x260/0x260
> [ 59.559703] kthread+0xec/0x110
> [ 59.560450] ? kthread_complete_and_exit+0x20/0x20
> [ 59.561582] ret_from_fork+0x1f/0x30
> [ 59.562427] </TASK>
> [ 59.562966] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
> [ 59.564591] ---[ end trace 0000000000000000 ]---
> [ 59.565661] RIP: 0010:move_to_new_folio+0x362/0x3b0
> [ 59.566802] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
> [ 59.571048] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
> [ 59.572257] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
> [ 59.573906] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
> [ 59.575544] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
> [ 59.577236] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
> [ 59.578893] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
> [ 59.580593] FS: 0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
> [ 59.582432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 59.583767] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
> [ 59.585437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 59.587082] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 59.588738] PKRU: 55555554

2023-01-15 13:49:30

by Matthew Wilcox

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On Sun, Jan 15, 2023 at 04:18:55PM +0900, Sergey Senozhatsky wrote:
> So this warning is move_to_new_folio() being called on an un-isolated
> src folio. I had DEBUG_VM disabled, so VM_BUG_ON_FOLIO(!folio_test_isolated(src))
> did nothing; however, after mops->migrate_page() it would trigger the WARN_ON()
> because it evaluates folio_test_isolated(src) one more time:
>
> [ 59.500580] page:0000000097d97a42 refcount:2 mapcount:1665 mapping:0000000000000000 index:0xffffea00185ce940 pfn:0x113dc4
> [ 59.503239] flags: 0x8000000000000001(locked|zone=2)
> [ 59.505060] raw: 8000000000000001 ffffea00044f70c8 ffffc90000ba7c20 ffffffff81c22582
> [ 59.507288] raw: ffffea00185ce940 ffff88809183fdb0 0000000200000680 0000000000000000

That is quite the messed-up page. mapcount is positive, but higher than
refcount. And not just a little bit; 1665 vs 2. But mapping is NULL,
so it's not anon or file memory. Makes me think it belongs to a driver
that's using ->mapcount for its own purposes. It's not PageSlab.

Given that you're working on zsmalloc, I took a look and:

static inline void set_first_obj_offset(struct page *page, unsigned int offset)
{
page->page_type = offset;
}

(page_type aliases with mapcount). So I'm pretty sure this is a
zsmalloc page. But mapping should point to zsmalloc_mops. Not
really sure what's going on here. Can you bisect?
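
To illustrate the aliasing (a userspace sketch; the union is
simplified from struct page in include/linux/mm_types.h, and the
offset value below is hypothetical):

#include <stdio.h>

struct page_stub {
        union {
                int _mapcount;          /* anon/file rmap counter */
                unsigned int page_type; /* reused by zsmalloc for the
                                         * first-object offset */
        };
};

int main(void)
{
        struct page_stub page = { ._mapcount = -1 };

        /* set_first_obj_offset() stores an offset into page_type... */
        page.page_type = 1664;  /* hypothetical offset */

        /* ...which a page dump then reports as a nonsense mapcount
         * (printed as _mapcount + 1, since _mapcount starts at -1). */
        printf("mapcount: %d\n", page._mapcount + 1);   /* 1665 */
        return 0;
}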

> [ 59.509622] page dumped because: VM_BUG_ON_FOLIO(!folio_test_isolated(src))
> [ 59.511845] ------------[ cut here ]------------
> [ 59.513181] kernel BUG at mm/migrate.c:988!
> [ 59.514821] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>
> [ 59.523018] RIP: 0010:move_to_new_folio+0x362/0x3b0
> [ 59.524160] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
> [ 59.528349] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
> [ 59.529551] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
> [ 59.531186] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
> [ 59.532790] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
> [ 59.534392] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
> [ 59.536026] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
> [ 59.537646] FS: 0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
> [ 59.539484] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 59.540785] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
> [ 59.542412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 59.544030] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 59.545637] PKRU: 55555554
> [ 59.546261] Call Trace:
> [ 59.546833] <TASK>
> [ 59.547371] ? lock_is_held_type+0xd9/0x130
> [ 59.548331] migrate_pages_batch+0x650/0xdc0
> [ 59.549326] ? move_freelist_tail+0xc0/0xc0
> [ 59.550281] ? isolate_freepages+0x290/0x290
> [ 59.551289] ? folio_flags.constprop.0+0x50/0x50
> [ 59.552348] migrate_pages+0x3fa/0x4d0
> [ 59.553224] ? isolate_freepages+0x290/0x290
> [ 59.554214] ? move_freelist_tail+0xc0/0xc0
> [ 59.555173] compact_zone+0x51b/0x6a0
> [ 59.556031] proactive_compact_node+0x8e/0xe0
> [ 59.557033] kcompactd+0x1c3/0x350
> [ 59.557842] ? swake_up_all+0xe0/0xe0
> [ 59.558699] ? kcompactd_do_work+0x260/0x260
> [ 59.559703] kthread+0xec/0x110
> [ 59.560450] ? kthread_complete_and_exit+0x20/0x20
> [ 59.561582] ret_from_fork+0x1f/0x30
> [ 59.562427] </TASK>
> [ 59.562966] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
> [ 59.564591] ---[ end trace 0000000000000000 ]---
> [ 59.565661] RIP: 0010:move_to_new_folio+0x362/0x3b0
> [ 59.566802] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
> [ 59.571048] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
> [ 59.572257] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
> [ 59.573906] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
> [ 59.575544] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
> [ 59.577236] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
> [ 59.578893] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
> [ 59.580593] FS: 0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
> [ 59.582432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 59.583767] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
> [ 59.585437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 59.587082] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 59.588738] PKRU: 55555554

2023-01-15 15:03:08

by Sergey Senozhatsky

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On (23/01/15 13:04), Matthew Wilcox wrote:
> On Sun, Jan 15, 2023 at 04:18:55PM +0900, Sergey Senozhatsky wrote:
> > So this warning is move_to_new_folio() being called on an un-isolated
> > src folio. I had DEBUG_VM disabled, so VM_BUG_ON_FOLIO(!folio_test_isolated(src))
> > did nothing; however, after mops->migrate_page() it would trigger the WARN_ON()
> > because it evaluates folio_test_isolated(src) one more time:
> >
> > [ 59.500580] page:0000000097d97a42 refcount:2 mapcount:1665 mapping:0000000000000000 index:0xffffea00185ce940 pfn:0x113dc4
> > [ 59.503239] flags: 0x8000000000000001(locked|zone=2)
> > [ 59.505060] raw: 8000000000000001 ffffea00044f70c8 ffffc90000ba7c20 ffffffff81c22582
> > [ 59.507288] raw: ffffea00185ce940 ffff88809183fdb0 0000000200000680 0000000000000000
>
> That is quite the messed-up page. mapcount is positive, but higher than
> refcount. And not just a little bit; 1665 vs 2. But mapping is NULL,
> so it's not anon or file memory. Makes me think it belongs to a driver
> that's using ->mapcount for its own purposes. It's not PageSlab.
>
> Given that you're working on zsmalloc, I took a look and:
>
> static inline void set_first_obj_offset(struct page *page, unsigned int offset)
> {
> page->page_type = offset;
> }
>
> (page_type aliases with mapcount). So I'm pretty sure this is a
> zsmalloc page. But mapping should point to zsmalloc_mops. Not
> really sure what's going on here. Can you bisect?

Thanks.

Let me try bisecting. From what I can tell it seems that
tags/next-20221226 is the last good and tags/next-20230105
is the first bad kernel.

I'll try to narrow it down from here.

2023-01-16 01:38:31

by Huang, Ying

Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

Hi, Sergey,

Sergey Senozhatsky <[email protected]> writes:
> + Huang Ying,
>
>> On (23/01/14 16:08), Sergey Senozhatsky wrote:
>> > [ 87.208255] ------------[ cut here ]------------
>> > [ 87.209431] WARNING: CPU: 18 PID: 300 at mm/migrate.c:995 move_to_new_folio+0x1ef/0x260
>> > [ 87.211993] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
>> > [ 87.214287] CPU: 18 PID: 300 Comm: kcompactd0 Tainted: G N 6.2.0-rc3-next-20230113+ #385
>> > [ 87.217529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
>> > [ 87.220131] RIP: 0010:move_to_new_folio+0x1ef/0x260
>> > [ 87.221892] Code: 84 c0 74 78 48 8b 43 18 44 89 ea 48 89 de 4c 89
>> > e7 ff 50 06 85 c0 0f 85 a9 fe ff ff 48 8b 03 a9 00 00 04 00 0f 85
>> > 7a fe ff ff <0f> 0b e9 73 fe ff ff 48 8b 03 f6 c4 20 74 2a be c0
>> > 0c 00 00 48 89
>> > [ 87.226514] RSP: 0018:ffffc90000b9fb08 EFLAGS: 00010246
>> > [ 87.227879] RAX: 4000000000000021 RBX: ffffea0000890500 RCX: 0000000000000000
>> > [ 87.230948] RDX: 0000000000000000 RSI: ffffffff81e6f950 RDI: ffffea0000890500
>> > [ 87.233026] RBP: ffffea0000890500 R08: 0000001e82ec3c3e R09: 0000000000000001
>> > [ 87.235517] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffffea00015a26c0
>> > [ 87.237807] R13: 0000000000000001 R14: ffffea00015a2680 R15: ffffea00008904c0
>> > [ 87.239438] FS: 0000000000000000(0000) GS:ffff888624200000(0000) knlGS:0000000000000000
>> > [ 87.241303] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [ 87.242627] CR2: 00007fe537ebbdb8 CR3: 0000000110a0a004 CR4: 0000000000770ee0
>> > [ 87.244283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > [ 87.245913] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > [ 87.247559] PKRU: 55555554
>> > [ 87.248269] Call Trace:
>> > [ 87.248862] <TASK>
>> > [ 87.249370] ? lock_is_held_type+0xd9/0x130
>> > [ 87.250377] migrate_pages_batch+0x553/0xc80
>> > [ 87.251513] ? move_freelist_tail+0xc0/0xc0
>> > [ 87.252545] ? isolate_freepages+0x290/0x290
>> > [ 87.253654] ? trace_mm_migrate_pages+0xf0/0xf0
>> > [ 87.254901] migrate_pages+0x1ae/0x330
>> > [ 87.255877] ? isolate_freepages+0x290/0x290
>> > [ 87.257015] ? move_freelist_tail+0xc0/0xc0
>> > [ 87.258213] compact_zone+0x528/0x6a0
>> > [ 87.260911] proactive_compact_node+0x87/0xd0
>> > [ 87.262090] kcompactd+0x1ca/0x360
>> > [ 87.263018] ? swake_up_all+0xe0/0xe0
>> > [ 87.264101] ? kcompactd_do_work+0x240/0x240
>> > [ 87.265243] kthread+0xec/0x110
>> > [ 87.266031] ? kthread_complete_and_exit+0x20/0x20
>> > [ 87.267268] ret_from_fork+0x1f/0x30
>> > [ 87.268243] </TASK>
>> > [ 87.268984] irq event stamp: 311113
>> > [ 87.269930] hardirqs last enabled at (311125): [<ffffffff810da6c2>] __up_console_sem+0x52/0x60
>> > [ 87.272235] hardirqs last disabled at (311134): [<ffffffff810da6a7>] __up_console_sem+0x37/0x60
>> > [ 87.275707] softirqs last enabled at (311088): [<ffffffff819d2b2c>] __do_softirq+0x21c/0x31f
>> > [ 87.278450] softirqs last disabled at (311083): [<ffffffff81070b8d>] __irq_exit_rcu+0xad/0x120
>> > [ 87.280555] ---[ end trace 0000000000000000 ]---
>>
>> So this warning is move_to_new_folio() being called on an un-isolated
>> src folio. I had DEBUG_VM disabled, so VM_BUG_ON_FOLIO(!folio_test_isolated(src))
>> did nothing; however, after mops->migrate_page() it would trigger the WARN_ON()
>> because it evaluates folio_test_isolated(src) one more time:
>>
>> [ 59.500580] page:0000000097d97a42 refcount:2 mapcount:1665 mapping:0000000000000000 index:0xffffea00185ce940 pfn:0x113dc4
>> [ 59.503239] flags: 0x8000000000000001(locked|zone=2)
>> [ 59.505060] raw: 8000000000000001 ffffea00044f70c8 ffffc90000ba7c20 ffffffff81c22582
>> [ 59.507288] raw: ffffea00185ce940 ffff88809183fdb0 0000000200000680 0000000000000000
>> [ 59.509622] page dumped because: VM_BUG_ON_FOLIO(!folio_test_isolated(src))
>> [ 59.511845] ------------[ cut here ]------------
>> [ 59.513181] kernel BUG at mm/migrate.c:988!
>> [ 59.514821] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>
>> [ 59.523018] RIP: 0010:move_to_new_folio+0x362/0x3b0
>> [ 59.524160] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
>> [ 59.528349] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
>> [ 59.529551] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
>> [ 59.531186] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
>> [ 59.532790] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
>> [ 59.534392] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
>> [ 59.536026] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
>> [ 59.537646] FS: 0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
>> [ 59.539484] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 59.540785] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
>> [ 59.542412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 59.544030] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 59.545637] PKRU: 55555554
>> [ 59.546261] Call Trace:
>> [ 59.546833] <TASK>
>> [ 59.547371] ? lock_is_held_type+0xd9/0x130
>> [ 59.548331] migrate_pages_batch+0x650/0xdc0
>> [ 59.549326] ? move_freelist_tail+0xc0/0xc0
>> [ 59.550281] ? isolate_freepages+0x290/0x290
>> [ 59.551289] ? folio_flags.constprop.0+0x50/0x50
>> [ 59.552348] migrate_pages+0x3fa/0x4d0
>> [ 59.553224] ? isolate_freepages+0x290/0x290
>> [ 59.554214] ? move_freelist_tail+0xc0/0xc0
>> [ 59.555173] compact_zone+0x51b/0x6a0
>> [ 59.556031] proactive_compact_node+0x8e/0xe0
>> [ 59.557033] kcompactd+0x1c3/0x350
>> [ 59.557842] ? swake_up_all+0xe0/0xe0
>> [ 59.558699] ? kcompactd_do_work+0x260/0x260
>> [ 59.559703] kthread+0xec/0x110
>> [ 59.560450] ? kthread_complete_and_exit+0x20/0x20
>> [ 59.561582] ret_from_fork+0x1f/0x30
>> [ 59.562427] </TASK>
>> [ 59.562966] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
>> [ 59.564591] ---[ end trace 0000000000000000 ]---
>> [ 59.565661] RIP: 0010:move_to_new_folio+0x362/0x3b0
>> [ 59.566802] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
>> [ 59.571048] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
>> [ 59.572257] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
>> [ 59.573906] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
>> [ 59.575544] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
>> [ 59.577236] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
>> [ 59.578893] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
>> [ 59.580593] FS: 0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
>> [ 59.582432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 59.583767] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
>> [ 59.585437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 59.587082] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 59.588738] PKRU: 55555554

Thanks for reporting. We have just fixed a ZRAM-related bug in the
migrate_pages() batching series with Mike's help.

https://lore.kernel.org/linux-mm/Y8DizzvFXBSEPzI4@monkey/

I will send out a new version today or tomorrow to fix it. Please try
that.

Best Regards,
Huang, Ying

2023-01-16 03:22:18

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On (23/01/09 12:38), Sergey Senozhatsky wrote:
> This turns hard coded limit on maximum number of physical
> pages per-zspage into a config option. It also increases the default
> limit from 4 to 8.
>
> Sergey Senozhatsky (4):
> zsmalloc: rework zspage chain size selection
> zsmalloc: skip chain size calculation for pow_of_2 classes
> zsmalloc: make zspage chain size configurable
> zsmalloc: set default zspage chain size to 8
>
> Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
> mm/Kconfig | 19 ++++
> mm/zsmalloc.c | 72 +++++----------
> 3 files changed, 212 insertions(+), 47 deletions(-)

Andrew,

Can you please drop this series? We have two fixup patches for it (an
hppa64 build failure and an ->isolated bit-field overflow reported by
Mike), and at this point I would like to send out a v3 with all fixups
squashed.
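
For reference, a hypothetical illustration of that ->isolated overflow --
the 3-bit field width below is an assumption for the sketch, not a quote
of the real zspage layout in mm/zsmalloc.c:

    #include <stdio.h>

    /* Stand-in for the zspage flags word; the bit width is assumed. */
    struct zspage_flags_sketch {
            unsigned int isolated:3;        /* can hold 0..7 */
    };

    int main(void)
    {
            struct zspage_flags_sketch z = { .isolated = 0 };

            /* An 8-page chain lets all 8 subpages be isolated at once. */
            for (int i = 0; i < 8; i++)
                    z.isolated++;

            /* Prints 0: the counter wrapped, so the accounting is lost. */
            printf("isolated = %u\n", z.isolated);
            return 0;
    }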

Mike, would it be OK with you if I squash the ->isolated fixup?

2023-01-16 04:08:01

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

Hi,

On (23/01/16 09:27), Huang, Ying wrote:
> >> [ 59.546261] Call Trace:
> >> [ 59.546833] <TASK>
> >> [ 59.547371] ? lock_is_held_type+0xd9/0x130
> >> [ 59.548331] migrate_pages_batch+0x650/0xdc0
> >> [ 59.549326] ? move_freelist_tail+0xc0/0xc0
> >> [ 59.550281] ? isolate_freepages+0x290/0x290
> >> [ 59.551289] ? folio_flags.constprop.0+0x50/0x50
> >> [ 59.552348] migrate_pages+0x3fa/0x4d0
> >> [ 59.553224] ? isolate_freepages+0x290/0x290
> >> [ 59.554214] ? move_freelist_tail+0xc0/0xc0
> >> [ 59.555173] compact_zone+0x51b/0x6a0
> >> [ 59.556031] proactive_compact_node+0x8e/0xe0
> >> [ 59.557033] kcompactd+0x1c3/0x350
> >> [ 59.557842] ? swake_up_all+0xe0/0xe0
> >> [ 59.558699] ? kcompactd_do_work+0x260/0x260
> >> [ 59.559703] kthread+0xec/0x110
> >> [ 59.560450] ? kthread_complete_and_exit+0x20/0x20
> >> [ 59.561582] ret_from_fork+0x1f/0x30
> >> [ 59.562427] </TASK>
> >> [ 59.562966] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
> >> [ 59.564591] ---[ end trace 0000000000000000 ]---
> >> [ 59.565661] RIP: 0010:move_to_new_folio+0x362/0x3b0
>
> Thanks for reporting. We have just fixed a ZRAM-related bug in the
> migrate_pages() batching series with Mike's help.

Oh, great. Yeah, I narrowed it down to that series as well.

> https://lore.kernel.org/linux-mm/Y8DizzvFXBSEPzI4@monkey/

That fixes it!

2023-01-16 19:10:46

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable

On 01/16/23 12:15, Sergey Senozhatsky wrote:
> On (23/01/09 12:38), Sergey Senozhatsky wrote:
> > This turns hard coded limit on maximum number of physical
> > pages per-zspage into a config option. It also increases the default
> > limit from 4 to 8.
> >
> > Sergey Senozhatsky (4):
> > zsmalloc: rework zspage chain size selection
> > zsmalloc: skip chain size calculation for pow_of_2 classes
> > zsmalloc: make zspage chain size configurable
> > zsmalloc: set default zspage chain size to 8
> >
> > Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
> > mm/Kconfig | 19 ++++
> > mm/zsmalloc.c | 72 +++++----------
> > 3 files changed, 212 insertions(+), 47 deletions(-)
>
> Andrew,
>
> Can you please drop this series? We have two fixup patches for it (an
> hppa64 build failure and an ->isolated bit-field overflow reported by
> Mike), and at this point I would like to send out a v3 with all fixups
> squashed.
>
> Mike, would it be OK with you if I squash the ->isolated fixup?

I'm OK with however you want to address it. Thanks!
--
Mike Kravetz