2015-11-27 04:11:15

by Sergey Senozhatsky

[permalink] [raw]
Subject: [PATCH v3 0/2] zram/zcomp: stream allocation fixes and tweaks

Hello,
Two patches that address possible issues with compression
stream allocations under low memory or heavy fragmentation conditions.

These patches are considered -stable material; however, there is no
Cc: stable on "zram: try vmalloc() after kmalloc()" as of now.
We'd like to ask Kyeongdon to re-test and re-confirm (the patch has
been modified).

I split the patch set into separate threads. Two more patches
- zram: pass gfp from zcomp frontend to backend (Minchan Kim)
- zram/zcomp: do not zero out zcomp private pages (Sergey Senozhatsky)
(not -stable material) will be posted separately.

Kyeongdon Kim (1):
zram: try vmalloc() after kmalloc()

Sergey Senozhatsky (1):
zram/zcomp: use GFP_NOIO to allocate streams

drivers/block/zram/zcomp.c | 4 ++--
drivers/block/zram/zcomp_lz4.c | 23 +++++++++++++++++++++--
drivers/block/zram/zcomp_lzo.c | 23 +++++++++++++++++++++--
3 files changed, 44 insertions(+), 6 deletions(-)

--
2.6.3.368.gf34be46


2015-11-27 04:11:23

by Sergey Senozhatsky

[permalink] [raw]
Subject: [PATCH v3 1/2] zram/zcomp: use GFP_NOIO to allocate streams

From: Sergey Senozhatsky <[email protected]>

We can end up allocating a new compression stream with GFP_KERNEL
from within the IO path, which may result in nested (recursive) IO
operations. That can introduce problems if the IO path in question
is a reclaimer holding locks that will deadlock the nested IOs.

Allocate streams and working memory using GFP_NOIO flag, forbidding
recursive IO and FS operations.

An example:

[ 747.233722] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
[ 747.233724] git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 747.233725] (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555
[ 747.233733] {IN-RECLAIM_FS-W} state was registered at:
[ 747.233735] [<ffffffff8107b8e9>] __lock_acquire+0x8da/0x117b
[ 747.233738] [<ffffffff8107c950>] lock_acquire+0x10c/0x1a7
[ 747.233740] [<ffffffff811e323e>] start_this_handle+0x52d/0x555
[ 747.233742] [<ffffffff811e331a>] jbd2__journal_start+0xb4/0x237
[ 747.233744] [<ffffffff811cc6c7>] __ext4_journal_start_sb+0x108/0x17e
[ 747.233748] [<ffffffff811a90bf>] ext4_dirty_inode+0x32/0x61
[ 747.233750] [<ffffffff8115f37e>] __mark_inode_dirty+0x16b/0x60c
[ 747.233754] [<ffffffff81150ad6>] iput+0x11e/0x274
[ 747.233757] [<ffffffff8114bfbd>] __dentry_kill+0x148/0x1b8
[ 747.233759] [<ffffffff8114c9d9>] shrink_dentry_list+0x274/0x44a
[ 747.233761] [<ffffffff8114d38a>] prune_dcache_sb+0x4a/0x55
[ 747.233763] [<ffffffff8113b1ad>] super_cache_scan+0xfc/0x176
[ 747.233767] [<ffffffff810fa089>] shrink_slab.part.14.constprop.25+0x2a2/0x4d3
[ 747.233770] [<ffffffff810fcccb>] shrink_zone+0x74/0x140
[ 747.233772] [<ffffffff810fd924>] kswapd+0x6b7/0x930
[ 747.233774] [<ffffffff81058887>] kthread+0x107/0x10f
[ 747.233778] [<ffffffff814fadff>] ret_from_fork+0x3f/0x70
[ 747.233783] irq event stamp: 138297
[ 747.233784] hardirqs last enabled at (138297): [<ffffffff8107aff3>] debug_check_no_locks_freed+0x113/0x12f
[ 747.233786] hardirqs last disabled at (138296): [<ffffffff8107af13>] debug_check_no_locks_freed+0x33/0x12f
[ 747.233788] softirqs last enabled at (137818): [<ffffffff81040f89>] __do_softirq+0x2d3/0x3e9
[ 747.233792] softirqs last disabled at (137813): [<ffffffff81041292>] irq_exit+0x41/0x95
[ 747.233794]
other info that might help us debug this:
[ 747.233796] Possible unsafe locking scenario:
[ 747.233797] CPU0
[ 747.233798] ----
[ 747.233799] lock(jbd2_handle);
[ 747.233801] <Interrupt>
[ 747.233801] lock(jbd2_handle);
[ 747.233803]
*** DEADLOCK ***
[ 747.233805] 5 locks held by git/20158:
[ 747.233806] #0: (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
[ 747.233811] #1: (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
[ 747.233817] #2: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
[ 747.233822] #3: (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
[ 747.233827] #4: (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555
[ 747.233831]
stack backtrace:
[ 747.233834] CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
[ 747.233837] ffff8800a56cea40 ffff88010d0a75f8 ffffffff814f446d ffffffff81077036
[ 747.233840] ffffffff823a84b0 ffff88010d0a7638 ffffffff814f3849 0000000000000001
[ 747.233843] 000000000000000a ffff8800a56cf6f8 ffff8800a56cea40 ffffffff810795dd
[ 747.233846] Call Trace:
[ 747.233849] [<ffffffff814f446d>] dump_stack+0x4c/0x6e
[ 747.233852] [<ffffffff81077036>] ? up+0x39/0x3e
[ 747.233854] [<ffffffff814f3849>] print_usage_bug.part.23+0x25b/0x26a
[ 747.233857] [<ffffffff810795dd>] ? print_shortest_lock_dependencies+0x182/0x182
[ 747.233859] [<ffffffff8107a9c9>] mark_lock+0x384/0x56d
[ 747.233862] [<ffffffff8107ac11>] mark_held_locks+0x5f/0x76
[ 747.233865] [<ffffffffa023d2f3>] ? zcomp_strm_alloc+0x25/0x73 [zram]
[ 747.233867] [<ffffffff8107d13b>] lockdep_trace_alloc+0xb2/0xb5
[ 747.233870] [<ffffffff8112bac7>] kmem_cache_alloc_trace+0x32/0x1e2
[ 747.233873] [<ffffffffa023d2f3>] zcomp_strm_alloc+0x25/0x73 [zram]
[ 747.233876] [<ffffffffa023d428>] zcomp_strm_multi_find+0xe7/0x173 [zram]
[ 747.233879] [<ffffffffa023d58b>] zcomp_strm_find+0xc/0xe [zram]
[ 747.233881] [<ffffffffa023f292>] zram_bvec_rw+0x2ca/0x7e0 [zram]
[ 747.233885] [<ffffffffa023fa8c>] zram_make_request+0x1fa/0x301 [zram]
[ 747.233889] [<ffffffff812142f8>] generic_make_request+0x9c/0xdb
[ 747.233891] [<ffffffff8121442e>] submit_bio+0xf7/0x120
[ 747.233895] [<ffffffff810f1c0c>] ? __test_set_page_writeback+0x1a0/0x1b8
[ 747.233897] [<ffffffff811a9d00>] ext4_io_submit+0x2e/0x43
[ 747.233899] [<ffffffff811a9efa>] ext4_bio_write_page+0x1b7/0x300
[ 747.233902] [<ffffffff811a2106>] mpage_submit_page+0x60/0x77
[ 747.233905] [<ffffffff811a25b0>] mpage_map_and_submit_buffers+0x10f/0x21d
[ 747.233907] [<ffffffff811a6814>] ext4_writepages+0xc8c/0xe1b
[ 747.233910] [<ffffffff810f3f77>] do_writepages+0x23/0x2c
[ 747.233913] [<ffffffff810ea5d1>] __filemap_fdatawrite_range+0x84/0x8b
[ 747.233915] [<ffffffff810ea657>] filemap_flush+0x1c/0x1e
[ 747.233917] [<ffffffff811a3851>] ext4_alloc_da_blocks+0xb8/0x117
[ 747.233919] [<ffffffff811af52a>] ext4_rename+0x132/0x6dc
[ 747.233921] [<ffffffff8107ac11>] ? mark_held_locks+0x5f/0x76
[ 747.233924] [<ffffffff811afafd>] ext4_rename2+0x29/0x2b
[ 747.233926] [<ffffffff811427ea>] vfs_rename+0x540/0x636
[ 747.233928] [<ffffffff81146a01>] SyS_renameat2+0x359/0x44d
[ 747.233931] [<ffffffff81146b26>] SyS_rename+0x1e/0x20
[ 747.233933] [<ffffffff814faa17>] entry_SYSCALL_64_fastpath+0x12/0x6f

[minchan: add stable mark]
Cc: [email protected]
Signed-off-by: Sergey Senozhatsky <[email protected]>
---
drivers/block/zram/zcomp.c | 4 ++--
drivers/block/zram/zcomp_lz4.c | 2 +-
drivers/block/zram/zcomp_lzo.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 5cb13ca..c536177 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -76,7 +76,7 @@ static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
*/
static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
{
- struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
+ struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_NOIO);
if (!zstrm)
return NULL;

@@ -85,7 +85,7 @@ static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
* allocate 2 pages. 1 for compressed data, plus 1 extra for the
* case when compressed size is larger than the original one
*/
- zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
+ zstrm->buffer = (void *)__get_free_pages(GFP_NOIO | __GFP_ZERO, 1);
if (!zstrm->private || !zstrm->buffer) {
zcomp_strm_free(comp, zstrm);
zstrm = NULL;
diff --git a/drivers/block/zram/zcomp_lz4.c b/drivers/block/zram/zcomp_lz4.c
index f2afb7e..ee44b51 100644
--- a/drivers/block/zram/zcomp_lz4.c
+++ b/drivers/block/zram/zcomp_lz4.c
@@ -15,7 +15,7 @@

static void *zcomp_lz4_create(void)
{
- return kzalloc(LZ4_MEM_COMPRESS, GFP_KERNEL);
+ return kzalloc(LZ4_MEM_COMPRESS, GFP_NOIO);
}

static void zcomp_lz4_destroy(void *private)
diff --git a/drivers/block/zram/zcomp_lzo.c b/drivers/block/zram/zcomp_lzo.c
index da1bc47..683ce04 100644
--- a/drivers/block/zram/zcomp_lzo.c
+++ b/drivers/block/zram/zcomp_lzo.c
@@ -15,7 +15,7 @@

static void *lzo_create(void)
{
- return kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
+ return kzalloc(LZO1X_MEM_COMPRESS, GFP_NOIO);
}

static void lzo_destroy(void *private)
--
2.6.3.368.gf34be46
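As an aside on the fix above: GFP_NOIO is simply GFP_KERNEL with the IO and FS bits cleared, which is what stops the allocator from recursing into block IO or filesystem writeback during reclaim. A minimal userspace sketch of that relationship, with flag values mirrored from 4.x-era include/linux/gfp.h (the exact constants are illustrative and kernel-version dependent):

```c
#include <assert.h>

/* Flag values mirrored from 4.x-era include/linux/gfp.h (illustrative). */
#define ___GFP_WAIT 0x10u   /* may sleep / enter direct reclaim */
#define ___GFP_IO   0x40u   /* may start block IO */
#define ___GFP_FS   0x80u   /* may call into filesystem code */

#define GFP_NOIO    (___GFP_WAIT)
#define GFP_KERNEL  (___GFP_WAIT | ___GFP_IO | ___GFP_FS)

/* GFP_NOIO is GFP_KERNEL minus the bits that allow recursive IO/FS. */
static inline unsigned int drop_io_fs(unsigned int gfp)
{
    return gfp & ~(___GFP_IO | ___GFP_FS);
}
```

The allocation can still sleep and reclaim clean pages; it just cannot re-enter the block layer or a filesystem, which is exactly the recursion the lockdep splat above complains about.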

2015-11-27 04:11:28

by Sergey Senozhatsky

[permalink] [raw]
Subject: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

From: Kyeongdon Kim <[email protected]>

When using LZ4 multi compression streams for zram swap, we observed
page allocation failure messages during system tests, and not just
once but several times (2-5 per test). Some of the failures were
order-3 allocation attempts that kept recurring.

To set up the per-stream compression private data, we have to call
kzalloc() with an order-2/3 size at runtime (lzo/lz4). If no order-2/3
chunk is available at that moment, the page allocation fails. This
patch falls back to vmalloc() when kmalloc() fails, which prevents
the page allocation failure warning.

With this change we never saw the warning message again during
testing; it also reduced process startup latency by about 60-120ms
in each case.

For reference a call trace :

Binder_1: page allocation failure: order:3, mode:0x10c0d0
CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
Call trace:
[<ffffffc0002069c8>] dump_backtrace+0x0/0x270
[<ffffffc000206c48>] show_stack+0x10/0x1c
[<ffffffc000cb51c8>] dump_stack+0x1c/0x28
[<ffffffc0002bbfc8>] warn_alloc_failed+0xfc/0x11c
[<ffffffc0002bf518>] __alloc_pages_nodemask+0x724/0x7f0
[<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
[<ffffffc0002ed6a4>] kmalloc_order_trace+0x38/0xd8
[<ffffffc0005d9738>] zcomp_lz4_create+0x2c/0x38
[<ffffffc0005d78f0>] zcomp_strm_alloc+0x34/0x78
[<ffffffc0005d7a58>] zcomp_strm_multi_find+0x124/0x1ec
[<ffffffc0005d7c14>] zcomp_strm_find+0xc/0x18
[<ffffffc0005d8fa0>] zram_bvec_rw+0x2fc/0x780
[<ffffffc0005d9680>] zram_make_request+0x25c/0x2d4
[<ffffffc00040f8ac>] generic_make_request+0x80/0xbc
[<ffffffc00040f98c>] submit_bio+0xa4/0x15c
[<ffffffc0002e8bb0>] __swap_writepage+0x218/0x230
[<ffffffc0002e8c04>] swap_writepage+0x3c/0x4c
[<ffffffc0002c7384>] shrink_page_list+0x51c/0x8d0
[<ffffffc0002c7e88>] shrink_inactive_list+0x3f8/0x60c
[<ffffffc0002c86c8>] shrink_lruvec+0x33c/0x4cc
[<ffffffc0002c8894>] shrink_zone+0x3c/0x100
[<ffffffc0002c8c10>] try_to_free_pages+0x2b8/0x54c
[<ffffffc0002bf308>] __alloc_pages_nodemask+0x514/0x7f0
[<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
[<ffffffc0003446cc>] proc_info_read+0x50/0xe4
[<ffffffc0002f5204>] vfs_read+0xa0/0x12c
[<ffffffc0002f59c8>] SyS_read+0x44/0x74
DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB

[minchan: change vmalloc gfp and adding comment about gfp]
[sergey: tweak comments and styles]
Signed-off-by: Kyeongdon Kim <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
---
drivers/block/zram/zcomp_lz4.c | 23 +++++++++++++++++++++--
drivers/block/zram/zcomp_lzo.c | 23 +++++++++++++++++++++--
2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/drivers/block/zram/zcomp_lz4.c b/drivers/block/zram/zcomp_lz4.c
index ee44b51..f2bfced 100644
--- a/drivers/block/zram/zcomp_lz4.c
+++ b/drivers/block/zram/zcomp_lz4.c
@@ -10,17 +10,36 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/lz4.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>

#include "zcomp_lz4.h"

static void *zcomp_lz4_create(void)
{
- return kzalloc(LZ4_MEM_COMPRESS, GFP_NOIO);
+ void *ret;
+
+ /*
+ * This function can be called in swapout/fs write path
+ * so we can't use GFP_FS|IO. And it assumes we already
+ * have at least one stream in zram initialization so we
+ * don't do best effort to allocate more stream in here.
+ * A default stream will work well without further multiple
+ * streams. That's why we use NORETRY | NOWARN | NOMEMALLOC.
+ */
+ ret = kzalloc(LZ4_MEM_COMPRESS, GFP_NOIO | __GFP_NORETRY |
+ __GFP_NOWARN | __GFP_NOMEMALLOC);
+ if (!ret)
+ ret = __vmalloc(LZ4_MEM_COMPRESS,
+ GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | __GFP_ZERO | __GFP_HIGHMEM,
+ PAGE_KERNEL);
+ return ret;
}

static void zcomp_lz4_destroy(void *private)
{
- kfree(private);
+ kvfree(private);
}

static int zcomp_lz4_compress(const unsigned char *src, unsigned char *dst,
diff --git a/drivers/block/zram/zcomp_lzo.c b/drivers/block/zram/zcomp_lzo.c
index 683ce04..7fbb4a3 100644
--- a/drivers/block/zram/zcomp_lzo.c
+++ b/drivers/block/zram/zcomp_lzo.c
@@ -10,17 +10,36 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/lzo.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>

#include "zcomp_lzo.h"

static void *lzo_create(void)
{
- return kzalloc(LZO1X_MEM_COMPRESS, GFP_NOIO);
+ void *ret;
+
+ /*
+ * This function can be called in swapout/fs write path
+ * so we can't use GFP_FS|IO. And it assumes we already
+ * have at least one stream in zram initialization so we
+ * don't do best effort to allocate more stream in here.
+ * A default stream will work well without further multiple
+ * streams. That's why we use NORETRY | NOWARN | NOMEMALLOC.
+ */
+ ret = kzalloc(LZO1X_MEM_COMPRESS, GFP_NOIO | __GFP_NORETRY |
+ __GFP_NOWARN | __GFP_NOMEMALLOC);
+ if (!ret)
+ ret = __vmalloc(LZO1X_MEM_COMPRESS,
+ GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | __GFP_ZERO | __GFP_HIGHMEM,
+ PAGE_KERNEL);
+ return ret;
}

static void lzo_destroy(void *private)
{
- kfree(private);
+ kvfree(private);
}

static int lzo_compress(const unsigned char *src, unsigned char *dst,
--
2.6.3.368.gf34be46
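The shape of the fix is a common kernel pattern: try a physically contiguous kzalloc() first and, if the buddy allocator cannot satisfy the high-order request, fall back to vmalloc(), which only needs order-0 pages. A userspace sketch of that shape, where alloc_contig() and alloc_virt() are hypothetical stand-ins for kzalloc() and __vmalloc():

```c
#include <assert.h>
#include <stdlib.h>

#define MEM_COMPRESS (16 * 1024)  /* stand-in for LZO1X/LZ4_MEM_COMPRESS */

/* kzalloc() stand-in: pretend high-order (>4KB) requests fail under
 * fragmentation, as in the report above. */
static void *alloc_contig(size_t size)
{
    if (size > 4096)
        return NULL;
    return calloc(1, size);
}

/* __vmalloc(.. | __GFP_ZERO ..) stand-in: needs only single pages,
 * so it succeeds even when high-order blocks are gone. */
static void *alloc_virt(size_t size)
{
    return calloc(1, size);
}

/* Mirrors the post-patch zcomp_*_create(): kmalloc first, vmalloc after. */
static void *strm_private_create(void)
{
    void *ret = alloc_contig(MEM_COMPRESS);

    if (!ret)
        ret = alloc_virt(MEM_COMPRESS);
    return ret;
}
```

Freeing then needs a helper that handles both cases, which is why the patch switches kfree() to kvfree().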

2015-11-30 07:09:45

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] zram/zcomp: use GFP_NOIO to allocate streams

On Fri, Nov 27, 2015 at 01:10:48PM +0900, Sergey Senozhatsky wrote:
> From: Sergey Senozhatsky <[email protected]>
>
> We can end up allocating a new compression stream with GFP_KERNEL
> from within the IO path, which may result is nested (recursive) IO
> operations. That can introduce problems if the IO path in question
> is a reclaimer, holding some locks that will deadlock nested IOs.
>
> Allocate streams and working memory using GFP_NOIO flag, forbidding
> recursive IO and FS operations.
>
> An example:
>
> [..]
>
> [minchan: add stable mark]
> Cc: [email protected]
> Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>

2015-11-30 07:10:40

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On Fri, Nov 27, 2015 at 01:10:49PM +0900, Sergey Senozhatsky wrote:
> From: Kyeongdon Kim <[email protected]>
>
> [..]
>
> [minchan: change vmalloc gfp and adding comment about gfp]
> [sergey: tweak comments and styles]
> Signed-off-by: Kyeongdon Kim <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>

Kyeongdon, Could you test this patch on your device?

Thanks.

2015-11-30 10:42:07

by Kyeongdon Kim

[permalink] [raw]
Subject: Re: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On 2015-11-30 오후 4:10, Minchan Kim wrote:
> On Fri, Nov 27, 2015 at 01:10:49PM +0900, Sergey Senozhatsky wrote:
>> From: Kyeongdon Kim <[email protected]>
>>
>> [..]
>>
>> [minchan: change vmalloc gfp and adding comment about gfp]
>> [sergey: tweak comments and styles]
>> Signed-off-by: Kyeongdon Kim <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>
> Kyeongdon, Could you test this patch on your device?
>
> Thanks.

Sorry to have kept you waiting.
As expected, I could no longer see the allocation failure message with
this patch. But there is something that causes some delay (not sure
yet whether this is normal).

static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
{
<snip>

zstrm->private = comp->backend->create();
^ // sometimes returns NULL repeatedly (2-5 times)

As you know, when NULL is returned this function is called again to
get memory in the while() loop. I just checked this with printk().

If you guys don't mind, I'll test more with a trace log to check the
time delay. However, if this is fully expected behaviour to you, I
think I don't need to do it.

Thanks,
Kyeongdon Kim

2015-11-30 11:14:06

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On (11/30/15 19:42), kyeongdon.kim wrote:
[..]
> Sorry to have kept you waiting,
> Obviously, I couldn't see allocation fail message with this patch.
> But, there is something to make some delay(not sure yet this is normal).

what delay? how significant is it? do you see it in practice, or is
it just a guess?

> static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
> {
> <snip>
>
> zstrm->private = comp->backend->create();
> ^ // sometimes, return 'null' continually(2-5times)
>
> As you know, if there is 'null' return, this function is called again to
> get a memory in while() loop. I just checked this one with printk().

well, not always.

a) current wait_event()s for an available stream to become idle
b) once awakened, current attempts to grab an idle stream
c) if zstrm, then return
d) if there is no idle stream, then goto a)
e) else try to allocate a stream again; if !zstrm goto a), else return

while (1) {
spin_lock(&zs->strm_lock);
if (!list_empty(&zs->idle_strm)) {
zstrm = list_entry(zs->idle_strm.next,
struct zcomp_strm, list);
list_del(&zstrm->list);
spin_unlock(&zs->strm_lock);
return zstrm;
}
/* zstrm streams limit reached, wait for idle stream */
if (zs->avail_strm >= zs->max_strm) {
spin_unlock(&zs->strm_lock);
wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
continue;
}
/* allocate new zstrm stream */
zs->avail_strm++;
spin_unlock(&zs->strm_lock);

zstrm = zcomp_strm_alloc(comp);
if (!zstrm) {
spin_lock(&zs->strm_lock);
zs->avail_strm--;
spin_unlock(&zs->strm_lock);
wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
continue;
}
break;
}

so it's possible for current to call zcomp_strm_alloc() several times...

do you see the same process doing N zcomp_strm_alloc() calls, or N
processes each doing one zcomp_strm_alloc()? I think the latter is more
likely; once we have failed a zcomp_strm_alloc(), it's quite possible
that N concurrent or succeeding IOs will do the same. That's why I
proposed decreasing ->max_strm; but we basically don't know when we
should roll it back to the original value, and I'm not sure I want to
do something like: on every 42nd IO, try to increment ->max_strm by
one while it's less than the original value.

so I'd probably prefer to keep it the way it is; but let's see the
numbers from you first.

-ss
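Sergey's points (a)-(e) can be condensed into a single-threaded model to count how many zcomp_strm_alloc() attempts one caller can end up making. The struct and function names below are hypothetical, and the wait_event() points are collapsed into plain retries, so this is only a sketch of the accounting, not of the concurrency:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model of zcomp_strm_multi_find(). */
struct zs_model {
	int avail_strm;  /* streams currently accounted for */
	int max_strm;    /* user-configured limit */
	int idle_strm;   /* streams sitting on the idle list */
};

/* fail_allocs: how many zcomp_strm_alloc() calls fail before one succeeds.
 * Returns 1 once a stream is obtained, 0 where the kernel would block. */
static int strm_find(struct zs_model *zs, int fail_allocs, int *alloc_calls)
{
	for (;;) {
		if (zs->idle_strm > 0) {      /* grab an idle stream */
			zs->idle_strm--;
			return 1;
		}
		if (zs->avail_strm >= zs->max_strm)
			return 0;             /* kernel: wait_event() here */

		zs->avail_strm++;             /* reserve a slot, then allocate */
		(*alloc_calls)++;
		if (*alloc_calls <= fail_allocs) {
			zs->avail_strm--;     /* rollback on allocation failure */
			continue;             /* kernel: wait_event(), then retry */
		}
		return 1;                     /* allocation succeeded */
	}
}
```

With three simulated failures, a single caller performs four zcomp_strm_alloc() calls before returning, matching the "called again in the while() loop" behaviour Kyeongdon observed.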

2015-12-01 00:32:48

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On (12/01/15 08:18), Minchan Kim wrote:
[..]
> > As you know, if there is 'null' return, this function is called again to
> > get a memory in while() loop. I just checked this one with printk().
> >
> > If you guys don't mind, I'll test more with trace log to check time delay.
>
> No problem.
>
> >
> > However, If this is fully expectable status to you.
> > I think I don't need to do it.
>
> It's not what I expected. Actually, I thought failure of vmalloc
> in that place should be *really really* rare. I think it's caused by
> __GFP_NOMEMALLOC so I want to see test result without the flag.

hm, agree. otherwise the whole vmalloc() fallback adds little value.
additional streams are really not important enough to waste emergency
memory on. a stream, once allocated, stays around forever (until the
user decreases ->max_strm).

> Thanks for the careful test!

yes, thank you Kyeongdon.

-ss
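Minchan's suggestion amounts to flipping a single bit in the allocation mask: __GFP_NOMEMALLOC forbids dipping into the emergency reserves, so the retest mask would differ from the patch's mask in exactly that flag. A sketch with flag values mirrored from 4.x-era include/linux/gfp.h (the constants are illustrative, and STRM_GFP/STRM_GFP_RETEST are made-up names for this comparison):

```c
#include <assert.h>

/* Flag values mirrored from 4.x-era include/linux/gfp.h (illustrative). */
#define ___GFP_WAIT        0x10u
#define ___GFP_NOWARN      0x200u
#define ___GFP_NORETRY     0x1000u
#define ___GFP_NOMEMALLOC  0x10000u

#define GFP_NOIO           (___GFP_WAIT)

/* Mask used by the patch for stream allocation ... */
#define STRM_GFP \
	(GFP_NOIO | ___GFP_NORETRY | ___GFP_NOWARN | ___GFP_NOMEMALLOC)
/* ... and the variant Minchan asks Kyeongdon to retest. */
#define STRM_GFP_RETEST \
	(GFP_NOIO | ___GFP_NORETRY | ___GFP_NOWARN)
```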

2015-12-01 02:04:19

by Kyeongdon Kim

[permalink] [raw]
Subject: Re: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On 2015-11-30 오후 8:14, Sergey Senozhatsky wrote:
> On (11/30/15 19:42), kyeongdon.kim wrote:
> [..]
>> Sorry to have kept you waiting,
>> Obviously, I couldn't see allocation fail message with this patch.
>> But, there is something to make some delay(not sure yet this is normal).
>
> what delay? how significant it is? do you see it in practice or it's just
> a guess?
>
For now I just checked with a printk() in zcomp_lz4_create()/lzo_create()
to see the 'ret' address: both kzalloc() and __vmalloc() return NULL
several times in a row, and those failures are what make the delay.
So I guess it's not significant, but if these allocation attempts
happen often, I doubt that it is fine.

>> static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
>> {
>> <snip>
>>
>> zstrm->private = comp->backend->create();
>> ^ // sometimes, return 'null' continually(2-5times)
>>
>> As you know, if there is 'null' return, this function is called again to
>> get a memory in while() loop. I just checked this one with printk().
>
> [..]

I haven't checked it in detail yet. I'll explain after checking this.

Thanks,
Kyeongdon Kim
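For reference, the stream-find retry loop quoted above can be sketched in
userspace C. This is only an approximation: a pthread mutex and condition
variable stand in for the kernel spinlock and wait queue (the real code
sleeps outside the lock via wait_event()), and every name below is
illustrative rather than the actual zcomp API.

```c
#include <pthread.h>
#include <stdlib.h>

/* Illustrative stand-ins for zcomp_strm and the multi-stream pool. */
struct strm { struct strm *next; };

struct pool {
	pthread_mutex_t lock;   /* stands in for zs->strm_lock */
	pthread_cond_t wait;    /* stands in for zs->strm_wait */
	struct strm *idle;      /* idle stream list */
	int avail, max;         /* avail_strm / max_strm */
};

static struct strm *strm_alloc(void)
{
	return calloc(1, sizeof(struct strm));
}

/* Mirrors the quoted loop: take an idle stream, or wait at the limit,
 * or allocate a new one and roll back the count on failure. */
static struct strm *strm_find(struct pool *p)
{
	struct strm *s;

	for (;;) {
		pthread_mutex_lock(&p->lock);
		if (p->idle) {
			s = p->idle;
			p->idle = s->next;
			pthread_mutex_unlock(&p->lock);
			return s;
		}
		if (p->avail >= p->max) {
			/* stream limit reached: wait for an idle stream */
			while (!p->idle)
				pthread_cond_wait(&p->wait, &p->lock);
			pthread_mutex_unlock(&p->lock);
			continue;
		}
		p->avail++;
		pthread_mutex_unlock(&p->lock);

		s = strm_alloc();
		if (!s) {
			/* allocation failed: undo the count and retry */
			pthread_mutex_lock(&p->lock);
			p->avail--;
			pthread_mutex_unlock(&p->lock);
			continue;
		}
		return s;
	}
}

static void strm_release(struct pool *p, struct strm *s)
{
	pthread_mutex_lock(&p->lock);
	s->next = p->idle;
	p->idle = s;
	pthread_cond_signal(&p->wait);
	pthread_mutex_unlock(&p->lock);
}
```

This makes it easy to see why one failed create() can ripple: every caller
that loses the race re-enters strm_find() and may attempt strm_alloc() again.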

2015-12-01 02:32:06

by Kyeongdon Kim

[permalink] [raw]
Subject: Re: Re: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On 2015-12-01 8:18 AM, Minchan Kim wrote:
> Hi Kyeongdon,
>
> On Mon, Nov 30, 2015 at 07:42:02PM +0900, kyeongdon.kim wrote:
>
>> > On Fri, Nov 27, 2015 at 01:10:49PM +0900, Sergey Senozhatsky wrote:
>> >> From: Kyeongdon Kim <[email protected]>
>> >>
>> >> When we're using LZ4 multi compression streams for zram swap,
>> >> we found out page allocation failure message in system running test.
>> >> That was not only once, but a few(2 - 5 times per test).
>> >> Also, some failure cases were continually occurring to try allocation
>> >> order 3.
>> >>
>> >> In order to make parallel compression private data, we should call
>> >> kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
>> >> 2/3 size memory to allocate in that time, page allocation fails.
>> >> This patch makes to use vmalloc() as fallback of kmalloc(), this
>> >> prevents page alloc failure warning.
>> >>
>> >> After using this, we never found warning message in running test, also
>> >> It could reduce process startup latency about 60-120ms in each case.
>> >>
>> >> For reference a call trace :
>> >>
>> >> Binder_1: page allocation failure: order:3, mode:0x10c0d0
>> >> CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty
>> > #20
>> >> Call trace:
>> >> [<ffffffc0002069c8>] dump_backtrace+0x0/0x270
>> >> [<ffffffc000206c48>] show_stack+0x10/0x1c
>> >> [<ffffffc000cb51c8>] dump_stack+0x1c/0x28
>> >> [<ffffffc0002bbfc8>] warn_alloc_failed+0xfc/0x11c
>> >> [<ffffffc0002bf518>] __alloc_pages_nodemask+0x724/0x7f0
>> >> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
>> >> [<ffffffc0002ed6a4>] kmalloc_order_trace+0x38/0xd8
>> >> [<ffffffc0005d9738>] zcomp_lz4_create+0x2c/0x38
>> >> [<ffffffc0005d78f0>] zcomp_strm_alloc+0x34/0x78
>> >> [<ffffffc0005d7a58>] zcomp_strm_multi_find+0x124/0x1ec
>> >> [<ffffffc0005d7c14>] zcomp_strm_find+0xc/0x18
>> >> [<ffffffc0005d8fa0>] zram_bvec_rw+0x2fc/0x780
>> >> [<ffffffc0005d9680>] zram_make_request+0x25c/0x2d4
>> >> [<ffffffc00040f8ac>] generic_make_request+0x80/0xbc
>> >> [<ffffffc00040f98c>] submit_bio+0xa4/0x15c
>> >> [<ffffffc0002e8bb0>] __swap_writepage+0x218/0x230
>> >> [<ffffffc0002e8c04>] swap_writepage+0x3c/0x4c
>> >> [<ffffffc0002c7384>] shrink_page_list+0x51c/0x8d0
>> >> [<ffffffc0002c7e88>] shrink_inactive_list+0x3f8/0x60c
>> >> [<ffffffc0002c86c8>] shrink_lruvec+0x33c/0x4cc
>> >> [<ffffffc0002c8894>] shrink_zone+0x3c/0x100
>> >> [<ffffffc0002c8c10>] try_to_free_pages+0x2b8/0x54c
>> >> [<ffffffc0002bf308>] __alloc_pages_nodemask+0x514/0x7f0
>> >> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
>> >> [<ffffffc0003446cc>] proc_info_read+0x50/0xe4
>> >> [<ffffffc0002f5204>] vfs_read+0xa0/0x12c
>> >> [<ffffffc0002f59c8>] SyS_read+0x44/0x74
>> >> DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
>> >> 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
>> >>
>> >> [minchan: change vmalloc gfp and adding comment about gfp]
>> >> [sergey: tweak comments and styles]
>> >> Signed-off-by: Kyeongdon Kim <[email protected]>
>> >> Signed-off-by: Minchan Kim <[email protected]>
>> >
>> > Kyeongdon, Could you test this patch on your device?
>> >
>> > Thanks.
>>
>> Sorry to have kept you waiting,
>> Obviously, I couldn't see allocation fail message with this patch.
>> But, there is something to make some delay(not sure yet this is normal).
>
> You mean new changes makes start-up delay of your application sometime
> still,
> but not frequent like old?
>
I couldn't see the start-up delay during my test after this patch.
But I checked the return value from the allocation functions, like below:

static void *zcomp_lz4_create(void)
<snip>
	ret = kzalloc(LZ4_MEM_COMPRESS, GFP_NOIO | __GFP_NORETRY |
			__GFP_NOWARN | __GFP_NOMEMALLOC);
	printk("%s: %d: ret = %p\n", __func__, __LINE__, ret); //line 32
	if (!ret) {
		ret = __vmalloc(LZ4_MEM_COMPRESS,
				GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN |
				__GFP_NOMEMALLOC | __GFP_ZERO | __GFP_HIGHMEM,
				PAGE_KERNEL);
		printk("%s: %d: ret = %p\n", __func__, __LINE__, ret); //line 38
	}
	return ret;

log message :
[ 352.226014][0] zcomp_lz4_create: 32: ret = (null)
[ 352.226035][0] zcomp_lz4_create: 38: ret = (null)
[ 352.226791][0] zcomp_lz4_create: 32: ret = (null)
[ 352.226809][0] zcomp_lz4_create: 38: ret = (null)
[ 352.230348][0] zcomp_lz4_create: 32: ret = (null)
[ 352.230369][0] zcomp_lz4_create: 38: ret = (null)
[ 352.230460][0] zcomp_lz4_create: 32: ret = (null)
[ 352.230485][0] zcomp_lz4_create: 38: ret = (null)
[ 352.230507][0] zcomp_lz4_create: 32: ret = (null)
[ 352.230520][0] zcomp_lz4_create: 38: ret = (null)
[ 352.230608][0] zcomp_lz4_create: 32: ret = (null)
[ 352.230619][0] zcomp_lz4_create: 38: ret = (null)
[ 352.230888][0] zcomp_lz4_create: 32: ret = (null)
[ 352.230902][0] zcomp_lz4_create: 38: ret = (null)
[ 352.231406][0] zcomp_lz4_create: 32: ret = ffffffc002088000
[ 352.234024][0] zcomp_lz4_create: 32: ret = (null)
[ 352.234060][0] zcomp_lz4_create: 38: ret = (null)
[ 352.234359][0] zcomp_lz4_create: 32: ret = (null)
[ 352.234384][0] zcomp_lz4_create: 38: ret = (null)
[ 352.234618][0] zcomp_lz4_create: 32: ret = (null)
[ 352.234639][0] zcomp_lz4_create: 38: ret = (null)
[ 352.234667][0] zcomp_lz4_create: 32: ret = (null)
[ 352.234685][0] zcomp_lz4_create: 38: ret = (null)
[ 352.234738][0] zcomp_lz4_create: 32: ret = (null)
[ 352.234748][0] zcomp_lz4_create: 38: ret = (null)
[ 352.234800][0] zcomp_lz4_create: 32: ret = (null)
[ 352.234816][0] zcomp_lz4_create: 38: ret = (null)
[ 352.234852][0] zcomp_lz4_create: 32: ret = (null)
[ 352.234865][0] zcomp_lz4_create: 38: ret = (null)
[ 352.235136][0] zcomp_lz4_create: 32: ret = (null)
[ 352.235179][0] zcomp_lz4_create: 38: ret = ffffff80016a4000

I thought this failure pattern from __vmalloc() was not normal.
>>
>> static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
>> {
>> <snip>
>>
>> zstrm->private = comp->backend->create();
>> ^ // sometimes, return 'null' continually(2-5times)
>
> Hmm, I think it is caused by __GFP_NOMEMALLOC.
> Could you test it without the flag?
>
>>
>> As you know, if there is 'null' return, this function is called again to
>> get a memory in while() loop. I just checked this one with printk().
>>
>> If you guys don't mind, I'll test more with trace log to check time
> delay.
>
> No problem.
>
>>
>> However, If this is fully expectable status to you.
>> I think I don't need to do it.
>
> It's not what I expected. Actually, I thought failure of vmalloc
> in that place should be *really really* rare. I think it's caused by
> __GFP_NOMEMALLOC so I want to see test result without the flag.
>
> Thanks for the careful test!
>
You're welcome.

After I removed the '__GFP_NOMEMALLOC' flag, I haven't seen __vmalloc()
return NULL so far.

log message :
<4>[ 2288.954934][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
<4>[ 2288.954972][0] KDKIM: zcomp_lz4_create: 30: ret = ffffff800287e000
..<snip>..
<4>[ 2289.092411][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
<4>[ 2289.092546][0] KDKIM: zcomp_lz4_create: 30: ret = ffffff80028b5000
..<snip>..
<4>[ 2289.135628][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
<4>[ 2289.135642][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
<4>[ 2289.135729][0] KDKIM: zcomp_lz4_create: 30: ret = ffffff80028be000
<4>[ 2289.135732][0] KDKIM: zcomp_lz4_create: 30: ret = ffffff80028c7000

Thanks,
Kyeongdon Kim
>
>>
>> Thanks,
>> Kyeongdon Kim

2015-12-01 04:44:29

by Minchan Kim

[permalink] [raw]
Subject: Re: Re: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On Tue, Dec 01, 2015 at 11:31:41AM +0900, kyeongdon.kim wrote:
> On 2015-12-01 8:18 AM, Minchan Kim wrote:
> > Hi Kyeongdon,
> >
> > On Mon, Nov 30, 2015 at 07:42:02PM +0900, kyeongdon.kim wrote:
> >
> >> > On Fri, Nov 27, 2015 at 01:10:49PM +0900, Sergey Senozhatsky wrote:
> >> >> From: Kyeongdon Kim <[email protected]>
> >> >>
> >> >> When we're using LZ4 multi compression streams for zram swap,
> >> >> we found out page allocation failure message in system running test.
> >> >> That was not only once, but a few(2 - 5 times per test).
> >> >> Also, some failure cases were continually occurring to try allocation
> >> >> order 3.
> >> >>
> >> >> In order to make parallel compression private data, we should call
> >> >> kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
> >> >> 2/3 size memory to allocate in that time, page allocation fails.
> >> >> This patch makes to use vmalloc() as fallback of kmalloc(), this
> >> >> prevents page alloc failure warning.
> >> >>
> >> >> After using this, we never found warning message in running test, also
> >> >> It could reduce process startup latency about 60-120ms in each case.
> >> >>
> >> >> For reference a call trace :
> >> >>
> >> >> Binder_1: page allocation failure: order:3, mode:0x10c0d0
> >> >> CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty
> >> > #20
> >> >> Call trace:
> >> >> [<ffffffc0002069c8>] dump_backtrace+0x0/0x270
> >> >> [<ffffffc000206c48>] show_stack+0x10/0x1c
> >> >> [<ffffffc000cb51c8>] dump_stack+0x1c/0x28
> >> >> [<ffffffc0002bbfc8>] warn_alloc_failed+0xfc/0x11c
> >> >> [<ffffffc0002bf518>] __alloc_pages_nodemask+0x724/0x7f0
> >> >> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> >> >> [<ffffffc0002ed6a4>] kmalloc_order_trace+0x38/0xd8
> >> >> [<ffffffc0005d9738>] zcomp_lz4_create+0x2c/0x38
> >> >> [<ffffffc0005d78f0>] zcomp_strm_alloc+0x34/0x78
> >> >> [<ffffffc0005d7a58>] zcomp_strm_multi_find+0x124/0x1ec
> >> >> [<ffffffc0005d7c14>] zcomp_strm_find+0xc/0x18
> >> >> [<ffffffc0005d8fa0>] zram_bvec_rw+0x2fc/0x780
> >> >> [<ffffffc0005d9680>] zram_make_request+0x25c/0x2d4
> >> >> [<ffffffc00040f8ac>] generic_make_request+0x80/0xbc
> >> >> [<ffffffc00040f98c>] submit_bio+0xa4/0x15c
> >> >> [<ffffffc0002e8bb0>] __swap_writepage+0x218/0x230
> >> >> [<ffffffc0002e8c04>] swap_writepage+0x3c/0x4c
> >> >> [<ffffffc0002c7384>] shrink_page_list+0x51c/0x8d0
> >> >> [<ffffffc0002c7e88>] shrink_inactive_list+0x3f8/0x60c
> >> >> [<ffffffc0002c86c8>] shrink_lruvec+0x33c/0x4cc
> >> >> [<ffffffc0002c8894>] shrink_zone+0x3c/0x100
> >> >> [<ffffffc0002c8c10>] try_to_free_pages+0x2b8/0x54c
> >> >> [<ffffffc0002bf308>] __alloc_pages_nodemask+0x514/0x7f0
> >> >> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> >> >> [<ffffffc0003446cc>] proc_info_read+0x50/0xe4
> >> >> [<ffffffc0002f5204>] vfs_read+0xa0/0x12c
> >> >> [<ffffffc0002f59c8>] SyS_read+0x44/0x74
> >> >> DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> >> >> 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
> >> >>
> >> >> [minchan: change vmalloc gfp and adding comment about gfp]
> >> >> [sergey: tweak comments and styles]
> >> >> Signed-off-by: Kyeongdon Kim <[email protected]>
> >> >> Signed-off-by: Minchan Kim <[email protected]>
> >> >
> >> > Kyeongdon, Could you test this patch on your device?
> >> >
> >> > Thanks.
> >>
> >> Sorry to have kept you waiting,
> >> Obviously, I couldn't see allocation fail message with this patch.
> >> But, there is something to make some delay(not sure yet this is normal).
> >
> > You mean new changes makes start-up delay of your application sometime
> > still,
> > but not frequent like old?
> >
> I couldn't see start-up delay during my test after this patch.
> But, I checked the return value from alloc function like the below :
>
> static void *zcomp_lz4_create(void)
> <snip>
> ret = kzalloc(LZ4_MEM_COMPRESS, GFP_NOIO | __GFP_NORETRY |
> __GFP_NOWARN | __GFP_NOMEMALLOC);
> printk("%s: %d: ret = %p\n",__func__,__LINE__,ret); //line 32
> if (!ret) {
> ret = __vmalloc(LZ4_MEM_COMPRESS,
> GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN |
> __GFP_NOMEMALLOC | __GFP_ZERO | __GFP_HIGHMEM,
> PAGE_KERNEL);
> printk("%s: %d: ret = %p\n",__func__,__LINE__,ret); //line 38
> }
> return ret;

OK, I got it.

>
> log message :
> [ 352.226014][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.226035][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.226791][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.226809][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.230348][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.230369][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.230460][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.230485][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.230507][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.230520][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.230608][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.230619][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.230888][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.230902][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.231406][0] zcomp_lz4_create: 32: ret = ffffffc002088000
> [ 352.234024][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234060][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234359][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234384][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234618][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234639][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234667][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234685][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234738][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234748][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234800][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234816][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234852][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234865][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.235136][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.235179][0] zcomp_lz4_create: 38: ret = ffffff80016a4000
>
> I thought this pattern from vmalloc is not normal.
> >>
> >> static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
> >> {
> >> <snip>
> >>
> >> zstrm->private = comp->backend->create();
> >> ^ // sometimes, return 'null' continually(2-5times)
> >
> > Hmm, I think it is caused by __GFP_NOMEMALLOC.
> > Could you test it without the flag?
> >
> >>
> >> As you know, if there is 'null' return, this function is called again to
> >> get a memory in while() loop. I just checked this one with printk().
> >>
> >> If you guys don't mind, I'll test more with trace log to check time
> > delay.
> >
> > No problem.
> >
> >>
> >> However, If this is fully expectable status to you.
> >> I think I don't need to do it.
> >
> > It's not what I expected. Actually, I thought failure of vmalloc
> > in that place should be *really really* rare. I think it's caused by
> > __GFP_NOMEMALLOC so I want to see test result without the flag.
> >
> > Thanks for the careful test!
> >
> You're welcome.
>
> After I removed flag '__GFP_NOMEMALLOC', I couldn't find return 'null'
> from vmalloc until now.

Thanks for the test.
You said you cannot see the delay any more so it's fine as it is.

Regardless of it, I want to remove __GFP_NOMEMALLOC from the gfp_mask.
The reason we decided to add the vmalloc fallback is that we expected
allocating a few order-0 pages (sizeof(buffer) > PAGE_SIZE) to be
likely to succeed even though kmalloc had failed.
But what I was missing is that zram-swap normally works in direct
reclaim, so it is more flexible to use the emergency memory pool
(e.g., zsmalloc doesn't use __GFP_NOMEMALLOC), so there is no point
in using __GFP_NOMEMALLOC here.
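The allocation shape the thread is converging on (plain kmalloc() first,
vmalloc() only as a fallback) can be approximated in userspace. In this
sketch calloc() stands in for both kernel allocators, and the
failure-injection flag is purely illustrative, just to exercise the
fallback path.

```c
#include <stdlib.h>

#define MEM_COMPRESS 16384  /* stand-in for LZ4_MEM_COMPRESS */

/* Illustrative knob so the "kmalloc" path can be forced to fail. */
static int force_first_fail;

/* stand-in for kzalloc(): fast path, may fail under fragmentation */
static void *first_alloc(size_t n)
{
	if (force_first_fail)
		return NULL;
	return calloc(1, n);
}

/* stand-in for __vmalloc(): fallback, zeroed like __GFP_ZERO */
static void *second_alloc(size_t n)
{
	return calloc(1, n);
}

/* Sketch of the backend create() path: try the contiguous allocator
 * first, fall back to the virtually-contiguous one on failure. */
static void *strm_private_create(void)
{
	void *ret = first_alloc(MEM_COMPRESS);

	if (!ret)
		ret = second_alloc(MEM_COMPRESS);
	return ret;
}
```

The point of the design is that the fallback only runs on the rare failure
path, so its extra cost (page-table setup in the real __vmalloc()) is paid
only when order-2/3 contiguous pages are unavailable.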

2015-12-01 05:15:53

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On (12/01/15 13:55), Minchan Kim wrote:
[..]
> To clear my opinion,
>
> lzo_create(gfp_t flags)
> {
> void * ret = kmalloc(LZO1X_MEM_COMPRESS, flags);
> if (!ret)
> ret = vmalloc(LZO1X_MEM_COMPRESS, flasgs | GFP_NOMEMALLOC);
> return ret;
> }

ah, ok, I see. I have a question.

we had
kmalloc(f | __GFP_NOMEMALLOC)
__vmalloc(f | __GFP_NOMEMALLOC)

which produced high failure rates for both kmalloc() and __vmalloc()

test #1

> > > log message :
[..]
> > > [ 352.230608][0] zcomp_lz4_create: 32: ret = (null)
> > > [ 352.230619][0] zcomp_lz4_create: 38: ret = (null)
> > > [ 352.230888][0] zcomp_lz4_create: 32: ret = (null)
> > > [ 352.230902][0] zcomp_lz4_create: 38: ret = (null)
> > > [ 352.231406][0] zcomp_lz4_create: 32: ret = ffffffc002088000
> > > [ 352.234024][0] zcomp_lz4_create: 32: ret = (null)
> > > [ 352.234060][0] zcomp_lz4_create: 38: ret = (null)
> > > [ 352.234359][0] zcomp_lz4_create: 32: ret = (null)
[..]
> > > [ 352.234384][0] zcomp_lz4_create: 38: ret = (null)
> > > [ 352.234618][0] zcomp_lz4_create: 32: ret = (null)
> > > [ 352.234639][0] zcomp_lz4_create: 38: ret = (null)
> > > [ 352.234667][0] zcomp_lz4_create: 32: ret = (null)
> > > [ 352.235179][0] zcomp_lz4_create: 38: ret = ffffff80016a4000



Kyeongdon, do I understand correctly, that for the second test you
removed '__GFP_NOMEMALLOC' from both kmalloc() and __vmalloc()?

iow:
kmalloc(f & ~__GFP_NOMEMALLOC)
vmalloc(f & ~__GFP_NOMEMALLOC)

test #2 : almost always failing kmalloc() and !NULL __vmalloc()

> > > log message :
> > > <4>[ 2288.954934][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
> > > <4>[ 2288.954972][0] KDKIM: zcomp_lz4_create: 30: ret = ffffff800287e000
> > > ..<snip>..
> > > <4>[ 2289.092411][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
> > > <4>[ 2289.092546][0] KDKIM: zcomp_lz4_create: 30: ret = ffffff80028b5000
> > > ..<snip>..
> > > <4>[ 2289.135628][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
> > > <4>[ 2289.135642][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
> > > <4>[ 2289.135729][0] KDKIM: zcomp_lz4_create: 30: ret = ffffff80028be000
> > > <4>[ 2289.135732][0] KDKIM: zcomp_lz4_create: 30: ret = ffffff80028c7000


if this is the case (__GFP_NOMEMALLOC removed from both kmalloc and __vmalloc),
then proposed

kmalloc(f & ~__GFP_NOMEMALLOC)
__vmalloc(f | __GFP_NOMEMALLOC)


can be very close to 'test #1 && test #2':

kmalloc() fails (as in test #2)
__vmalloc() fails (as in test #1)

isn't it?

-ss

2015-12-01 06:36:00

by Kyeongdon Kim

[permalink] [raw]
Subject: Re: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()


On 2015-12-01 2:16 PM, Sergey Senozhatsky wrote:
> On (12/01/15 13:55), Minchan Kim wrote:
> [..]
>> To clear my opinion,
>>
>> lzo_create(gfp_t flags)
>> {
>> void * ret = kmalloc(LZO1X_MEM_COMPRESS, flags);
>> if (!ret)
>> ret = vmalloc(LZO1X_MEM_COMPRESS, flasgs | GFP_NOMEMALLOC);
>> return ret;
>> }
>
> ah, ok, I see. I've a question.
>
> we had
> kmalloc(f | __GFP_NOMEMALLOC)
> __vmalloc(f | __GFP_NOMEMALLOC)
>
> which produced high failure rates for both kmalloc() and __vmalloc()
>
> test #1
>
>> > > log message :
> [..]
>> > > [ 352.230608][0] zcomp_lz4_create: 32: ret = (null)
>> > > [ 352.230619][0] zcomp_lz4_create: 38: ret = (null)
>> > > [ 352.230888][0] zcomp_lz4_create: 32: ret = (null)
>> > > [ 352.230902][0] zcomp_lz4_create: 38: ret = (null)
>> > > [ 352.231406][0] zcomp_lz4_create: 32: ret = ffffffc002088000
>> > > [ 352.234024][0] zcomp_lz4_create: 32: ret = (null)
>> > > [ 352.234060][0] zcomp_lz4_create: 38: ret = (null)
>> > > [ 352.234359][0] zcomp_lz4_create: 32: ret = (null)
> [..]
>> > > [ 352.234384][0] zcomp_lz4_create: 38: ret = (null)
>> > > [ 352.234618][0] zcomp_lz4_create: 32: ret = (null)
>> > > [ 352.234639][0] zcomp_lz4_create: 38: ret = (null)
>> > > [ 352.234667][0] zcomp_lz4_create: 32: ret = (null)
>> > > [ 352.235179][0] zcomp_lz4_create: 38: ret = ffffff80016a4000
>
>
>
> Kyeongdon, do I understand correctly, that for the second test you
> removed '__GFP_NOMEMALLOC' from both kmalloc() and __vmalloc()?
>
> iow:
> kmalloc(f & ~__GFP_NOMEMALLOC)
> vmalloc(f & ~__GFP_NOMEMALLOC)
>
> test #2 : almost always failing kmalloc() and !NULL __vmalloc()
>
>> > > log message :
>> > > <4>[ 2288.954934][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
>> > > <4>[ 2288.954972][0] KDKIM: zcomp_lz4_create: 30: ret =
> ffffff800287e000
>> > > ..<snip>..
>> > > <4>[ 2289.092411][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
>> > > <4>[ 2289.092546][0] KDKIM: zcomp_lz4_create: 30: ret =
> ffffff80028b5000
>> > > ..<snip>..
>> > > <4>[ 2289.135628][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
>> > > <4>[ 2289.135642][0] KDKIM: zcomp_lz4_create: 24: ret = (null)
>> > > <4>[ 2289.135729][0] KDKIM: zcomp_lz4_create: 30: ret =
> ffffff80028be000
>> > > <4>[ 2289.135732][0] KDKIM: zcomp_lz4_create: 30: ret =
> ffffff80028c7000
>
>
> if this is the case (__GFP_NOMEMALLOC removed from both kmalloc and
> __vmalloc),
> then proposed
>
> kmalloc(f & ~__GFP_NOMEMALLOC)
> __vmalloc(f | __GFP_NOMEMALLOC)
>
>
> can be very close to 'test #1 && test #2':
>
> kmalloc() fails (as in test #2)
> __vmalloc() fails (as in test #1)
>
> isn't it?
>
> -ss

Let me give you a simple summary of it.

@test #1 (previous shared log)
kmalloc(f | __GFP_NOMEMALLOC)
__vmalloc(f | __GFP_NOMEMALLOC)
// can find failure both

@test #2 (previous shared log)
kmalloc(f | __GFP_NOMEMALLOC)
__vmalloc(f)
// removed '__GFP_NOMEMALLOC' from vmalloc() only, and cannot find
failure from vmalloc()

And like you said, I made a quick check to see whether kmalloc() fails
without the flag:

@test #3
kmalloc(f)
__vmalloc(f | __GFP_NOMEMALLOC)
// removed '__GFP_NOMEMALLOC' from zmalloc() only
// and cannot find failure from zmalloc(), but in this case, it's hard
to find failure from vmalloc() because of already allocation mostly from
zsmalloc()

log message (test #3) :
<4>[ 186.763605][1] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002030000
<4>[ 186.776652][1] KDKIM: zcomp_lz4_create: 24: ret = ffffffc0020f0000
<4>[ 186.811423][1] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002108000
<4>[ 186.816744][1] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002000000
<4>[ 186.816796][1] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002008000

@test #4
kmalloc(f)
__vmalloc(f)
// cannot find failure both until now

log message (test #4) :
<4>[ 641.440468][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002190000
<snip>
<4>[ 922.182980][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002208000
<snip>
<4>[ 923.197593][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002020000
<snip>
<4>[ 939.813499][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc0020a0000

So, is there another problem if we remove the flag from both sides?

Thanks,
Kyeongdon Kim
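The four test combinations above differ only in whether __GFP_NOMEMALLOC
is OR-ed into or cleared from the base mask on each of the two call
sites. A tiny sketch of just that bit manipulation (the flag values here
are made up for the example, NOT the kernel's real gfp bit layout):

```c
/* Illustrative bit values only -- not the kernel's gfp layout. */
#define GFP_NOIO         0x01u
#define __GFP_NORETRY    0x02u
#define __GFP_NOWARN     0x04u
#define __GFP_NOMEMALLOC 0x08u

/* Base mask shared by all four test snippets in the thread. */
#define BASE (GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN)

/* tests #1/#2 kmalloc side: keep the emergency-pool opt-out set */
static unsigned int with_nomemalloc(unsigned int f)
{
	return f | __GFP_NOMEMALLOC;
}

/* tests #3/#4 kmalloc side: clear it, so PF_MEMALLOC reserves
 * may be dipped into when the task is in direct reclaim */
static unsigned int without_nomemalloc(unsigned int f)
{
	return f & ~__GFP_NOMEMALLOC;
}
```

Each test # is just a pairing of these two helpers across the kmalloc()
and __vmalloc() call sites, which is why removing the flag from both
sides (test #4) is the most permissive combination.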

2015-12-01 07:14:44

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On (12/01/15 15:35), Kyeongdon Kim wrote:
[..]
> @test #4
> kmalloc(f)
> __vmalloc(f)
> // cannot find failure both until now
>
> log message (test #4) :
> <4>[ 641.440468][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002190000
> <snip>
> <4>[ 922.182980][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002208000
> <snip>
> <4>[ 923.197593][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002020000
> <snip>
> <4>[ 939.813499][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc0020a0000

Thanks!

> So,is there another problem if we remove the flag from both sides?
>

Technically, '~__GFP_NOMEMALLOC' is what we've been doing for some time (well,
always); and, as Minchan noted, zsmalloc does not depend on emergency pools.

I vote for removal of __GFP_NOMEMALLOC from both kmalloc() and __vmalloc().

(user can make ->max_strm big enough to deplete emergency mem; but I tend to
ignore it).

Minchan?

-ss

2015-12-01 07:23:38

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On (12/01/15 15:35), Kyeongdon Kim wrote:
> Let me give you a simple code of it.
>
> @test #1 (previous shared log)
> kmalloc(f | __GFP_NOMEMALLOC)
> __vmalloc(f | __GFP_NOMEMALLOC)
> // can find failure both
>
> @test #2 (previous shared log)
> kmalloc(f | __GFP_NOMEMALLOC)
> __vmalloc(f)
> // removed '__GFP_NOMEMALLOC' from vmalloc() only, and cannot find
> failure from vmalloc()
>
> And like you said, I made a quick check to see a failure about kmalloc()
> without the flag :
>
> @test #3
> kmalloc(f)
> __vmalloc(f | __GFP_NOMEMALLOC)
> // removed '__GFP_NOMEMALLOC' from zmalloc() only
> // and cannot find failure from zmalloc(), but in this case, it's hard
> to find failure from vmalloc() because of already allocation mostly from
> zsmalloc()
>

I assume that "zsmalloc" and "zmalloc" here are meant to be "kzalloc (kmalloc)"

-ss

> log message (test #3) :
> <4>[ 186.763605][1] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002030000
> <4>[ 186.776652][1] KDKIM: zcomp_lz4_create: 24: ret = ffffffc0020f0000
> <4>[ 186.811423][1] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002108000
> <4>[ 186.816744][1] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002000000
> <4>[ 186.816796][1] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002008000
>
> @test #4
> kmalloc(f)
> __vmalloc(f)
> // cannot find failure both until now
>
> log message (test #4) :
> <4>[ 641.440468][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002190000
> <snip>
> <4>[ 922.182980][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002208000
> <snip>
> <4>[ 923.197593][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002020000
> <snip>
> <4>[ 939.813499][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc0020a0000
>
> So,is there another problem if we remove the flag from both sides?
>
> Thanks,
> Kyeongdon Kim
>

2015-12-01 07:32:12

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On (12/01/15 16:15), Sergey Senozhatsky wrote:
> On (12/01/15 15:35), Kyeongdon Kim wrote:
> [..]
> > @test #4
> > kmalloc(f)
> > __vmalloc(f)
> > // cannot find failure both until now
> >
> > log message (test #4) :
> > <4>[ 641.440468][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002190000
> > <snip>
> > <4>[ 922.182980][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002208000
> > <snip>
> > <4>[ 923.197593][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002020000
> > <snip>
> > <4>[ 939.813499][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc0020a0000
>
> Thanks!
>
> > So,is there another problem if we remove the flag from both sides?
> >
>
> Technically, '~__GFP_NOMEMALLOC' is what we've been doing for some time (well,
> always); and, as Minchan noted, zsmalloc does not depend on emergency pools.
>
> I vote for removal of __GFP_NOMEMALLOC from both kmalloc() and __vmalloc().
>

um.. which is very close to
"remove vmalloc() fallback and use kzalloc(f & ~__GFP_NOMEMALLOC) only"

-ss

2015-12-01 08:16:04

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On Tue, Dec 01, 2015 at 04:15:42PM +0900, Sergey Senozhatsky wrote:
> On (12/01/15 15:35), Kyeongdon Kim wrote:
> [..]
> > @test #4
> > kmalloc(f)
> > __vmalloc(f)
> > // cannot find failure both until now
> >
> > log message (test #4) :
> > <4>[ 641.440468][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002190000
> > <snip>
> > <4>[ 922.182980][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002208000
> > <snip>
> > <4>[ 923.197593][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002020000
> > <snip>
> > <4>[ 939.813499][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc0020a0000
>
> Thanks!
>
> > So,is there another problem if we remove the flag from both sides?
> >
>
> Technically, '~__GFP_NOMEMALLOC' is what we've been doing for some time (well,
> always); and, as Minchan noted, zsmalloc does not depend on emergency pools.
>
> I vote for removal of __GFP_NOMEMALLOC from both kmalloc() and __vmalloc().
>
> (user can make ->max_strm big enough to deplete emergency mem; but I tend to
> ignore it).
>
> Minchan?

Agree. Do you mind resending patches? :)

Thanks.

2015-12-01 09:10:10

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

On (12/01/15 17:16), Minchan Kim wrote:
> On Tue, Dec 01, 2015 at 04:15:42PM +0900, Sergey Senozhatsky wrote:
> > On (12/01/15 15:35), Kyeongdon Kim wrote:
> > [..]
> > > @test #4
> > > kmalloc(f)
> > > __vmalloc(f)
> > > // cannot find failure both until now
> > >
> > > log message (test #4) :
> > > <4>[ 641.440468][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002190000
> > > <snip>
> > > <4>[ 922.182980][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002208000
> > > <snip>
> > > <4>[ 923.197593][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc002020000
> > > <snip>
> > > <4>[ 939.813499][7] KDKIM: zcomp_lz4_create: 24: ret = ffffffc0020a0000
> >
> > Thanks!
> >
> > > So,is there another problem if we remove the flag from both sides?
> > >
> >
> > Technically, '~__GFP_NOMEMALLOC' is what we've been doing for some time (well,
> > always); and, as Minchan noted, zsmalloc does not depend on emergency pools.
> >
> > I vote for removal of __GFP_NOMEMALLOC from both kmalloc() and __vmalloc().
> >
> > (user can make ->max_strm big enough to deplete emergency mem; but I tend to
> > ignore it).
> >
> > Minchan?
>
> Agree. Do you mind resending patches? :)

OK, will do later today. Thanks.

-ss