2015-11-20 10:05:50

by Kyeongdon Kim

[permalink] [raw]
Subject: [PATCH v2] zram: Prevent page allocation failure during zcomp_strm_alloc

When we're using LZ4 multi compression streams for zram swap,
we found out page allocation failure message in system running test.
That was not only once, but a few(2 - 5 times per test).
Also, some failure cases were continually occurring to try allocation
order 3.

In order to make parallel compression private data, we should call
kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
2/3 size memory in that time, page allocation fails.
This patch makes to use vmalloc() as fallback of kmalloc(), this
prevents page alloc failure warning.

After this, we never found warning message in running test, also
It could reduce process startup latency about 60-120ms in each case.

For reference a call trace :

Binder_1: page allocation failure: order:3, mode:0x10c0d0
CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
Call trace:
[<ffffffc0002069c8>] dump_backtrace+0x0/0x270
[<ffffffc000206c48>] show_stack+0x10/0x1c
[<ffffffc000cb51c8>] dump_stack+0x1c/0x28
[<ffffffc0002bbfc8>] warn_alloc_failed+0xfc/0x11c
[<ffffffc0002bf518>] __alloc_pages_nodemask+0x724/0x7f0
[<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
[<ffffffc0002ed6a4>] kmalloc_order_trace+0x38/0xd8
[<ffffffc0005d9738>] zcomp_lz4_create+0x2c/0x38
[<ffffffc0005d78f0>] zcomp_strm_alloc+0x34/0x78
[<ffffffc0005d7a58>] zcomp_strm_multi_find+0x124/0x1ec
[<ffffffc0005d7c14>] zcomp_strm_find+0xc/0x18
[<ffffffc0005d8fa0>] zram_bvec_rw+0x2fc/0x780
[<ffffffc0005d9680>] zram_make_request+0x25c/0x2d4
[<ffffffc00040f8ac>] generic_make_request+0x80/0xbc
[<ffffffc00040f98c>] submit_bio+0xa4/0x15c
[<ffffffc0002e8bb0>] __swap_writepage+0x218/0x230
[<ffffffc0002e8c04>] swap_writepage+0x3c/0x4c
[<ffffffc0002c7384>] shrink_page_list+0x51c/0x8d0
[<ffffffc0002c7e88>] shrink_inactive_list+0x3f8/0x60c
[<ffffffc0002c86c8>] shrink_lruvec+0x33c/0x4cc
[<ffffffc0002c8894>] shrink_zone+0x3c/0x100
[<ffffffc0002c8c10>] try_to_free_pages+0x2b8/0x54c
[<ffffffc0002bf308>] __alloc_pages_nodemask+0x514/0x7f0
[<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
[<ffffffc0003446cc>] proc_info_read+0x50/0xe4
[<ffffffc0002f5204>] vfs_read+0xa0/0x12c
[<ffffffc0002f59c8>] SyS_read+0x44/0x74
DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB

Signed-off-by: Kyeongdon Kim <[email protected]>
---
drivers/block/zram/zcomp_lz4.c | 15 +++++++++++++--
drivers/block/zram/zcomp_lzo.c | 15 +++++++++++++--
2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/block/zram/zcomp_lz4.c b/drivers/block/zram/zcomp_lz4.c
index f2afb7e..0477894 100644
--- a/drivers/block/zram/zcomp_lz4.c
+++ b/drivers/block/zram/zcomp_lz4.c
@@ -10,17 +10,28 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/lz4.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>

#include "zcomp_lz4.h"

static void *zcomp_lz4_create(void)
{
- return kzalloc(LZ4_MEM_COMPRESS, GFP_KERNEL);
+ void *ret;
+
+ ret = kzalloc(LZ4_MEM_COMPRESS,
+ __GFP_NORETRY|__GFP_NOWARN|__GFP_NOMEMALLOC);
+ if (!ret)
+ ret = vzalloc(LZ4_MEM_COMPRESS);
+ return ret;
}

static void zcomp_lz4_destroy(void *private)
{
- kfree(private);
+ if (is_vmalloc_addr(private))
+ vfree(private);
+ else
+ kfree(private);
}

static int zcomp_lz4_compress(const unsigned char *src, unsigned char *dst,
diff --git a/drivers/block/zram/zcomp_lzo.c b/drivers/block/zram/zcomp_lzo.c
index da1bc47..613fbac 100644
--- a/drivers/block/zram/zcomp_lzo.c
+++ b/drivers/block/zram/zcomp_lzo.c
@@ -10,17 +10,28 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/lzo.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>

#include "zcomp_lzo.h"

static void *lzo_create(void)
{
- return kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
+ void *ret;
+
+ ret = kzalloc(LZO1X_MEM_COMPRESS,
+ __GFP_NORETRY|__GFP_NOWARN|__GFP_NOMEMALLOC);
+ if (!ret)
+ ret = vzalloc(LZO1X_MEM_COMPRESS);
+ return ret;
}

static void lzo_destroy(void *private)
{
- kfree(private);
+ if (is_vmalloc_addr(private))
+ vfree(private);
+ else
+ kfree(private);
}

static int lzo_compress(const unsigned char *src, unsigned char *dst,
--
1.8.2


2015-11-21 02:10:03

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v2] zram: Prevent page allocation failure during zcomp_strm_alloc

Cc Andrew

On (11/20/15 19:02), Kyeongdon Kim wrote:
> When we're using LZ4 multi compression streams for zram swap,
> we found out page allocation failure message in system running test.
> That was not only once, but a few(2 - 5 times per test).
> Also, some failure cases were continually occurring to try allocation
> order 3.
>
> In order to make parallel compression private data, we should call
> kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
> 2/3 size memory in that time, page allocation fails.
> This patch makes to use vmalloc() as fallback of kmalloc(), this
> prevents page alloc failure warning.
>
> After this, we never found warning message in running test, also
> It could reduce process startup latency about 60-120ms in each case.

Acked-by: Sergey Senozhatsky <[email protected]>

thanks.

-ss

> For reference a call trace :
>
> Binder_1: page allocation failure: order:3, mode:0x10c0d0
> CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
> Call trace:
> [<ffffffc0002069c8>] dump_backtrace+0x0/0x270
> [<ffffffc000206c48>] show_stack+0x10/0x1c
> [<ffffffc000cb51c8>] dump_stack+0x1c/0x28
> [<ffffffc0002bbfc8>] warn_alloc_failed+0xfc/0x11c
> [<ffffffc0002bf518>] __alloc_pages_nodemask+0x724/0x7f0
> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> [<ffffffc0002ed6a4>] kmalloc_order_trace+0x38/0xd8
> [<ffffffc0005d9738>] zcomp_lz4_create+0x2c/0x38
> [<ffffffc0005d78f0>] zcomp_strm_alloc+0x34/0x78
> [<ffffffc0005d7a58>] zcomp_strm_multi_find+0x124/0x1ec
> [<ffffffc0005d7c14>] zcomp_strm_find+0xc/0x18
> [<ffffffc0005d8fa0>] zram_bvec_rw+0x2fc/0x780
> [<ffffffc0005d9680>] zram_make_request+0x25c/0x2d4
> [<ffffffc00040f8ac>] generic_make_request+0x80/0xbc
> [<ffffffc00040f98c>] submit_bio+0xa4/0x15c
> [<ffffffc0002e8bb0>] __swap_writepage+0x218/0x230
> [<ffffffc0002e8c04>] swap_writepage+0x3c/0x4c
> [<ffffffc0002c7384>] shrink_page_list+0x51c/0x8d0
> [<ffffffc0002c7e88>] shrink_inactive_list+0x3f8/0x60c
> [<ffffffc0002c86c8>] shrink_lruvec+0x33c/0x4cc
> [<ffffffc0002c8894>] shrink_zone+0x3c/0x100
> [<ffffffc0002c8c10>] try_to_free_pages+0x2b8/0x54c
> [<ffffffc0002bf308>] __alloc_pages_nodemask+0x514/0x7f0
> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> [<ffffffc0003446cc>] proc_info_read+0x50/0xe4
> [<ffffffc0002f5204>] vfs_read+0xa0/0x12c
> [<ffffffc0002f59c8>] SyS_read+0x44/0x74
> DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
>
> Signed-off-by: Kyeongdon Kim <[email protected]>
> ---
> drivers/block/zram/zcomp_lz4.c | 15 +++++++++++++--
> drivers/block/zram/zcomp_lzo.c | 15 +++++++++++++--
> 2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/zram/zcomp_lz4.c b/drivers/block/zram/zcomp_lz4.c
> index f2afb7e..0477894 100644
> --- a/drivers/block/zram/zcomp_lz4.c
> +++ b/drivers/block/zram/zcomp_lz4.c
> @@ -10,17 +10,28 @@
> #include <linux/kernel.h>
> #include <linux/slab.h>
> #include <linux/lz4.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm.h>
>
> #include "zcomp_lz4.h"
>
> static void *zcomp_lz4_create(void)
> {
> - return kzalloc(LZ4_MEM_COMPRESS, GFP_KERNEL);
> + void *ret;
> +
> + ret = kzalloc(LZ4_MEM_COMPRESS,
> + __GFP_NORETRY|__GFP_NOWARN|__GFP_NOMEMALLOC);
> + if (!ret)
> + ret = vzalloc(LZ4_MEM_COMPRESS);
> + return ret;
> }
>
> static void zcomp_lz4_destroy(void *private)
> {
> - kfree(private);
> + if (is_vmalloc_addr(private))
> + vfree(private);
> + else
> + kfree(private);
> }
>
> static int zcomp_lz4_compress(const unsigned char *src, unsigned char *dst,
> diff --git a/drivers/block/zram/zcomp_lzo.c b/drivers/block/zram/zcomp_lzo.c
> index da1bc47..613fbac 100644
> --- a/drivers/block/zram/zcomp_lzo.c
> +++ b/drivers/block/zram/zcomp_lzo.c
> @@ -10,17 +10,28 @@
> #include <linux/kernel.h>
> #include <linux/slab.h>
> #include <linux/lzo.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm.h>
>
> #include "zcomp_lzo.h"
>
> static void *lzo_create(void)
> {
> - return kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
> + void *ret;
> +
> + ret = kzalloc(LZO1X_MEM_COMPRESS,
> + __GFP_NORETRY|__GFP_NOWARN|__GFP_NOMEMALLOC);
> + if (!ret)
> + ret = vzalloc(LZO1X_MEM_COMPRESS);
> + return ret;
> }
>
> static void lzo_destroy(void *private)
> {
> - kfree(private);
> + if (is_vmalloc_addr(private))
> + vfree(private);
> + else
> + kfree(private);
> }
>
> static int lzo_compress(const unsigned char *src, unsigned char *dst,
> --
> 1.8.2
>

2015-11-21 02:14:46

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v2] zram: Prevent page allocation failure during zcomp_strm_alloc

On (11/21/15 11:10), Sergey Senozhatsky wrote:
> Cc Andrew
>
> On (11/20/15 19:02), Kyeongdon Kim wrote:
> > When we're using LZ4 multi compression streams for zram swap,
> > we found out page allocation failure message in system running test.
> > That was not only once, but a few(2 - 5 times per test).
> > Also, some failure cases were continually occurring to try allocation
> > order 3.
> >
> > In order to make parallel compression private data, we should call
> > kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
> > 2/3 size memory in that time, page allocation fails.
> > This patch makes to use vmalloc() as fallback of kmalloc(), this
> > prevents page alloc failure warning.
> >
> > After this, we never found warning message in running test, also
> > It could reduce process startup latency about 60-120ms in each case.
>
> Acked-by: Sergey Senozhatsky <[email protected]>
>

with the only nit that the subject should be "try kmalloc() before vmalloc()"
or similar, not "prevent page allocation failure", I think.

-ss

2015-11-21 09:39:07

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v2] zram: Prevent page allocation failure during zcomp_strm_alloc

On (11/21/15 11:15), Sergey Senozhatsky wrote:
[..]
>
> with the only nit that the subject should be "try kmalloc() before vmalloc()"
> or similar, not "prevent page allocation failure", I think.
>

Oh, and one more thing


> static void zcomp_lz4_destroy(void *private)
> {
> - kfree(private);
> + if (is_vmalloc_addr(private))
> + vfree(private);
> + else
> + kfree(private);
> }

static void zcomp_FOO_destroy(void *private)
{
kvfree(private);
}

-ss

2015-11-23 02:15:13

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH v2] zram: Prevent page allocation failure during zcomp_strm_alloc

Hello,

On Fri, Nov 20, 2015 at 07:02:44PM +0900, Kyeongdon Kim wrote:
> When we're using LZ4 multi compression streams for zram swap,
> we found out page allocation failure message in system running test.
> That was not only once, but a few(2 - 5 times per test).
> Also, some failure cases were continually occurring to try allocation
> order 3.
>
> In order to make parallel compression private data, we should call
> kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
> 2/3 size memory in that time, page allocation fails.
> This patch makes to use vmalloc() as fallback of kmalloc(), this
> prevents page alloc failure warning.
>
> After this, we never found warning message in running test, also
> It could reduce process startup latency about 60-120ms in each case.

Very nice!

>
> For reference a call trace :
>
> Binder_1: page allocation failure: order:3, mode:0x10c0d0
> CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
> Call trace:
> [<ffffffc0002069c8>] dump_backtrace+0x0/0x270
> [<ffffffc000206c48>] show_stack+0x10/0x1c
> [<ffffffc000cb51c8>] dump_stack+0x1c/0x28
> [<ffffffc0002bbfc8>] warn_alloc_failed+0xfc/0x11c
> [<ffffffc0002bf518>] __alloc_pages_nodemask+0x724/0x7f0
> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> [<ffffffc0002ed6a4>] kmalloc_order_trace+0x38/0xd8
> [<ffffffc0005d9738>] zcomp_lz4_create+0x2c/0x38
> [<ffffffc0005d78f0>] zcomp_strm_alloc+0x34/0x78
> [<ffffffc0005d7a58>] zcomp_strm_multi_find+0x124/0x1ec
> [<ffffffc0005d7c14>] zcomp_strm_find+0xc/0x18
> [<ffffffc0005d8fa0>] zram_bvec_rw+0x2fc/0x780
> [<ffffffc0005d9680>] zram_make_request+0x25c/0x2d4
> [<ffffffc00040f8ac>] generic_make_request+0x80/0xbc
> [<ffffffc00040f98c>] submit_bio+0xa4/0x15c
> [<ffffffc0002e8bb0>] __swap_writepage+0x218/0x230
> [<ffffffc0002e8c04>] swap_writepage+0x3c/0x4c
> [<ffffffc0002c7384>] shrink_page_list+0x51c/0x8d0
> [<ffffffc0002c7e88>] shrink_inactive_list+0x3f8/0x60c
> [<ffffffc0002c86c8>] shrink_lruvec+0x33c/0x4cc
> [<ffffffc0002c8894>] shrink_zone+0x3c/0x100
> [<ffffffc0002c8c10>] try_to_free_pages+0x2b8/0x54c
> [<ffffffc0002bf308>] __alloc_pages_nodemask+0x514/0x7f0
> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> [<ffffffc0003446cc>] proc_info_read+0x50/0xe4
> [<ffffffc0002f5204>] vfs_read+0xa0/0x12c
> [<ffffffc0002f59c8>] SyS_read+0x44/0x74
> DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
>
> Signed-off-by: Kyeongdon Kim <[email protected]>

If you fix kvfree Sergey pointed out, you can add my

Acked-by: Minchan Kim <[email protected]>

> ---
> drivers/block/zram/zcomp_lz4.c | 15 +++++++++++++--
> drivers/block/zram/zcomp_lzo.c | 15 +++++++++++++--
> 2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/zram/zcomp_lz4.c b/drivers/block/zram/zcomp_lz4.c
> index f2afb7e..0477894 100644
> --- a/drivers/block/zram/zcomp_lz4.c
> +++ b/drivers/block/zram/zcomp_lz4.c
> @@ -10,17 +10,28 @@
> #include <linux/kernel.h>
> #include <linux/slab.h>
> #include <linux/lz4.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm.h>
>
> #include "zcomp_lz4.h"
>
> static void *zcomp_lz4_create(void)
> {
> - return kzalloc(LZ4_MEM_COMPRESS, GFP_KERNEL);
> + void *ret;
> +
> + ret = kzalloc(LZ4_MEM_COMPRESS,
> + __GFP_NORETRY|__GFP_NOWARN|__GFP_NOMEMALLOC);
> + if (!ret)
> + ret = vzalloc(LZ4_MEM_COMPRESS);

One thing I feel bad smell is that call vzalloc with GFP_KERNEL.
This function can be called in direct reclaim path with holding
fs lock and GFP_KERNEL can enter recursive reclaim path so
lockdep would complain theoretically if I don't miss something.

If it is true, we should fix several allocation flags in
zcomp_strm_alloc. I just want to record this warning for the future
in this thread so someone who is finding for the contribution
material will prove and fix it. :)


> + return ret;
> }
>
> static void zcomp_lz4_destroy(void *private)
> {
> - kfree(private);
> + if (is_vmalloc_addr(private))
> + vfree(private);
> + else
> + kfree(private);

> }
>
> static int zcomp_lz4_compress(const unsigned char *src, unsigned char *dst,
> diff --git a/drivers/block/zram/zcomp_lzo.c b/drivers/block/zram/zcomp_lzo.c
> index da1bc47..613fbac 100644
> --- a/drivers/block/zram/zcomp_lzo.c
> +++ b/drivers/block/zram/zcomp_lzo.c
> @@ -10,17 +10,28 @@
> #include <linux/kernel.h>
> #include <linux/slab.h>
> #include <linux/lzo.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm.h>
>
> #include "zcomp_lzo.h"
>
> static void *lzo_create(void)
> {
> - return kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
> + void *ret;
> +
> + ret = kzalloc(LZO1X_MEM_COMPRESS,
> + __GFP_NORETRY|__GFP_NOWARN|__GFP_NOMEMALLOC);
> + if (!ret)
> + ret = vzalloc(LZO1X_MEM_COMPRESS);
> + return ret;
> }
>
> static void lzo_destroy(void *private)
> {
> - kfree(private);
> + if (is_vmalloc_addr(private))
> + vfree(private);
> + else
> + kfree(private);
> }
>
> static int lzo_compress(const unsigned char *src, unsigned char *dst,
> --
> 1.8.2
>

2015-11-23 03:13:03

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v2] zram: Prevent page allocation failure during zcomp_strm_alloc

Hello,

On (11/23/15 11:15), Minchan Kim wrote:
[..]
> > static void *zcomp_lz4_create(void)
> > {
> > - return kzalloc(LZ4_MEM_COMPRESS, GFP_KERNEL);
> > + void *ret;
> > +
> > + ret = kzalloc(LZ4_MEM_COMPRESS,
> > + __GFP_NORETRY|__GFP_NOWARN|__GFP_NOMEMALLOC);
> > + if (!ret)
> > + ret = vzalloc(LZ4_MEM_COMPRESS);
>
> One thing I feel bad smell is that call vzalloc with GFP_KERNEL.
> This function can be called in direct reclaim path with holding
> fs lock and GFP_KERNEL can enter recursive reclaim path so
> lockdep would complain theoretically if I don't miss something.
>

yes, GFP_KERNEL looks a bit fragile to me too. And may be zcomp_strm_alloc()
and comp->backend->create() deserve GFP_NOFS. I believe I sent a patch doing
this a while ago: https://lkml.org/lkml/2015/6/16/465

> If it is true, we should fix several allocation flags in
> zcomp_strm_alloc. I just want to record this warning for the future
> in this thread so someone who is finding for the contribution
> material will prove and fix it. :)

I can re-send the patch.

And, in case if you missed it, what's your opinion on the idea of
reducing ->max_strm if we can't allocate new streams. Here:
http://marc.info/?l=linux-kernel&m=144798049429861

-ss

2015-11-23 03:34:15

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v2] zram: Prevent page allocation failure during zcomp_strm_alloc

On (11/23/15 12:14), Sergey Senozhatsky wrote:
>
> yes, GFP_KERNEL looks a bit fragile to me too. And may be zcomp_strm_alloc()
> and comp->backend->create() deserve GFP_NOFS. I believe I sent a patch doing
> this a while ago: https://lkml.org/lkml/2015/6/16/465
>

perhaps just __GFP_RECLAIM (GFP_NOIO), not '__GFP_RECLAIM | __GFP_IO'
(GFP_NOFS), is better.

-ss

2015-11-23 04:19:05

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH v2] zram: Prevent page allocation failure during zcomp_strm_alloc

New thread

On Mon, Nov 23, 2015 at 12:14:00PM +0900, Sergey Senozhatsky wrote:
> Hello,
>
> On (11/23/15 11:15), Minchan Kim wrote:
> [..]
> > > static void *zcomp_lz4_create(void)
> > > {
> > > - return kzalloc(LZ4_MEM_COMPRESS, GFP_KERNEL);
> > > + void *ret;
> > > +
> > > + ret = kzalloc(LZ4_MEM_COMPRESS,
> > > + __GFP_NORETRY|__GFP_NOWARN|__GFP_NOMEMALLOC);
> > > + if (!ret)
> > > + ret = vzalloc(LZ4_MEM_COMPRESS);
> >
> > One thing I feel bad smell is that call vzalloc with GFP_KERNEL.
> > This function can be called in direct reclaim path with holding
> > fs lock and GFP_KERNEL can enter recursive reclaim path so
> > lockdep would complain theoretically if I don't miss something.
> >
>
> yes, GFP_KERNEL looks a bit fragile to me too. And may be zcomp_strm_alloc()
> and comp->backend->create() deserve GFP_NOFS. I believe I sent a patch doing
> this a while ago: https://lkml.org/lkml/2015/6/16/465

Sorry, I have missed that.
It's worth to fix that you proved it that could happen.
But when I read your patch, GFP_NOIO instead GFP_NOFS would
better way. Could you resend it?

>
> > If it is true, we should fix several allocation flags in
> > zcomp_strm_alloc. I just want to record this warning for the future
> > in this thread so someone who is finding for the contribution
> > material will prove and fix it. :)
>
> I can re-send the patch.
>
> And, in case if you missed it, what's your opinion on the idea of
> reducing ->max_strm if we can't allocate new streams. Here:
> http://marc.info/?l=linux-kernel&m=144798049429861

Hmm, auto backoff max_comp_streams with warn to user, I'm not sure
we really need it. Alloc failure is depenent on the workload and
timing so although there are a fail in t1, it could be successful
in t2 so if we *really* want to introduce auto backoff, the logic
should include advance routine as well as backoff. With that,
I want to handle it traparently without notice to user.

*HOWEVER*, I think the problem Kyeongdon said came from warning and
memset by __GFP_ZERO flood. Other overheads for alloc/free would be
not heavy because they are higly optimized path in the MM part.
About reclaiming part, it should be almost no problem because
reclaimer cannot make forward progress by deadlock so it doesn't
scan any LRU.

So, Kyeongdon's patch will remove warning overhead and likely to
make zcomp_stram_alloc successful with vmalloc so I want to
roll it out first. And let's add a WARN_ON_ONCE to detect of
failure and rethink it when we receive such report.

Thanks.

2015-11-23 07:42:46

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v2] zram: Prevent page allocation failure during zcomp_strm_alloc

On (11/23/15 13:18), Minchan Kim wrote:
[..]
> > https://lkml.org/lkml/2015/6/16/465
>
> Sorry, I have missed that.
> It's worth to fix that you proved it that could happen.
> But when I read your patch, GFP_NOIO instead GFP_NOFS would
> better way. Could you resend it?

no problem.

agree. we also would want to switch from vzalloc() to
__vmalloc_node_flags(size, NUMA_NO_NODE,
GFP_NOIO | __GFP_HIGHMEM | __GFP_ZERO)

in fallbacks. I'll send the patch later today.

> >
> > > If it is true, we should fix several allocation flags in
> > > zcomp_strm_alloc. I just want to record this warning for the future
> > > in this thread so someone who is finding for the contribution
> > > material will prove and fix it. :)
> >
> > I can re-send the patch.
> >
> > And, in case if you missed it, what's your opinion on the idea of
> > reducing ->max_strm if we can't allocate new streams. Here:
> > http://marc.info/?l=linux-kernel&m=144798049429861
>
> Hmm, auto backoff max_comp_streams with warn to user, I'm not sure
> we really need it. Alloc failure is depenent on the workload and
> timing so although there are a fail in t1, it could be successful
> in t2 so if we *really* want to introduce auto backoff, the logic
> should include advance routine as well as backoff. With that,
> I want to handle it traparently without notice to user.

yes. auto roll-back is important (that's why I mentioned it). the idea
is to avoid stealing of pages for streams. for example, in case of low
memory decrease ->max_strm to
MAX(->avail_strm, ->max_strm, online cpus() / 2)

and even probably release some idle streams (um... as a reaction
to shrinker call???). or at least prevent setting of ->max_sgtrm to
some unreasonably huge values... "> 4 * NR_CPUS", for instance. but
this is not so critical.

> So, Kyeongdon's patch will remove warning overhead and likely to
> make zcomp_stram_alloc successful with vmalloc so I want to
> roll it out first. And let's add a WARN_ON_ONCE to detect of
> failure and rethink it when we receive such report.

-ss

2015-11-23 08:12:48

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v2] zram: Prevent page allocation failure during zcomp_strm_alloc

On (11/23/15 16:43), Sergey Senozhatsky wrote:
[..]
> agree. we also would want to switch from vzalloc() to
> __vmalloc_node_flags(size, NUMA_NO_NODE,
> GFP_NOIO | __GFP_HIGHMEM | __GFP_ZERO)

[..]
> > So, Kyeongdon's patch will remove warning overhead and likely to
> > make zcomp_stram_alloc successful with vmalloc so I want to
> > roll it out first. And let's add a WARN_ON_ONCE to detect of
> > failure and rethink it when we receive such report.

hm... for k{z,m}alloc() it does reduce the warning overhead, but not
for vmalloc() -> warn_alloc_failed() [in theory]. So I guess, I'll
change vmalloc() to

__vmalloc(XXX,
GFP_NOIO | __GFP_NOWARN | __GFP_HIGHMEM | __GFP_ZERO,
PAGE_KERNEL);

/* passing __GFP_NOWARN */.

-ss