2015-11-19 09:44:40

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] zram: Prevent page allocation failure during zcomp_strm_alloc

Hello,

On (11/19/15 15:54), kyeongdon.kim wrote:
> When we use lzo/lz4 multi compression streams for zram,
> we might face page allocation failure. In order to make parallel
> compression private data, we should call kzalloc() with order 2/3
> in runtime. But if there is no order 2/3 size memory to allocate
> in that time, page allocation fails.
>
> This patch makes new zstrm stream list in zcomp_create() or
> max_comp_streams set, which means calling (kzalloc()) in runtime
> is not needed. This prevents page alloc failure.
>

The decision here is to not create unneded streams (streams are not
free) -- we create one we need it. And if allocation fails we just
wait for already existing stream to become available (and we are
guaranteed to have at least one stream)

zcomp_strm_multi_find
{
...
while (1) {
spin_lock(&zs->strm_lock);
if (!list_empty(&zs->idle_strm)) {
zstrm = list_entry(zs->idle_strm.next,
struct zcomp_strm, list);
list_del(&zstrm->list);
spin_unlock(&zs->strm_lock);
return zstrm;
}
...
/* allocate new zstrm stream */
zs->avail_strm++;
spin_unlock(&zs->strm_lock);

zstrm = zcomp_strm_alloc(comp);
if (!zstrm) {
spin_lock(&zs->strm_lock);
zs->avail_strm--;
spin_unlock(&zs->strm_lock);
wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
continue;
}
break;
}
return zstrm;
}


So, are you trying to fix the warning from zcomp_strm_multi_find()? What
prevents this warning from happening in zcomp_strm_multi_set_max_streams()
or zcomp_strm_multi_create()? e.g. I requested 64 streams and ended up not
having enought memory? The warning will still be there and the streams will
not be allocated. What's the gain?


(besides, your implementation is racy. ->list and ->avail_strm should
not be modified outside of ->strm_lock).

-ss

> For reference a call trace :
>
> Binder_1: page allocation failure: order:3, mode:0x10c0d0
> CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
> Call trace:
> [<ffffffc0002069c8>] dump_backtrace+0x0/0x270
> [<ffffffc000206c48>] show_stack+0x10/0x1c
> [<ffffffc000cb51c8>] dump_stack+0x1c/0x28
> [<ffffffc0002bbfc8>] warn_alloc_failed+0xfc/0x11c
> [<ffffffc0002bf518>] __alloc_pages_nodemask+0x724/0x7f0
> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> [<ffffffc0002ed6a4>] kmalloc_order_trace+0x38/0xd8
> [<ffffffc0005d9738>] zcomp_lz4_create+0x2c/0x38
> [<ffffffc0005d78f0>] zcomp_strm_alloc+0x34/0x78
> [<ffffffc0005d7a58>] zcomp_strm_multi_find+0x124/0x1ec
> [<ffffffc0005d7c14>] zcomp_strm_find+0xc/0x18
> [<ffffffc0005d8fa0>] zram_bvec_rw+0x2fc/0x780
> [<ffffffc0005d9680>] zram_make_request+0x25c/0x2d4
> [<ffffffc00040f8ac>] generic_make_request+0x80/0xbc
> [<ffffffc00040f98c>] submit_bio+0xa4/0x15c
> [<ffffffc0002e8bb0>] __swap_writepage+0x218/0x230
> [<ffffffc0002e8c04>] swap_writepage+0x3c/0x4c
> [<ffffffc0002c7384>] shrink_page_list+0x51c/0x8d0
> [<ffffffc0002c7e88>] shrink_inactive_list+0x3f8/0x60c
> [<ffffffc0002c86c8>] shrink_lruvec+0x33c/0x4cc
> [<ffffffc0002c8894>] shrink_zone+0x3c/0x100
> [<ffffffc0002c8c10>] try_to_free_pages+0x2b8/0x54c
> [<ffffffc0002bf308>] __alloc_pages_nodemask+0x514/0x7f0
> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> [<ffffffc0003446cc>] proc_info_read+0x50/0xe4
> [<ffffffc0002f5204>] vfs_read+0xa0/0x12c
> [<ffffffc0002f59c8>] SyS_read+0x44/0x74
> DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
>
> Signed-off-by: kyeongdon.kim <[email protected]>
> ---
> drivers/block/zram/zcomp.c | 33 ++++++++++++++++++++++++++++-----
> 1 file changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index f1ff39a..45f39b9 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -158,6 +158,21 @@ static bool zcomp_strm_multi_set_max_streams(struct zcomp *comp, int num_strm)
> struct zcomp_strm_multi *zs = comp->stream;
> struct zcomp_strm *zstrm;
>
> + if (zs->avail_strm < num_strm) {
> + while (1) {
> + /* allocate new zstrm stream */
> + zs->avail_strm++;
> + zstrm = zcomp_strm_alloc(comp);
> + if (!zstrm) {
> + zs->avail_strm--;
> + kfree(zs);
> + return false;
> + }
> + list_add(&zstrm->list, &zs->idle_strm);
> + if (zs->avail_strm == num_strm)
> + break;
> + }
> + }
> spin_lock(&zs->strm_lock);
> zs->max_strm = num_strm;
> /*
> @@ -209,12 +224,20 @@ static int zcomp_strm_multi_create(struct zcomp *comp, int max_strm)
> zs->max_strm = max_strm;
> zs->avail_strm = 1;
>
> - zstrm = zcomp_strm_alloc(comp);
> - if (!zstrm) {
> - kfree(zs);
> - return -ENOMEM;
> + while (1) {
> + /* allocate new zstrm stream */
> + zs->avail_strm++;
> + zstrm = zcomp_strm_alloc(comp);
> + if (!zstrm) {
> + zs->avail_strm--;
> + kfree(zs);
> + return -ENOMEM;
> + }
> + list_add(&zstrm->list, &zs->idle_strm);
> + if (zs->avail_strm == max_strm)
> + break;
> }
> - list_add(&zstrm->list, &zs->idle_strm);
> +
> return 0;
> }
>
> --
> 1.8.2
>


2015-11-19 13:49:13

by Kyeongdon Kim

[permalink] [raw]
Subject: Re: Re: [PATCH] zram: Prevent page allocation failure during zcomp_strm_alloc

Hello,
On 2015-11-19 오후 6:45, Sergey Senozhatsky wrote:
> Hello,
>
> On (11/19/15 15:54), kyeongdon.kim wrote:
>> When we use lzo/lz4 multi compression streams for zram,
>> we might face page allocation failure. In order to make parallel
>> compression private data, we should call kzalloc() with order 2/3
>> in runtime. But if there is no order 2/3 size memory to allocate
>> in that time, page allocation fails.
>>
>> This patch makes new zstrm stream list in zcomp_create() or
>> max_comp_streams set, which means calling (kzalloc()) in runtime
>> is not needed. This prevents page alloc failure.
>>
>
> The decision here is to not create unneded streams (streams are not
> free) -- we create one we need it. And if allocation fails we just
> wait for already existing stream to become available (and we are
> guaranteed to have at least one stream)
>
> zcomp_strm_multi_find
> {
> ...
> while (1) {
> spin_lock(&zs->strm_lock);
> if (!list_empty(&zs->idle_strm)) {
> zstrm = list_entry(zs->idle_strm.next,
> struct zcomp_strm, list);
> list_del(&zstrm->list);
> spin_unlock(&zs->strm_lock);
> return zstrm;
> }
> ...
> /* allocate new zstrm stream */
> zs->avail_strm++;
> spin_unlock(&zs->strm_lock);
>
> zstrm = zcomp_strm_alloc(comp);
> if (!zstrm) {
> spin_lock(&zs->strm_lock);
> zs->avail_strm--;
> spin_unlock(&zs->strm_lock);
> wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
> continue;
> }
> break;
> }
> return zstrm;
> }
>
>
> So, are you trying to fix the warning from zcomp_strm_multi_find()? What
> prevents this warning from happening in zcomp_strm_multi_set_max_streams()
> or zcomp_strm_multi_create()? e.g. I requested 64 streams and ended up not
> having enought memory? The warning will still be there and the streams will
> not be allocated. What's the gain?
>
>
> (besides, your implementation is racy. ->list and ->avail_strm should
> not be modified outside of ->strm_lock).
>
> -ss
>

I know what you mean (streams are not free).
First of all, I'm sorry I would have to tell you why I try this patch.
When we're using LZ4 multi stream for zram swap, I found out this
allocation failure message in system running test,
That was not only once, but a few. Also, some failure cases were
continually occurring to try allocation order 3.
I believed that status makes time delay issue to process launch.
So I tried to modify this by pre-allocating and solved it, even if there
was small memory using in advance.

But because of this patch, if there is an unneeded memory using.
I want to try new patch from another angle.
Firstly, we can change 'vmalloc()' instead of 'kzalloc()' for the
'zstrm->private".
Secondly, we can use GFP_NOWAIT flag to avoid this warning after 2nd
stream allocation.
How do you think about it? (btw, sorry for spinlock missing)
Thanks,

>> For reference a call trace :
>>
>> Binder_1: page allocation failure: order:3, mode:0x10c0d0
>> CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty
> #20
>> Call trace:
>> [<ffffffc0002069c8>] dump_backtrace+0x0/0x270
>> [<ffffffc000206c48>] show_stack+0x10/0x1c
>> [<ffffffc000cb51c8>] dump_stack+0x1c/0x28
>> [<ffffffc0002bbfc8>] warn_alloc_failed+0xfc/0x11c
>> [<ffffffc0002bf518>] __alloc_pages_nodemask+0x724/0x7f0
>> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
>> [<ffffffc0002ed6a4>] kmalloc_order_trace+0x38/0xd8
>> [<ffffffc0005d9738>] zcomp_lz4_create+0x2c/0x38
>> [<ffffffc0005d78f0>] zcomp_strm_alloc+0x34/0x78
>> [<ffffffc0005d7a58>] zcomp_strm_multi_find+0x124/0x1ec
>> [<ffffffc0005d7c14>] zcomp_strm_find+0xc/0x18
>> [<ffffffc0005d8fa0>] zram_bvec_rw+0x2fc/0x780
>> [<ffffffc0005d9680>] zram_make_request+0x25c/0x2d4
>> [<ffffffc00040f8ac>] generic_make_request+0x80/0xbc
>> [<ffffffc00040f98c>] submit_bio+0xa4/0x15c
>> [<ffffffc0002e8bb0>] __swap_writepage+0x218/0x230
>> [<ffffffc0002e8c04>] swap_writepage+0x3c/0x4c
>> [<ffffffc0002c7384>] shrink_page_list+0x51c/0x8d0
>> [<ffffffc0002c7e88>] shrink_inactive_list+0x3f8/0x60c
>> [<ffffffc0002c86c8>] shrink_lruvec+0x33c/0x4cc
>> [<ffffffc0002c8894>] shrink_zone+0x3c/0x100
>> [<ffffffc0002c8c10>] try_to_free_pages+0x2b8/0x54c
>> [<ffffffc0002bf308>] __alloc_pages_nodemask+0x514/0x7f0
>> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
>> [<ffffffc0003446cc>] proc_info_read+0x50/0xe4
>> [<ffffffc0002f5204>] vfs_read+0xa0/0x12c
>> [<ffffffc0002f59c8>] SyS_read+0x44/0x74
>> DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
>> 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
>>
>> Signed-off-by: kyeongdon.kim <[email protected]>
>> ---
>> drivers/block/zram/zcomp.c | 33 ++++++++++++++++++++++++++++-----
>> 1 file changed, 28 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
>> index f1ff39a..45f39b9 100644
>> --- a/drivers/block/zram/zcomp.c
>> +++ b/drivers/block/zram/zcomp.c
>> @@ -158,6 +158,21 @@ static bool
> zcomp_strm_multi_set_max_streams(struct zcomp *comp, int num_strm)
>> struct zcomp_strm_multi *zs = comp->stream;
>> struct zcomp_strm *zstrm;
>>
>> + if (zs->avail_strm < num_strm) {
>> + while (1) {
>> + /* allocate new zstrm stream */
>> + zs->avail_strm++;
>> + zstrm = zcomp_strm_alloc(comp);
>> + if (!zstrm) {
>> + zs->avail_strm--;
>> + kfree(zs);
>> + return false;
>> + }
>> + list_add(&zstrm->list, &zs->idle_strm);
>> + if (zs->avail_strm == num_strm)
>> + break;
>> + }
>> + }
>> spin_lock(&zs->strm_lock);
>> zs->max_strm = num_strm;
>> /*
>> @@ -209,12 +224,20 @@ static int zcomp_strm_multi_create(struct zcomp
> *comp, int max_strm)
>> zs->max_strm = max_strm;
>> zs->avail_strm = 1;
>>
>> - zstrm = zcomp_strm_alloc(comp);
>> - if (!zstrm) {
>> - kfree(zs);
>> - return -ENOMEM;
>> + while (1) {
>> + /* allocate new zstrm stream */
>> + zs->avail_strm++;
>> + zstrm = zcomp_strm_alloc(comp);
>> + if (!zstrm) {
>> + zs->avail_strm--;
>> + kfree(zs);
>> + return -ENOMEM;
>> + }
>> + list_add(&zstrm->list, &zs->idle_strm);
>> + if (zs->avail_strm == max_strm)
>> + break;
>> }
>> - list_add(&zstrm->list, &zs->idle_strm);
>> +
>> return 0;
>> }
>>
>> --
>> 1.8.2
>>
With Best Regards,
Kyeongdon Kim

2015-11-20 00:48:09

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] zram: Prevent page allocation failure during zcomp_strm_alloc

Hello,

On (11/19/15 22:49), kyeongdon.kim wrote:
[..]
> I know what you mean (streams are not free).
> First of all, I'm sorry I would have to tell you why I try this patch.

nothing to be sorry about.

> When we're using LZ4 multi stream for zram swap, I found out this
> allocation failure message in system running test,
> That was not only once, but a few. Also, some failure cases were
> continually occurring to try allocation order 3.
> I believed that status makes time delay issue to process launch.

ahhh, I see. repetitive allocation failures during heavy I/O and low
memory conditions. hm... does it actually make any sense to make
something like this (*just an idea*)

---

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 5cb13ca..049ae6b 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -124,6 +124,19 @@ static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp *comp)
zstrm = zcomp_strm_alloc(comp);
if (!zstrm) {
spin_lock(&zs->strm_lock);
+ /*
+ * If current IO path has failed to allocate a new
+ * stream then most likely lots of subsequent IO
+ * requests will do the same, wasting time attempting
+ * to reclaim pages, printing warning, etc. Reduce
+ * the number of max_stream and print an error.
+ */
+ if (zs->max_strm > 1) {
+ zs->max_strm--;
+ pr_err("%s: reduce max_comp_streams to %d\n",
+ "Low memory",
+ zs->max_strm);
+ }
zs->avail_strm--;
spin_unlock(&zs->strm_lock);
wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));

---

A 'broken English' comment can shed some light on the idea.
Hopefully, by the time we reduce ->max_strm there are N (>1)
streams already. In the worst case we go down to a single stream,
but that is something what would have happened anyway -- we don't
have memory for N streams. We don't rollback the max_stream value
(at least not in this version); we basically don't know when the
low memory condition goes away -- may be never.


> So I tried to modify this by pre-allocating and solved it, even if there
> was small memory using in advance.
>
> But because of this patch, if there is an unneeded memory using.
> I want to try new patch from another angle.
> Firstly, we can change 'vmalloc()' instead of 'kzalloc()' for the
> 'zstrm->private".

Yes, we can try this, I guess.


> Secondly, we can use GFP_NOWAIT flag to avoid this warning after 2nd
> stream allocation.

GFP_NOWAIT. hm... no, I don't think so.

-ss