2012-11-23 02:42:34

by Minchan Kim

Subject: [PATCH 1/2] zram: force disksize setting before using zram

The zram documentation currently says that setting disksize is optional,
but that is only partly true. The first time you use zram after
booting, you must set disksize; otherwise zram cannot work because
the zram gendisk's size is 0. Once you have set it, however, you can use
zram freely after a reset, because reset paradoxically does not reset
disksize to zero. So in that case, setting disksize is optional. :(
This behavior is inconsistent and not straightforward for users.

This patch always requires disksize to be set before zram is used.
Yes, it changes current behavior, so someone could complain when
they upgrade zram. That could be a problem if zram were in mainline,
but it still lives in staging, so the behavior can be changed to do
the right thing. Please excuse the change.

Signed-off-by: Minchan Kim <[email protected]>
---
drivers/staging/zram/zram.txt | 7 +++--
drivers/staging/zram/zram_drv.c | 57 ++++++++++++++-----------------------
drivers/staging/zram/zram_drv.h | 5 +---
drivers/staging/zram/zram_sysfs.c | 6 +---
4 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/staging/zram/zram.txt b/drivers/staging/zram/zram.txt
index 5f75d29..00ae66b 100644
--- a/drivers/staging/zram/zram.txt
+++ b/drivers/staging/zram/zram.txt
@@ -23,10 +23,9 @@ Following shows a typical sequence of steps for using zram.
This creates 4 devices: /dev/zram{0,1,2,3}
(num_devices parameter is optional. Default: 1)

-2) Set Disksize (Optional):
+2) Set Disksize
Set disk size by writing the value to sysfs node 'disksize'
- (in bytes). If disksize is not given, default value of 25%
- of RAM is used.
+ (in bytes).

# Initialize /dev/zram0 with 50MB disksize
echo $((50*1024*1024)) > /sys/block/zram0/disksize
@@ -67,6 +66,8 @@ Following shows a typical sequence of steps for using zram.

(This frees all the memory allocated for the given device).

+ If you want to use zram again, you should set disksize first
+ due to reset zram.

Please report any problems at:
- Mailing list: linux-mm-cc at laptop dot org
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index fb4a7c9..9ef1eca 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -104,35 +104,6 @@ static int page_zero_filled(void *ptr)
return 1;
}

-static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
-{
- if (!zram->disksize) {
- pr_info(
- "disk size not provided. You can use disksize_kb module "
- "param to specify size.\nUsing default: (%u%% of RAM).\n",
- default_disksize_perc_ram
- );
- zram->disksize = default_disksize_perc_ram *
- (totalram_bytes / 100);
- }
-
- if (zram->disksize > 2 * (totalram_bytes)) {
- pr_info(
- "There is little point creating a zram of greater than "
- "twice the size of memory since we expect a 2:1 compression "
- "ratio. Note that zram uses about 0.1%% of the size of "
- "the disk when not in use so a huge zram is "
- "wasteful.\n"
- "\tMemory Size: %zu kB\n"
- "\tSize you selected: %llu kB\n"
- "Continuing anyway ...\n",
- totalram_bytes >> 10, zram->disksize
- );
- }
-
- zram->disksize &= PAGE_MASK;
-}
-
static void zram_free_page(struct zram *zram, size_t index)
{
unsigned long handle = zram->table[index].handle;
@@ -497,6 +468,9 @@ void __zram_reset_device(struct zram *zram)
{
size_t index;

+ if (!zram->init_done)
+ goto out;
+
zram->init_done = 0;

/* Free various per-device buffers */
@@ -523,8 +497,9 @@ void __zram_reset_device(struct zram *zram)

/* Reset stats */
memset(&zram->stats, 0, sizeof(zram->stats));
-
+out:
zram->disksize = 0;
+ set_capacity(zram->disk, 0);
}

void zram_reset_device(struct zram *zram)
@@ -540,13 +515,26 @@ int zram_init_device(struct zram *zram)
size_t num_pages;

down_write(&zram->init_lock);
-
if (zram->init_done) {
up_write(&zram->init_lock);
return 0;
}

- zram_set_disksize(zram, totalram_pages << PAGE_SHIFT);
+ BUG_ON(!zram->disksize);
+
+ if (zram->disksize > 2 * (totalram_pages << PAGE_SHIFT)) {
+ pr_info(
+ "There is little point creating a zram of greater than "
+ "twice the size of memory since we expect a 2:1 compression "
+ "ratio. Note that zram uses about 0.1%% of the size of "
+ "the disk when not in use so a huge zram is "
+ "wasteful.\n"
+ "\tMemory Size: %zu kB\n"
+ "\tSize you selected: %llu kB\n"
+ "Continuing anyway ...\n",
+ (totalram_pages << PAGE_SHIFT) >> 10, zram->disksize
+ );
+ }

zram->compress_workmem = kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
if (!zram->compress_workmem) {
@@ -571,8 +559,6 @@ int zram_init_device(struct zram *zram)
goto fail_no_table;
}

- set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
-
/* zram devices sort of resembles non-rotational disks */
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue);

@@ -755,8 +741,7 @@ static void __exit zram_exit(void)
zram = &zram_devices[i];

destroy_device(zram);
- if (zram->init_done)
- zram_reset_device(zram);
+ zram_reset_device(zram);
}

unregister_blkdev(zram_major, "zram");
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index df2eec4..5b671d1 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -28,9 +28,6 @@ static const unsigned max_num_devices = 32;

/*-- Configurable parameters */

-/* Default zram disk size: 25% of total RAM */
-static const unsigned default_disksize_perc_ram = 25;
-
/*
* Pages that compress to size greater than this are stored
* uncompressed in memory.
@@ -115,6 +112,6 @@ extern struct attribute_group zram_disk_attr_group;
#endif

extern int zram_init_device(struct zram *zram);
-extern void __zram_reset_device(struct zram *zram);
+extern void zram_reset_device(struct zram *zram);

#endif
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index de1eacf..4143af9 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -110,11 +110,7 @@ static ssize_t reset_store(struct device *dev,
if (bdev)
fsync_bdev(bdev);

- down_write(&zram->init_lock);
- if (zram->init_done)
- __zram_reset_device(zram);
- up_write(&zram->init_lock);
-
+ zram_reset_device(zram);
return len;
}

--
1.7.9.5
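
The behavior this patch aims for can be summarized with a small userspace model. This is only a sketch under stated assumptions, not driver code: `struct zram_model` and the `model_*` functions are hypothetical names, the 4 KiB page size is assumed, rejecting a size change on an initialized device is an assumption about the sysfs handler, and treating a zero disksize as a no-op follows the review suggestion rather than the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE    4096ULL /* assumed page size */
#define MODEL_SECTOR_SHIFT 9       /* 512-byte sectors */

/* Hypothetical stand-in for the few struct zram fields discussed here. */
struct zram_model {
    uint64_t disksize;  /* bytes; 0 means "not configured" */
    uint64_t capacity;  /* sectors exposed by the block device */
    bool init_done;
};

/* Round up to a whole number of pages, in the spirit of PAGE_ALIGN(). */
uint64_t model_page_align(uint64_t bytes)
{
    return (bytes + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/* Models a write to the 'disksize' node. Returns 0 on success, -1 if rejected. */
int model_set_disksize(struct zram_model *z, uint64_t bytes)
{
    assert(z != NULL);
    if (z->init_done)
        return -1;  /* assumption: size cannot change while the device is in use */
    if (bytes == 0)
        return 0;   /* assumption: zero is a no-op instead of a crash */
    z->disksize = model_page_align(bytes);
    z->capacity = z->disksize >> MODEL_SECTOR_SHIFT;
    return 0;
}

/* Models the first I/O request: it fails on a zero-sized device. */
int model_io(struct zram_model *z)
{
    assert(z != NULL);
    if (z->capacity == 0)
        return -1;
    z->init_done = true;  /* lazy initialization on first use */
    return 0;
}

/* Models a write to the 'reset' node: disksize and capacity go back to 0. */
void model_reset(struct zram_model *z)
{
    assert(z != NULL);
    z->init_done = false;
    z->disksize = 0;
    z->capacity = 0;
}
```

In this model, I/O fails both before the first disksize write and after every reset, which is the consistent behavior the commit message asks for.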


2012-11-23 02:42:36

by Minchan Kim

Subject: [PATCH 2/2] zram: allocate metadata when disksize is set up

Lockdep complains about a recursive deadlock on zram->init_lock,
because zram_init_device() can be called in reclaim context and
it allocates a page with GFP_KERNEL.

We can fix that by replacing GFP_KERNEL with GFP_NOIO.
But the bigger problem is the vzalloc() in zram_init_device(), which allocates with GFP_KERNEL.
We could replace it with __vmalloc(), which accepts a gfp_t.
But a problem remains: although __vmalloc() accepts a gfp_t, it still performs
GFP_KERNEL allocations internally. That is why I sent this patch:
https://lkml.org/lkml/2012/4/23/77

Yes, the fundamental problem is the utter crap vmalloc API.
If we could fix it, everyone would be happy. But life isn't that simple,
as you can see from the thread on that patch.

So the next option is to give up lazy initialization and initialize at
disksize-setting time. That wastes memory on metadata until
zram is really used. But let's think about it.

1) Users of zram normally run mkfs.xxx or mkswap before using
the zram block device (e.g., at boot time).
That ends up allocating zram's metadata before real use, so the
benefit of lazy initialization is diminished.

2) Some users want to use zram only when memory pressure is high (i.e., they load zram
dynamically, NOT at boot time). That makes sense because people don't
want to waste memory until memory pressure is high (i.e., when zram is really
helpful). In this case, lazy initialization could easily fail
because we would use GFP_NOIO instead of GFP_KERNEL to avoid the deadlock.
So the benefit of lazy initialization is diminished here, too.

3) Metadata overhead is not critical, and Nitin has a plan to slim it down.
4K : 12 bytes (64-bit machine) -> 64G : 192M, so 0.3% isn't a big overhead.
If an insane user creates up to 20 such big zram devices, they could consume 6% of RAM,
but the efficiency of zram will cover the waste.

So this patch gives up lazy initialization and instead initializes metadata
at disksize-setting time.

Signed-off-by: Minchan Kim <[email protected]>
---
drivers/staging/zram/zram_drv.c | 21 ++++-----------------
drivers/staging/zram/zram_sysfs.c | 1 +
2 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 9ef1eca..f364fb5 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -441,16 +441,13 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
{
struct zram *zram = queue->queuedata;

- if (unlikely(!zram->init_done) && zram_init_device(zram))
- goto error;
-
down_read(&zram->init_lock);
if (unlikely(!zram->init_done))
- goto error_unlock;
+ goto error;

if (!valid_io_request(zram, bio)) {
zram_stat64_inc(zram, &zram->stats.invalid_io);
- goto error_unlock;
+ goto error;
}

__zram_make_request(zram, bio, bio_data_dir(bio));
@@ -458,9 +455,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)

return;

-error_unlock:
- up_read(&zram->init_lock);
error:
+ up_read(&zram->init_lock);
bio_io_error(bio);
}

@@ -509,19 +505,12 @@ void zram_reset_device(struct zram *zram)
up_write(&zram->init_lock);
}

+/* zram->init_lock should be hold */
int zram_init_device(struct zram *zram)
{
int ret;
size_t num_pages;

- down_write(&zram->init_lock);
- if (zram->init_done) {
- up_write(&zram->init_lock);
- return 0;
- }
-
- BUG_ON(!zram->disksize);
-
if (zram->disksize > 2 * (totalram_pages << PAGE_SHIFT)) {
pr_info(
"There is little point creating a zram of greater than "
@@ -570,7 +559,6 @@ int zram_init_device(struct zram *zram)
}

zram->init_done = 1;
- up_write(&zram->init_lock);

pr_debug("Initialization done!\n");
return 0;
@@ -580,7 +568,6 @@ fail_no_table:
zram->disksize = 0;
fail:
__zram_reset_device(zram);
- up_write(&zram->init_lock);
pr_err("Initialization failed: err=%d\n", ret);
return ret;
}
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index 4143af9..369db12 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -71,6 +71,7 @@ static ssize_t disksize_store(struct device *dev,

zram->disksize = PAGE_ALIGN(disksize);
set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
+ zram_init_device(zram);
up_write(&zram->init_lock);

return len;
--
1.7.9.5
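
The overhead estimate in point 3 of the commit message above can be checked with a few lines of arithmetic. The sketch below assumes the figures quoted there (4 KiB pages and 12 bytes of table metadata per page on a 64-bit machine); the function and macro names are hypothetical and do not appear in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the commit message; both are assumptions of this sketch. */
#define SKETCH_PAGE_SIZE   4096ULL /* bytes per page */
#define SKETCH_ENTRY_BYTES 12ULL   /* table metadata per page, 64-bit machine */

/* Bytes of per-page table metadata for a device of 'disksize' bytes. */
uint64_t zram_table_bytes(uint64_t disksize)
{
    uint64_t num_pages = disksize / SKETCH_PAGE_SIZE;

    return num_pages * SKETCH_ENTRY_BYTES;
}

/* Metadata overhead relative to disksize, in hundredths of a percent. */
uint64_t zram_table_overhead_bp(uint64_t disksize)
{
    assert(disksize > 0);
    return zram_table_bytes(disksize) * 10000ULL / disksize;
}
```

For a 64 GiB device this gives 16,777,216 table entries, or 192 MiB of metadata, which is about 0.29% of the device size and matches the rounded 0.3% figure in the commit message.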

2012-11-27 05:04:52

by Nitin Gupta

Subject: Re: [PATCH 1/2] zram: force disksize setting before using zram

On 11/22/2012 06:42 PM, Minchan Kim wrote:
> The zram documentation currently says that setting disksize is optional,
> but that is only partly true. The first time you use zram after
> booting, you must set disksize; otherwise zram cannot work because
> the zram gendisk's size is 0. Once you have set it, however, you can use
> zram freely after a reset, because reset paradoxically does not reset
> disksize to zero. So in that case, setting disksize is optional. :(
> This behavior is inconsistent and not straightforward for users.
>
> This patch always requires disksize to be set before zram is used.
> Yes, it changes current behavior, so someone could complain when
> they upgrade zram. That could be a problem if zram were in mainline,
> but it still lives in staging, so the behavior can be changed to do
> the right thing. Please excuse the change.
>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> drivers/staging/zram/zram.txt | 7 +++--
> drivers/staging/zram/zram_drv.c | 57 ++++++++++++++-----------------------
> drivers/staging/zram/zram_drv.h | 5 +---
> drivers/staging/zram/zram_sysfs.c | 6 +---
> 4 files changed, 27 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/staging/zram/zram.txt b/drivers/staging/zram/zram.txt
> index 5f75d29..00ae66b 100644
> --- a/drivers/staging/zram/zram.txt
> +++ b/drivers/staging/zram/zram.txt
> @@ -23,10 +23,9 @@ Following shows a typical sequence of steps for using zram.
> This creates 4 devices: /dev/zram{0,1,2,3}
> (num_devices parameter is optional. Default: 1)
>
> -2) Set Disksize (Optional):
> +2) Set Disksize
> Set disk size by writing the value to sysfs node 'disksize'
> - (in bytes). If disksize is not given, default value of 25%
> - of RAM is used.
> + (in bytes).
>

Disksize can now be set using K/M/G suffixes also (see Sergey's change:
handle mem suffixes in disk size ...). So, this should be documented as:

2) Set Disksize
Set disk size by writing the value to sysfs node 'disksize'.
The value can be either in bytes or you can use mem suffixes.
Examples:
# Initialize /dev/zram0 with 50MB disksize
echo $((50*1024*1024)) > /sys/block/zram0/disksize

# Using mem suffixes
echo 256K > /sys/block/zram0/disksize
echo 512M > /sys/block/zram0/disksize
echo 1G > /sys/block/zram0/disksize
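
How such suffixed values map to byte counts can be sketched in userspace C. This is an illustration only: `parse_size` is a hypothetical helper, it handles just the K/M/G suffixes shown above, and it is assumed (not confirmed by this thread) to behave roughly like the kernel's memparse() for these inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a size such as "256K", "512M", "1G" or a
 * plain byte count into bytes. Unknown suffixes are ignored.
 */
uint64_t parse_size(const char *s)
{
    char *end;
    uint64_t val;
    unsigned int shift = 0;

    assert(s != NULL);
    val = strtoull(s, &end, 0);

    switch (*end) {
    case 'K': case 'k': shift = 10; break;
    case 'M': case 'm': shift = 20; break;
    case 'G': case 'g': shift = 30; break;
    default: break;
    }
    return val << shift;
}
```

With this sketch, "256K" parses to 262144 bytes and a plain "52428800" is returned unchanged, so both forms of the example commands describe sizes in bytes.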


> # Initialize /dev/zram0 with 50MB disksize
> echo $((50*1024*1024)) > /sys/block/zram0/disksize
> @@ -67,6 +66,8 @@ Following shows a typical sequence of steps for using zram.
>
> (This frees all the memory allocated for the given device).
>
> + If you want to use zram again, you should set disksize first
> + due to reset zram.


This frees all the memory allocated for the given device and resets the
disksize to zero. You must set the disksize again before reusing the device.

>
> Please report any problems at:
> - Mailing list: linux-mm-cc at laptop dot org
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index fb4a7c9..9ef1eca 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -104,35 +104,6 @@ static int page_zero_filled(void *ptr)
> return 1;
> }
>
> -static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
> -{
> - if (!zram->disksize) {
> - pr_info(
> - "disk size not provided. You can use disksize_kb module "
> - "param to specify size.\nUsing default: (%u%% of RAM).\n",
> - default_disksize_perc_ram
> - );
> - zram->disksize = default_disksize_perc_ram *
> - (totalram_bytes / 100);
> - }
> -
> - if (zram->disksize > 2 * (totalram_bytes)) {
> - pr_info(
> - "There is little point creating a zram of greater than "
> - "twice the size of memory since we expect a 2:1 compression "
> - "ratio. Note that zram uses about 0.1%% of the size of "
> - "the disk when not in use so a huge zram is "
> - "wasteful.\n"
> - "\tMemory Size: %zu kB\n"
> - "\tSize you selected: %llu kB\n"
> - "Continuing anyway ...\n",
> - totalram_bytes >> 10, zram->disksize
> - );
> - }
> -
> - zram->disksize &= PAGE_MASK;
> -}
> -
> static void zram_free_page(struct zram *zram, size_t index)
> {
> unsigned long handle = zram->table[index].handle;
> @@ -497,6 +468,9 @@ void __zram_reset_device(struct zram *zram)
> {
> size_t index;
>
> + if (!zram->init_done)
> + goto out;
> +
> zram->init_done = 0;
>
> /* Free various per-device buffers */
> @@ -523,8 +497,9 @@ void __zram_reset_device(struct zram *zram)
>
> /* Reset stats */
> memset(&zram->stats, 0, sizeof(zram->stats));
> -
> +out:
> zram->disksize = 0;
> + set_capacity(zram->disk, 0);
> }
>
> void zram_reset_device(struct zram *zram)
> @@ -540,13 +515,26 @@ int zram_init_device(struct zram *zram)
> size_t num_pages;
>
> down_write(&zram->init_lock);
> -
> if (zram->init_done) {
> up_write(&zram->init_lock);
> return 0;
> }
>
> - zram_set_disksize(zram, totalram_pages << PAGE_SHIFT);
> + BUG_ON(!zram->disksize);

It shouldn't cause a crash if the user sets disksize to zero; a no-op seems
better.

Thanks,
Nitin

2012-11-27 05:13:29

by Nitin Gupta

Subject: Re: [PATCH 2/2] zram: allocate metadata when disksize is set up

On 11/22/2012 06:42 PM, Minchan Kim wrote:
> Lockdep complains about a recursive deadlock on zram->init_lock,
> because zram_init_device() can be called in reclaim context and
> it allocates a page with GFP_KERNEL.
>
> We can fix that by replacing GFP_KERNEL with GFP_NOIO.
> But the bigger problem is the vzalloc() in zram_init_device(), which allocates with GFP_KERNEL.
> We could replace it with __vmalloc(), which accepts a gfp_t.
> But a problem remains: although __vmalloc() accepts a gfp_t, it still performs
> GFP_KERNEL allocations internally. That is why I sent this patch:
> https://lkml.org/lkml/2012/4/23/77
>
> Yes, the fundamental problem is the utter crap vmalloc API.
> If we could fix it, everyone would be happy. But life isn't that simple,
> as you can see from the thread on that patch.
>
> So the next option is to give up lazy initialization and initialize at
> disksize-setting time. That wastes memory on metadata until
> zram is really used. But let's think about it.
>
> 1) Users of zram normally run mkfs.xxx or mkswap before using
> the zram block device (e.g., at boot time).
> That ends up allocating zram's metadata before real use, so the
> benefit of lazy initialization is diminished.
>
> 2) Some users want to use zram only when memory pressure is high (i.e., they load zram
> dynamically, NOT at boot time). That makes sense because people don't
> want to waste memory until memory pressure is high (i.e., when zram is really
> helpful). In this case, lazy initialization could easily fail
> because we would use GFP_NOIO instead of GFP_KERNEL to avoid the deadlock.
> So the benefit of lazy initialization is diminished here, too.
>
> 3) Metadata overhead is not critical, and Nitin has a plan to slim it down.
> 4K : 12 bytes (64-bit machine) -> 64G : 192M, so 0.3% isn't a big overhead.
> If an insane user creates up to 20 such big zram devices, they could consume 6% of RAM,
> but the efficiency of zram will cover the waste.
>
> So this patch gives up lazy initialization and instead initializes metadata
> at disksize-setting time.
>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> drivers/staging/zram/zram_drv.c | 21 ++++-----------------
> drivers/staging/zram/zram_sysfs.c | 1 +
> 2 files changed, 5 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index 9ef1eca..f364fb5 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -441,16 +441,13 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> {
> struct zram *zram = queue->queuedata;
>
> - if (unlikely(!zram->init_done) && zram_init_device(zram))
> - goto error;
> -
> down_read(&zram->init_lock);
> if (unlikely(!zram->init_done))
> - goto error_unlock;
> + goto error;
>
> if (!valid_io_request(zram, bio)) {
> zram_stat64_inc(zram, &zram->stats.invalid_io);
> - goto error_unlock;
> + goto error;
> }
>
> __zram_make_request(zram, bio, bio_data_dir(bio));
> @@ -458,9 +455,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>
> return;
>
> -error_unlock:
> - up_read(&zram->init_lock);
> error:
> + up_read(&zram->init_lock);
> bio_io_error(bio);
> }
>
> @@ -509,19 +505,12 @@ void zram_reset_device(struct zram *zram)
> up_write(&zram->init_lock);
> }
>
> +/* zram->init_lock should be hold */

s/hold/held

btw, shouldn't we also change GFP_KERNEL to GFP_NOIO in is_partial_io()
case in both read/write handlers?

Rest of the patch looks good.


Thanks,
Nitin

> int zram_init_device(struct zram *zram)
> {
> int ret;
> size_t num_pages;
>
> - down_write(&zram->init_lock);
> - if (zram->init_done) {
> - up_write(&zram->init_lock);
> - return 0;
> - }
> -
> - BUG_ON(!zram->disksize);
> -
> if (zram->disksize > 2 * (totalram_pages << PAGE_SHIFT)) {
> pr_info(
> "There is little point creating a zram of greater than "
> @@ -570,7 +559,6 @@ int zram_init_device(struct zram *zram)
> }
>
> zram->init_done = 1;
> - up_write(&zram->init_lock);
>
> pr_debug("Initialization done!\n");
> return 0;
> @@ -580,7 +568,6 @@ fail_no_table:
> zram->disksize = 0;
> fail:
> __zram_reset_device(zram);
> - up_write(&zram->init_lock);
> pr_err("Initialization failed: err=%d\n", ret);
> return ret;
> }
> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
> index 4143af9..369db12 100644
> --- a/drivers/staging/zram/zram_sysfs.c
> +++ b/drivers/staging/zram/zram_sysfs.c
> @@ -71,6 +71,7 @@ static ssize_t disksize_store(struct device *dev,
>
> zram->disksize = PAGE_ALIGN(disksize);
> set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
> + zram_init_device(zram);
> up_write(&zram->init_lock);
>
> return len;
>

2012-11-27 15:46:59

by Jerome Marchand

Subject: Re: [PATCH 2/2] zram: allocate metadata when disksize is set up

On 11/27/2012 06:13 AM, Nitin Gupta wrote:
> On 11/22/2012 06:42 PM, Minchan Kim wrote:
>> Lockdep complains about a recursive deadlock on zram->init_lock,
>> because zram_init_device() can be called in reclaim context and
>> it allocates a page with GFP_KERNEL.
>>
>> We can fix that by replacing GFP_KERNEL with GFP_NOIO.
>> But the bigger problem is the vzalloc() in zram_init_device(), which allocates with GFP_KERNEL.
>> We could replace it with __vmalloc(), which accepts a gfp_t.
>> But a problem remains: although __vmalloc() accepts a gfp_t, it still performs
>> GFP_KERNEL allocations internally. That is why I sent this patch:
>> https://lkml.org/lkml/2012/4/23/77
>>
>> Yes, the fundamental problem is the utter crap vmalloc API.
>> If we could fix it, everyone would be happy. But life isn't that simple,
>> as you can see from the thread on that patch.
>>
>> So the next option is to give up lazy initialization and initialize at
>> disksize-setting time. That wastes memory on metadata until
>> zram is really used. But let's think about it.
>>
>> 1) Users of zram normally run mkfs.xxx or mkswap before using
>> the zram block device (e.g., at boot time).
>> That ends up allocating zram's metadata before real use, so the
>> benefit of lazy initialization is diminished.
>>
>> 2) Some users want to use zram only when memory pressure is high (i.e., they load zram
>> dynamically, NOT at boot time). That makes sense because people don't
>> want to waste memory until memory pressure is high (i.e., when zram is really
>> helpful). In this case, lazy initialization could easily fail
>> because we would use GFP_NOIO instead of GFP_KERNEL to avoid the deadlock.
>> So the benefit of lazy initialization is diminished here, too.
>>
>> 3) Metadata overhead is not critical, and Nitin has a plan to slim it down.
>> 4K : 12 bytes (64-bit machine) -> 64G : 192M, so 0.3% isn't a big overhead.
>> If an insane user creates up to 20 such big zram devices, they could consume 6% of RAM,
>> but the efficiency of zram will cover the waste.
>>
>> So this patch gives up lazy initialization and instead initializes metadata
>> at disksize-setting time.
>>
>> Signed-off-by: Minchan Kim <[email protected]>
>> ---
>> drivers/staging/zram/zram_drv.c | 21 ++++-----------------
>> drivers/staging/zram/zram_sysfs.c | 1 +
>> 2 files changed, 5 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
>> index 9ef1eca..f364fb5 100644
>> --- a/drivers/staging/zram/zram_drv.c
>> +++ b/drivers/staging/zram/zram_drv.c
>> @@ -441,16 +441,13 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>> {
>> struct zram *zram = queue->queuedata;
>>
>> - if (unlikely(!zram->init_done) && zram_init_device(zram))
>> - goto error;
>> -
>> down_read(&zram->init_lock);
>> if (unlikely(!zram->init_done))
>> - goto error_unlock;
>> + goto error;
>>
>> if (!valid_io_request(zram, bio)) {
>> zram_stat64_inc(zram, &zram->stats.invalid_io);
>> - goto error_unlock;
>> + goto error;
>> }
>>
>> __zram_make_request(zram, bio, bio_data_dir(bio));
>> @@ -458,9 +455,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>>
>> return;
>>
>> -error_unlock:
>> - up_read(&zram->init_lock);
>> error:
>> + up_read(&zram->init_lock);
>> bio_io_error(bio);
>> }
>>
>> @@ -509,19 +505,12 @@ void zram_reset_device(struct zram *zram)
>> up_write(&zram->init_lock);
>> }
>>
>> +/* zram->init_lock should be hold */
>
> s/hold/held
>
> btw, shouldn't we also change GFP_KERNEL to GFP_NOIO in is_partial_io()
> case in both read/write handlers?

Good point. In fact, the allocation in zram_bvec_read() should be
GFP_ATOMIC because of the kmap_atomic() above it (or it should be moved
out of the kmap_atomic/kunmap_atomic section).
Another solution would be to allocate a working buffer at device
init, as is done for compress_buffer/workmem. That would make
zram_bvec_read/write simpler (no need to free memory or handle
kmalloc failure).

Jerome

>
> Rest of the patch looks good.
>
>
> Thanks,
> Nitin
>
>> int zram_init_device(struct zram *zram)
>> {
>> int ret;
>> size_t num_pages;
>>
>> - down_write(&zram->init_lock);
>> - if (zram->init_done) {
>> - up_write(&zram->init_lock);
>> - return 0;
>> - }
>> -
>> - BUG_ON(!zram->disksize);
>> -
>> if (zram->disksize > 2 * (totalram_pages << PAGE_SHIFT)) {
>> pr_info(
>> "There is little point creating a zram of greater than "
>> @@ -570,7 +559,6 @@ int zram_init_device(struct zram *zram)
>> }
>>
>> zram->init_done = 1;
>> - up_write(&zram->init_lock);
>>
>> pr_debug("Initialization done!\n");
>> return 0;
>> @@ -580,7 +568,6 @@ fail_no_table:
>> zram->disksize = 0;
>> fail:
>> __zram_reset_device(zram);
>> - up_write(&zram->init_lock);
>> pr_err("Initialization failed: err=%d\n", ret);
>> return ret;
>> }
>> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
>> index 4143af9..369db12 100644
>> --- a/drivers/staging/zram/zram_sysfs.c
>> +++ b/drivers/staging/zram/zram_sysfs.c
>> @@ -71,6 +71,7 @@ static ssize_t disksize_store(struct device *dev,
>>
>> zram->disksize = PAGE_ALIGN(disksize);
>> set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
>> + zram_init_device(zram);
>> up_write(&zram->init_lock);
>>
>> return len;
>>
>

2012-11-28 04:20:10

by Minchan Kim

Subject: Re: [PATCH 1/2] zram: force disksize setting before using zram

On Mon, Nov 26, 2012 at 09:04:47PM -0800, Nitin Gupta wrote:
> On 11/22/2012 06:42 PM, Minchan Kim wrote:
> >The zram documentation currently says that setting disksize is optional,
> >but that is only partly true. The first time you use zram after
> >booting, you must set disksize; otherwise zram cannot work because
> >the zram gendisk's size is 0. Once you have set it, however, you can use
> >zram freely after a reset, because reset paradoxically does not reset
> >disksize to zero. So in that case, setting disksize is optional. :(
> >This behavior is inconsistent and not straightforward for users.
> >
> >This patch always requires disksize to be set before zram is used.
> >Yes, it changes current behavior, so someone could complain when
> >they upgrade zram. That could be a problem if zram were in mainline,
> >but it still lives in staging, so the behavior can be changed to do
> >the right thing. Please excuse the change.
> >
> >Signed-off-by: Minchan Kim <[email protected]>
> >---
> > drivers/staging/zram/zram.txt | 7 +++--
> > drivers/staging/zram/zram_drv.c | 57 ++++++++++++++-----------------------
> > drivers/staging/zram/zram_drv.h | 5 +---
> > drivers/staging/zram/zram_sysfs.c | 6 +---
> > 4 files changed, 27 insertions(+), 48 deletions(-)
> >
> >diff --git a/drivers/staging/zram/zram.txt b/drivers/staging/zram/zram.txt
> >index 5f75d29..00ae66b 100644
> >--- a/drivers/staging/zram/zram.txt
> >+++ b/drivers/staging/zram/zram.txt
> >@@ -23,10 +23,9 @@ Following shows a typical sequence of steps for using zram.
> > This creates 4 devices: /dev/zram{0,1,2,3}
> > (num_devices parameter is optional. Default: 1)
> >
> >-2) Set Disksize (Optional):
> >+2) Set Disksize
> > Set disk size by writing the value to sysfs node 'disksize'
> >- (in bytes). If disksize is not given, default value of 25%
> >- of RAM is used.
> >+ (in bytes).
> >
>
> Disksize can now be set using K/M/G suffixes also (see Sergey's
> change: handle mem suffixes in disk size ...). So, this should be
> documented as:
>
> 2) Set Disksize
> Set disk size by writing the value to sysfs node 'disksize'.
> The value can be either in bytes or you can use mem suffixes.
> Examples:
> # Initialize /dev/zram0 with 50MB disksize
> echo $((50*1024*1024)) > /sys/block/zram0/disksize
>
> # Using mem suffixes
> echo 256K > /sys/block/zram0/disksize
> echo 512M > /sys/block/zram0/disksize
> echo 1G > /sys/block/zram0/disksize
>

Done.

>
> > # Initialize /dev/zram0 with 50MB disksize
> > echo $((50*1024*1024)) > /sys/block/zram0/disksize
> >@@ -67,6 +66,8 @@ Following shows a typical sequence of steps for using zram.
> >
> > (This frees all the memory allocated for the given device).
> >
> >+ If you want to use zram again, you should set disksize first
> >+ due to reset zram.
>
>
> This frees all the memory allocated for the given device and resets
> the disksize to zero. You must set the disksize again before reusing
> the device.

Done.

>
> >
> > Please report any problems at:
> > - Mailing list: linux-mm-cc at laptop dot org
> >diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> >index fb4a7c9..9ef1eca 100644
> >--- a/drivers/staging/zram/zram_drv.c
> >+++ b/drivers/staging/zram/zram_drv.c
> >@@ -104,35 +104,6 @@ static int page_zero_filled(void *ptr)
> > return 1;
> > }
> >
> >-static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
> >-{
> >- if (!zram->disksize) {
> >- pr_info(
> >- "disk size not provided. You can use disksize_kb module "
> >- "param to specify size.\nUsing default: (%u%% of RAM).\n",
> >- default_disksize_perc_ram
> >- );
> >- zram->disksize = default_disksize_perc_ram *
> >- (totalram_bytes / 100);
> >- }
> >-
> >- if (zram->disksize > 2 * (totalram_bytes)) {
> >- pr_info(
> >- "There is little point creating a zram of greater than "
> >- "twice the size of memory since we expect a 2:1 compression "
> >- "ratio. Note that zram uses about 0.1%% of the size of "
> >- "the disk when not in use so a huge zram is "
> >- "wasteful.\n"
> >- "\tMemory Size: %zu kB\n"
> >- "\tSize you selected: %llu kB\n"
> >- "Continuing anyway ...\n",
> >- totalram_bytes >> 10, zram->disksize
> >- );
> >- }
> >-
> >- zram->disksize &= PAGE_MASK;
> >-}
> >-
> > static void zram_free_page(struct zram *zram, size_t index)
> > {
> > unsigned long handle = zram->table[index].handle;
> >@@ -497,6 +468,9 @@ void __zram_reset_device(struct zram *zram)
> > {
> > size_t index;
> >
> >+ if (!zram->init_done)
> >+ goto out;
> >+
> > zram->init_done = 0;
> >
> > /* Free various per-device buffers */
> >@@ -523,8 +497,9 @@ void __zram_reset_device(struct zram *zram)
> >
> > /* Reset stats */
> > memset(&zram->stats, 0, sizeof(zram->stats));
> >-
> >+out:
> > zram->disksize = 0;
> >+ set_capacity(zram->disk, 0);
> > }
> >
> > void zram_reset_device(struct zram *zram)
> >@@ -540,13 +515,26 @@ int zram_init_device(struct zram *zram)
> > size_t num_pages;
> >
> > down_write(&zram->init_lock);
> >-
> > if (zram->init_done) {
> > up_write(&zram->init_lock);
> > return 0;
> > }
> >
> >- zram_set_disksize(zram, totalram_pages << PAGE_SHIFT);
> >+ BUG_ON(!zram->disksize);
>
> It shouldn't cause a crash if the user sets disksize to zero; a no-op
> seems better.

I removed it because the following patch gets rid of it.
Thanks for the good suggestions for the documentation.

--
Kind regards,
Minchan Kim

2012-11-28 04:22:13

by Minchan Kim

Subject: Re: [PATCH 2/2] zram: allocate metadata when disksize is set up

On Mon, Nov 26, 2012 at 09:13:24PM -0800, Nitin Gupta wrote:
> On 11/22/2012 06:42 PM, Minchan Kim wrote:
> >Lockdep complains about a recursive deadlock on zram->init_lock,
> >because zram_init_device can be called in reclaim context and
> >it requests a page with GFP_KERNEL.
> >
> >We can fix that by replacing GFP_KERNEL with GFP_NOIO.
> >But the bigger problem is the vzalloc in zram_init_device, which uses GFP_KERNEL.
> >We can replace it with __vmalloc, which accepts a gfp_t.
> >But we still have a problem: although __vmalloc accepts a gfp_t, it still
> >makes internal allocations with GFP_KERNEL. That's why I sent this patch:
> >https://lkml.org/lkml/2012/4/23/77
> >
> >Yes, the fundamental problem is the utter crap vmalloc API.
> >If we could fix it, everyone would be happy. But life isn't that simple,
> >as the thread for that patch shows.
> >
> >So the next option is to give up lazy initialization and initialize at
> >disksize setting time. That wastes memory on metadata until zram is
> >really used. But let's think about it.
> >
> >1) Users of zram normally run mkfs.xxx or mkswap before using
> > the zram block device (e.g., at boot time).
> > That ends up allocating zram's metadata before real usage, so
> > the benefit of lazy initialization is diminished.
> >
> >2) Some users want to use zram only when memory pressure is high (i.e., load zram
> > dynamically, NOT at boot time). That makes sense because people don't
> > want to waste memory until memory pressure is high (i.e., when zram is really
> > helpful). In this case, lazy initialization could easily fail
> > because we would use GFP_NOIO instead of GFP_KERNEL to avoid the deadlock.
> > So the benefit of lazy initialization is diminished here, too.
> >
> >3) Metadata overhead is not critical, and Nitin has a plan to reduce it.
> > 4K : 12 bytes (64-bit machine) -> 64G : 192M, so 0.3% isn't a big overhead.
> > If an insane user creates up to 20 such big zram devices, they could consume 6% of RAM,
> > but the efficiency of zram will cover the waste.
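
The arithmetic in point 3 can be checked with a small userspace sketch. It
assumes, as the commit message states, one 12-byte table entry per 4K page on
a 64-bit machine; the macro and function names are made up for this example
and are not taken from the driver.

```c
#include <stdint.h>

/* Figures taken from the commit message; names are hypothetical. */
#define MOCK_PAGE_SIZE   4096ULL   /* one table entry per 4K page */
#define MOCK_ENTRY_SIZE  12ULL     /* bytes per entry on a 64-bit machine */

/* Bytes of table metadata needed for a device of 'disksize' bytes. */
static uint64_t zram_meta_bytes(uint64_t disksize)
{
    return (disksize / MOCK_PAGE_SIZE) * MOCK_ENTRY_SIZE;
}

/* Overhead relative to disksize, in hundredths of a percent (rounded down). */
static uint64_t zram_meta_overhead_bp(uint64_t disksize)
{
    return zram_meta_bytes(disksize) * 10000ULL / disksize;
}
```

For a 64G device this gives 16M entries of 12 bytes each, i.e. 192M of
metadata, which is roughly 0.29% of the device size.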
> >
> >So this patch gives up lazy initialization and instead initializes metadata
> >at disksize setting time.
> >
> >Signed-off-by: Minchan Kim <[email protected]>
> >---
> > drivers/staging/zram/zram_drv.c | 21 ++++-----------------
> > drivers/staging/zram/zram_sysfs.c | 1 +
> > 2 files changed, 5 insertions(+), 17 deletions(-)
> >
> >diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> >index 9ef1eca..f364fb5 100644
> >--- a/drivers/staging/zram/zram_drv.c
> >+++ b/drivers/staging/zram/zram_drv.c
> >@@ -441,16 +441,13 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> > {
> > struct zram *zram = queue->queuedata;
> >
> >- if (unlikely(!zram->init_done) && zram_init_device(zram))
> >- goto error;
> >-
> > down_read(&zram->init_lock);
> > if (unlikely(!zram->init_done))
> >- goto error_unlock;
> >+ goto error;
> >
> > if (!valid_io_request(zram, bio)) {
> > zram_stat64_inc(zram, &zram->stats.invalid_io);
> >- goto error_unlock;
> >+ goto error;
> > }
> >
> > __zram_make_request(zram, bio, bio_data_dir(bio));
> >@@ -458,9 +455,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> >
> > return;
> >
> >-error_unlock:
> >- up_read(&zram->init_lock);
> > error:
> >+ up_read(&zram->init_lock);
> > bio_io_error(bio);
> > }
> >
> >@@ -509,19 +505,12 @@ void zram_reset_device(struct zram *zram)
> > up_write(&zram->init_lock);
> > }
> >
> >+/* zram->init_lock should be hold */
>
> s/hold/held

Done.

>
> btw, shouldn't we also change GFP_KERNEL to GFP_NOIO in
> is_partial_io() case in both read/write handlers?

Absolutely. The previous patch was incomplete and was sent by mistake.
Sorry for the noise.
I have just sent a new patch.

Thanks.

--
Kind regards,
Minchan Kim

2012-11-28 04:26:10

by Minchan Kim

Subject: Re: [PATCH 2/2] zram: allocate metadata when disksize is set up

On Tue, Nov 27, 2012 at 04:45:54PM +0100, Jerome Marchand wrote:
> On 11/27/2012 06:13 AM, Nitin Gupta wrote:
> > On 11/22/2012 06:42 PM, Minchan Kim wrote:
> >> Lockdep complains about a recursive deadlock on zram->init_lock,
> >> because zram_init_device can be called in reclaim context and
> >> it requests a page with GFP_KERNEL.
> >>
> >> We can fix that by replacing GFP_KERNEL with GFP_NOIO.
> >> But the bigger problem is the vzalloc in zram_init_device, which uses GFP_KERNEL.
> >> We can replace it with __vmalloc, which accepts a gfp_t.
> >> But we still have a problem: although __vmalloc accepts a gfp_t, it still
> >> makes internal allocations with GFP_KERNEL. That's why I sent this patch:
> >> https://lkml.org/lkml/2012/4/23/77
> >>
> >> Yes, the fundamental problem is the utter crap vmalloc API.
> >> If we could fix it, everyone would be happy. But life isn't that simple,
> >> as the thread for that patch shows.
> >>
> >> So the next option is to give up lazy initialization and initialize at
> >> disksize setting time. That wastes memory on metadata until zram is
> >> really used. But let's think about it.
> >>
> >> 1) Users of zram normally run mkfs.xxx or mkswap before using
> >> the zram block device (e.g., at boot time).
> >> That ends up allocating zram's metadata before real usage, so
> >> the benefit of lazy initialization is diminished.
> >>
> >> 2) Some users want to use zram only when memory pressure is high (i.e., load zram
> >> dynamically, NOT at boot time). That makes sense because people don't
> >> want to waste memory until memory pressure is high (i.e., when zram is really
> >> helpful). In this case, lazy initialization could easily fail
> >> because we would use GFP_NOIO instead of GFP_KERNEL to avoid the deadlock.
> >> So the benefit of lazy initialization is diminished here, too.
> >>
> >> 3) Metadata overhead is not critical, and Nitin has a plan to reduce it.
> >> 4K : 12 bytes (64-bit machine) -> 64G : 192M, so 0.3% isn't a big overhead.
> >> If an insane user creates up to 20 such big zram devices, they could consume 6% of RAM,
> >> but the efficiency of zram will cover the waste.
> >>
> >> So this patch gives up lazy initialization and instead initializes metadata
> >> at disksize setting time.
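
To make the resulting control flow concrete, here is a rough userspace sketch
of eager initialization: metadata is allocated when disksize is stored, and
the request path only checks init_done. All names are hypothetical. Locking is
left out (in the driver, init_lock is held for writing around the store and
for reading around the request). Returning -EBUSY for an already initialized
device and propagating the init error to the caller are assumptions of this
sketch, not something the diff itself shows.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_PAGE_SIZE  4096u
#define MOCK_ENTRY_SIZE 12u

/* Hypothetical, reduced stand-in for struct zram. */
struct zram_eager {
    size_t disksize;
    void *table;
    int init_done;
};

/* Allocate per-page metadata; called only from the disksize store path. */
static int eager_init_device(struct zram_eager *z)
{
    size_t num_pages = z->disksize / MOCK_PAGE_SIZE;

    z->table = calloc(num_pages, MOCK_ENTRY_SIZE);
    if (!z->table) {
        z->disksize = 0;
        return -ENOMEM;
    }
    z->init_done = 1;
    return 0;
}

/* Setting disksize now also initializes the device (no lazy init). */
static int eager_disksize_store(struct zram_eager *z, size_t disksize)
{
    if (z->init_done)
        return -EBUSY;     /* assumed: size is fixed once initialized */
    if (!disksize)
        return -EINVAL;

    /* Round up to a page multiple, like PAGE_ALIGN() in the diff. */
    z->disksize = (disksize + MOCK_PAGE_SIZE - 1) &
                  ~(size_t)(MOCK_PAGE_SIZE - 1);
    return eager_init_device(z);
}

/* The I/O path never allocates metadata; it only fails uninitialized devices. */
static int eager_make_request(const struct zram_eager *z)
{
    return z->init_done ? 0 : -EIO;
}
```

Since the request path no longer allocates anything, the allocation that
triggered the lockdep report cannot happen from reclaim context.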
> >>
> >> Signed-off-by: Minchan Kim <[email protected]>
> >> ---
> >> drivers/staging/zram/zram_drv.c | 21 ++++-----------------
> >> drivers/staging/zram/zram_sysfs.c | 1 +
> >> 2 files changed, 5 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> >> index 9ef1eca..f364fb5 100644
> >> --- a/drivers/staging/zram/zram_drv.c
> >> +++ b/drivers/staging/zram/zram_drv.c
> >> @@ -441,16 +441,13 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> >> {
> >> struct zram *zram = queue->queuedata;
> >>
> >> - if (unlikely(!zram->init_done) && zram_init_device(zram))
> >> - goto error;
> >> -
> >> down_read(&zram->init_lock);
> >> if (unlikely(!zram->init_done))
> >> - goto error_unlock;
> >> + goto error;
> >>
> >> if (!valid_io_request(zram, bio)) {
> >> zram_stat64_inc(zram, &zram->stats.invalid_io);
> >> - goto error_unlock;
> >> + goto error;
> >> }
> >>
> >> __zram_make_request(zram, bio, bio_data_dir(bio));
> >> @@ -458,9 +455,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
> >>
> >> return;
> >>
> >> -error_unlock:
> >> - up_read(&zram->init_lock);
> >> error:
> >> + up_read(&zram->init_lock);
> >> bio_io_error(bio);
> >> }
> >>
> >> @@ -509,19 +505,12 @@ void zram_reset_device(struct zram *zram)
> >> up_write(&zram->init_lock);
> >> }
> >>
> >> +/* zram->init_lock should be hold */
> >
> > s/hold/held
> >
> > btw, shouldn't we also change GFP_KERNEL to GFP_NOIO in is_partial_io()
> > case in both read/write handlers?
>
> Good point. The one in zram_bvec_read() should actually be
> GFP_ATOMIC because of the kmap_atomic() above (or be moved out of

Right.

> kmap_atomic/kunmap_atomic nest).
> Another solution would be to allocate some working buffer at device
> init as it's done for compress_buffer/workmem. It would make
> zram_bvec_read/write look simpler (no need to free memory or manage
> kmalloc failure).

Fair enough.
I sent a patch that replaces GFP_KERNEL with GFP_ATOMIC, but your suggestion
would be better. It can be a separate patch; I will send it.
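
A rough userspace sketch of that idea, assuming a single scratch page
allocated at init and reused for every partial I/O. The field name
partial_buf and all function names are hypothetical, and the memcpy into the
scratch page merely stands in for decompression.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096u

/* Hypothetical device state: one scratch page for partial I/O. */
struct zram_buf_mock {
    unsigned char *partial_buf;
};

/* Allocate the scratch page once, at device init. */
static int mock_dev_init(struct zram_buf_mock *z)
{
    z->partial_buf = malloc(MOCK_PAGE_SIZE);
    return z->partial_buf ? 0 : -ENOMEM;
}

/* Release the scratch page at reset. */
static void mock_dev_reset(struct zram_buf_mock *z)
{
    free(z->partial_buf);
    z->partial_buf = NULL;
}

/* Partial read: no allocation here, so no allocation failure to handle. */
static int mock_partial_read(struct zram_buf_mock *z,
                             const unsigned char *page,
                             size_t offset, size_t len,
                             unsigned char *dst)
{
    if (offset > MOCK_PAGE_SIZE || len > MOCK_PAGE_SIZE - offset)
        return -EINVAL;

    memcpy(z->partial_buf, page, MOCK_PAGE_SIZE); /* stands in for decompress */
    memcpy(dst, z->partial_buf + offset, len);
    return 0;
}
```

A shared scratch page like this has to be serialized between concurrent
requests, so a real patch would need a lock or a per-CPU buffer around it.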

Thanks.

>
> Jerome
>
> >
> > Rest of the patch looks good.
> >
> >
> > Thanks,
> > Nitin
> >
> >> int zram_init_device(struct zram *zram)
> >> {
> >> int ret;
> >> size_t num_pages;
> >>
> >> - down_write(&zram->init_lock);
> >> - if (zram->init_done) {
> >> - up_write(&zram->init_lock);
> >> - return 0;
> >> - }
> >> -
> >> - BUG_ON(!zram->disksize);
> >> -
> >> if (zram->disksize > 2 * (totalram_pages << PAGE_SHIFT)) {
> >> pr_info(
> >> "There is little point creating a zram of greater than "
> >> @@ -570,7 +559,6 @@ int zram_init_device(struct zram *zram)
> >> }
> >>
> >> zram->init_done = 1;
> >> - up_write(&zram->init_lock);
> >>
> >> pr_debug("Initialization done!\n");
> >> return 0;
> >> @@ -580,7 +568,6 @@ fail_no_table:
> >> zram->disksize = 0;
> >> fail:
> >> __zram_reset_device(zram);
> >> - up_write(&zram->init_lock);
> >> pr_err("Initialization failed: err=%d\n", ret);
> >> return ret;
> >> }
> >> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
> >> index 4143af9..369db12 100644
> >> --- a/drivers/staging/zram/zram_sysfs.c
> >> +++ b/drivers/staging/zram/zram_sysfs.c
> >> @@ -71,6 +71,7 @@ static ssize_t disksize_store(struct device *dev,
> >>
> >> zram->disksize = PAGE_ALIGN(disksize);
> >> set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
> >> + zram_init_device(zram);
> >> up_write(&zram->init_lock);
> >>
> >> return len;
> >>
> >
>

--
Kind regards,
Minchan Kim