2010-08-09 17:26:41

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 00/10] zram: various improvements and cleanups

The zram module creates RAM based block devices named /dev/zram<id>
(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
in memory itself.

One of the major changes done is the replacement of ioctls with sysfs
interface. One of the advantages of this approach is we no longer depend on the
userspace tool (rzscontrol) which was used to set various parameters and check
statistics. Maintaining updated version of rzscontrol as changes were done to
ioctls, statistics exported etc. was a major pain.

Another significant change is the introduction of percpu stats and compression
buffers. Earlier, we had per-device buffers protected by a mutex. This was a
major bottleneck on multi-core systems. With these changes, benchmarks with
fio[1] showed a speedup of about 20% for write performance on dual-core
system (see patch 4/10 description for details).


For easier testing, a single patch against 2.6.35-git8 has been uploaded at:
http://compcache.googlecode.com/hg/sub-projects/mainline/zram_2.6.36-rc0.patch

[1] fio I/O benchmarking tool: http://freshmeat.net/projects/fio/

Nitin Gupta (10):
Replace ioctls with sysfs interface
Remove need for explicit device initialization
Use percpu stats
Use percpu buffers
Reduce per table entry overhead by 4 bytes
Block discard support
Increase compressed page size threshold
Some cleanups
Update zram documentation
Document sysfs entries

Documentation/ABI/testing/sysfs-block-zram | 99 ++++
drivers/staging/zram/Kconfig | 12 -
drivers/staging/zram/Makefile | 2 +-
drivers/staging/zram/xvmalloc.c | 22 +-
drivers/staging/zram/zram.txt | 58 ++-
drivers/staging/zram/zram_drv.c | 670 ++++++++++++++--------------
drivers/staging/zram/zram_drv.h | 138 ++----
drivers/staging/zram/zram_ioctl.h | 41 --
drivers/staging/zram/zram_sysfs.c | 253 +++++++++++
9 files changed, 760 insertions(+), 535 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-block-zram
delete mode 100644 drivers/staging/zram/zram_ioctl.h
create mode 100644 drivers/staging/zram/zram_sysfs.c

--
1.7.2.1


2010-08-09 17:26:49

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 01/10] Replace ioctls with sysfs interface

Creates per-device sysfs nodes in /sys/block/zram<id>/
Currently following stats are exported:
- disksize
- num_reads
- num_writes
- invalid_io
- zero_pages
- orig_data_size
- compr_data_size
- mem_used_total

By default, disksize is set to 0. So, to start using
a zram device, fist write a disksize value and then
initialize device by writing any positive value to
initstate. For example:

# initialize /dev/zram0 with 50MB disksize
echo 50*1024*1024 | bc > /sys/block/zram0/disksize
echo 1 > /sys/block/zram0/initstate

When done using a disk, issue reset to free its memory
by writing any positive value to reset node:

echo 1 > /sys/block/zram0/reset

This change also obviates the need for 'rzscontrol' utility.

Signed-off-by: Nitin Gupta <[email protected]>
---
drivers/staging/zram/Kconfig | 12 --
drivers/staging/zram/Makefile | 2 +-
drivers/staging/zram/zram_drv.c | 186 ++++++++--------------------
drivers/staging/zram/zram_drv.h | 55 ++-------
drivers/staging/zram/zram_ioctl.h | 41 ------
drivers/staging/zram/zram_sysfs.c | 242 +++++++++++++++++++++++++++++++++++++
6 files changed, 306 insertions(+), 232 deletions(-)
delete mode 100644 drivers/staging/zram/zram_ioctl.h
create mode 100644 drivers/staging/zram/zram_sysfs.c

diff --git a/drivers/staging/zram/Kconfig b/drivers/staging/zram/Kconfig
index 4654ae2..da079f8 100644
--- a/drivers/staging/zram/Kconfig
+++ b/drivers/staging/zram/Kconfig
@@ -15,15 +15,3 @@ config ZRAM

See zram.txt for more information.
Project home: http://compcache.googlecode.com/
-
-config ZRAM_STATS
- bool "Enable statistics for compressed RAM disks"
- depends on ZRAM
- default y
- help
- Enable statistics collection for compressed RAM devices. Statistics
- are exported through ioctl interface, so you have to use zramconfig
- program to get them. This adds only a minimal overhead.
-
- If unsure, say Y.
-
diff --git a/drivers/staging/zram/Makefile b/drivers/staging/zram/Makefile
index b2c087a..c01160a 100644
--- a/drivers/staging/zram/Makefile
+++ b/drivers/staging/zram/Makefile
@@ -1,3 +1,3 @@
-zram-objs := zram_drv.o xvmalloc.o
+zram-objs := zram_drv.o zram_sysfs.o xvmalloc.o

obj-$(CONFIG_ZRAM) += zram.o
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 77d4d71..3f698a5 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -33,10 +33,39 @@

/* Globals */
static int zram_major;
-static struct zram *devices;
+struct zram *devices;

/* Module params (documentation at end) */
-static unsigned int num_devices;
+unsigned int num_devices;
+
+static void zram_stat_inc(u32 *v)
+{
+ *v = *v + 1;
+}
+
+static void zram_stat_dec(u32 *v)
+{
+ *v = *v - 1;
+}
+
+static void zram_stat64_add(struct zram *zram, u64 *v, u64 inc)
+{
+ spin_lock(&zram->stat64_lock);
+ *v = *v + inc;
+ spin_unlock(&zram->stat64_lock);
+}
+
+static void zram_stat64_sub(struct zram *zram, u64 *v, u64 dec)
+{
+ spin_lock(&zram->stat64_lock);
+ *v = *v - dec;
+ spin_unlock(&zram->stat64_lock);
+}
+
+static void zram_stat64_inc(struct zram *zram, u64 *v)
+{
+ zram_stat64_add(zram, v, 1);
+}

static int zram_test_flag(struct zram *zram, u32 index,
enum zram_pageflags flag)
@@ -91,7 +120,7 @@ static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
"the disk when not in use so a huge zram is "
"wasteful.\n"
"\tMemory Size: %zu kB\n"
- "\tSize you selected: %zu kB\n"
+ "\tSize you selected: %llu kB\n"
"Continuing anyway ...\n",
totalram_bytes >> 10, zram->disksize
);
@@ -100,49 +129,6 @@ static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
zram->disksize &= PAGE_MASK;
}

-static void zram_ioctl_get_stats(struct zram *zram,
- struct zram_ioctl_stats *s)
-{
- s->disksize = zram->disksize;
-
-#if defined(CONFIG_ZRAM_STATS)
- {
- struct zram_stats *rs = &zram->stats;
- size_t succ_writes, mem_used;
- unsigned int good_compress_perc = 0, no_compress_perc = 0;
-
- mem_used = xv_get_total_size_bytes(zram->mem_pool)
- + (rs->pages_expand << PAGE_SHIFT);
- succ_writes = zram_stat64_read(zram, &rs->num_writes) -
- zram_stat64_read(zram, &rs->failed_writes);
-
- if (succ_writes && rs->pages_stored) {
- good_compress_perc = rs->good_compress * 100
- / rs->pages_stored;
- no_compress_perc = rs->pages_expand * 100
- / rs->pages_stored;
- }
-
- s->num_reads = zram_stat64_read(zram, &rs->num_reads);
- s->num_writes = zram_stat64_read(zram, &rs->num_writes);
- s->failed_reads = zram_stat64_read(zram, &rs->failed_reads);
- s->failed_writes = zram_stat64_read(zram, &rs->failed_writes);
- s->invalid_io = zram_stat64_read(zram, &rs->invalid_io);
- s->notify_free = zram_stat64_read(zram, &rs->notify_free);
- s->pages_zero = rs->pages_zero;
-
- s->good_compress_pct = good_compress_perc;
- s->pages_expand_pct = no_compress_perc;
-
- s->pages_stored = rs->pages_stored;
- s->pages_used = mem_used >> PAGE_SHIFT;
- s->orig_data_size = rs->pages_stored << PAGE_SHIFT;
- s->compr_data_size = rs->compr_size;
- s->mem_used_total = mem_used;
- }
-#endif /* CONFIG_ZRAM_STATS */
-}
-
static void zram_free_page(struct zram *zram, size_t index)
{
u32 clen;
@@ -180,7 +166,7 @@ static void zram_free_page(struct zram *zram, size_t index)
zram_stat_dec(&zram->stats.good_compress);

out:
- zram->stats.compr_size -= clen;
+ zram_stat64_sub(zram, &zram->stats.compr_size, clen);
zram_stat_dec(&zram->stats.pages_stored);

zram->table[index].page = NULL;
@@ -396,7 +382,7 @@ memstore:
kunmap_atomic(src, KM_USER0);

/* Update stats */
- zram->stats.compr_size += clen;
+ zram_stat64_add(zram, &zram->stats.compr_size, clen);
zram_stat_inc(&zram->stats.pages_stored);
if (clen <= PAGE_SIZE / 2)
zram_stat_inc(&zram->stats.good_compress);
@@ -463,7 +449,7 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
return ret;
}

-static void reset_device(struct zram *zram)
+void zram_reset_device(struct zram *zram)
{
size_t index;

@@ -506,7 +492,7 @@ static void reset_device(struct zram *zram)
zram->disksize = 0;
}

-static int zram_ioctl_init_device(struct zram *zram)
+int zram_init_device(struct zram *zram)
{
int ret;
size_t num_pages;
@@ -561,91 +547,12 @@ static int zram_ioctl_init_device(struct zram *zram)
return 0;

fail:
- reset_device(zram);
+ zram_reset_device(zram);

pr_err("Initialization failed: err=%d\n", ret);
return ret;
}

-static int zram_ioctl_reset_device(struct zram *zram)
-{
- if (zram->init_done)
- reset_device(zram);
-
- return 0;
-}
-
-static int zram_ioctl(struct block_device *bdev, fmode_t mode,
- unsigned int cmd, unsigned long arg)
-{
- int ret = 0;
- size_t disksize_kb;
-
- struct zram *zram = bdev->bd_disk->private_data;
-
- switch (cmd) {
- case ZRAMIO_SET_DISKSIZE_KB:
- if (zram->init_done) {
- ret = -EBUSY;
- goto out;
- }
- if (copy_from_user(&disksize_kb, (void *)arg,
- _IOC_SIZE(cmd))) {
- ret = -EFAULT;
- goto out;
- }
- zram->disksize = disksize_kb << 10;
- pr_info("Disk size set to %zu kB\n", disksize_kb);
- break;
-
- case ZRAMIO_GET_STATS:
- {
- struct zram_ioctl_stats *stats;
- if (!zram->init_done) {
- ret = -ENOTTY;
- goto out;
- }
- stats = kzalloc(sizeof(*stats), GFP_KERNEL);
- if (!stats) {
- ret = -ENOMEM;
- goto out;
- }
- zram_ioctl_get_stats(zram, stats);
- if (copy_to_user((void *)arg, stats, sizeof(*stats))) {
- kfree(stats);
- ret = -EFAULT;
- goto out;
- }
- kfree(stats);
- break;
- }
- case ZRAMIO_INIT:
- ret = zram_ioctl_init_device(zram);
- break;
-
- case ZRAMIO_RESET:
- /* Do not reset an active device! */
- if (bdev->bd_holders) {
- ret = -EBUSY;
- goto out;
- }
-
- /* Make sure all pending I/O is finished */
- if (bdev)
- fsync_bdev(bdev);
-
- ret = zram_ioctl_reset_device(zram);
- break;
-
- default:
- pr_info("Invalid ioctl %u\n", cmd);
- ret = -ENOTTY;
- }
-
-out:
- return ret;
-}
-
void zram_slot_free_notify(struct block_device *bdev, unsigned long index)
{
struct zram *zram;
@@ -656,7 +563,6 @@ void zram_slot_free_notify(struct block_device *bdev, unsigned long index)
}

static const struct block_device_operations zram_devops = {
- .ioctl = zram_ioctl,
.swap_slot_free_notify = zram_slot_free_notify,
.owner = THIS_MODULE
};
@@ -696,7 +602,7 @@ static int create_device(struct zram *zram, int device_id)
zram->disk->private_data = zram;
snprintf(zram->disk->disk_name, 16, "zram%d", device_id);

- /* Actual capacity set using ZRAMIO_SET_DISKSIZE_KB ioctl */
+ /* Actual capacity set using syfs (/sys/block/zram<id>/disksize */
set_capacity(zram->disk, 0);

/*
@@ -710,6 +616,15 @@ static int create_device(struct zram *zram, int device_id)

add_disk(zram->disk);

+#ifdef CONFIG_SYSFS
+ ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
+ &zram_disk_attr_group);
+ if (ret < 0) {
+ pr_warning("Error creating sysfs group");
+ goto out;
+ }
+#endif
+
zram->init_done = 0;

out:
@@ -718,6 +633,11 @@ out:

static void destroy_device(struct zram *zram)
{
+#ifdef CONFIG_SYSFS
+ sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
+ &zram_disk_attr_group);
+#endif
+
if (zram->disk) {
del_gendisk(zram->disk);
put_disk(zram->disk);
@@ -785,7 +705,7 @@ static void __exit zram_exit(void)

destroy_device(zram);
if (zram->init_done)
- reset_device(zram);
+ zram_reset_device(zram);
}

unregister_blkdev(zram_major, "zram");
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 945f974..2ef93cc 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -18,7 +18,6 @@
#include <linux/spinlock.h>
#include <linux/mutex.h>

-#include "zram_ioctl.h"
#include "xvmalloc.h"

/*
@@ -85,11 +84,7 @@ struct table {
} __attribute__((aligned(4)));

struct zram_stats {
- /* basic stats */
- size_t compr_size; /* compressed size of pages stored -
- * needed to enforce memlimit */
- /* more stats */
-#if defined(CONFIG_ZRAM_STATS)
+ u64 compr_size; /* compressed size of pages stored */
u64 num_reads; /* failed + successful */
u64 num_writes; /* --do-- */
u64 failed_reads; /* should NEVER! happen */
@@ -100,7 +95,6 @@ struct zram_stats {
u32 pages_stored; /* no. of pages currently stored */
u32 good_compress; /* % of pages with compression ratio<=50% */
u32 pages_expand; /* % of incompressible pages */
-#endif
};

struct zram {
@@ -118,47 +112,18 @@ struct zram {
* This is the limit on amount of *uncompressed* worth of data
* we can store in a disk.
*/
- size_t disksize; /* bytes */
+ u64 disksize; /* bytes */

struct zram_stats stats;
};

-/*-- */
-
-/* Debugging and Stats */
-#if defined(CONFIG_ZRAM_STATS)
-static void zram_stat_inc(u32 *v)
-{
- *v = *v + 1;
-}
-
-static void zram_stat_dec(u32 *v)
-{
- *v = *v - 1;
-}
-
-static void zram_stat64_inc(struct zram *zram, u64 *v)
-{
- spin_lock(&zram->stat64_lock);
- *v = *v + 1;
- spin_unlock(&zram->stat64_lock);
-}
-
-static u64 zram_stat64_read(struct zram *zram, u64 *v)
-{
- u64 val;
-
- spin_lock(&zram->stat64_lock);
- val = *v;
- spin_unlock(&zram->stat64_lock);
-
- return val;
-}
-#else
-#define zram_stat_inc(v)
-#define zram_stat_dec(v)
-#define zram_stat64_inc(r, v)
-#define zram_stat64_read(r, v)
-#endif /* CONFIG_ZRAM_STATS */
+extern struct zram *devices;
+extern unsigned int num_devices;
+#ifdef CONFIG_SYSFS
+extern struct attribute_group zram_disk_attr_group;
+#endif
+
+extern int zram_init_device(struct zram *zram);
+extern void zram_reset_device(struct zram *zram);

#endif
diff --git a/drivers/staging/zram/zram_ioctl.h b/drivers/staging/zram/zram_ioctl.h
deleted file mode 100644
index 5c415fa..0000000
--- a/drivers/staging/zram/zram_ioctl.h
+++ /dev/null
@@ -1,41 +0,0 @@
-/*
- * Compressed RAM block device
- *
- * Copyright (C) 2008, 2009, 2010 Nitin Gupta
- *
- * This code is released using a dual license strategy: BSD/GPL
- * You can choose the licence that better fits your requirements.
- *
- * Released under the terms of 3-clause BSD License
- * Released under the terms of GNU General Public License Version 2.0
- *
- * Project home: http://compcache.googlecode.com
- */
-
-#ifndef _ZRAM_IOCTL_H_
-#define _ZRAM_IOCTL_H_
-
-struct zram_ioctl_stats {
- u64 disksize; /* disksize in bytes (user specifies in KB) */
- u64 num_reads; /* failed + successful */
- u64 num_writes; /* --do-- */
- u64 failed_reads; /* should NEVER! happen */
- u64 failed_writes; /* can happen when memory is too low */
- u64 invalid_io; /* non-page-aligned I/O requests */
- u64 notify_free; /* no. of swap slot free notifications */
- u32 pages_zero; /* no. of zero filled pages */
- u32 good_compress_pct; /* no. of pages with compression ratio<=50% */
- u32 pages_expand_pct; /* no. of incompressible pages */
- u32 pages_stored;
- u32 pages_used;
- u64 orig_data_size;
- u64 compr_data_size;
- u64 mem_used_total;
-} __attribute__ ((packed, aligned(4)));
-
-#define ZRAMIO_SET_DISKSIZE_KB _IOW('z', 0, size_t)
-#define ZRAMIO_GET_STATS _IOR('z', 1, struct zram_ioctl_stats)
-#define ZRAMIO_INIT _IO('z', 2)
-#define ZRAMIO_RESET _IO('z', 3)
-
-#endif
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
new file mode 100644
index 0000000..b8dbaee
--- /dev/null
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -0,0 +1,242 @@
+/*
+ * Compressed RAM block device
+ *
+ * Copyright (C) 2008, 2009, 2010 Nitin Gupta
+ *
+ * This code is released using a dual license strategy: BSD/GPL
+ * You can choose the licence that better fits your requirements.
+ *
+ * Released under the terms of 3-clause BSD License
+ * Released under the terms of GNU General Public License Version 2.0
+ *
+ * Project home: http://compcache.googlecode.com/
+ */
+
+#include <linux/device.h>
+#include <linux/genhd.h>
+
+#include "zram_drv.h"
+
+#ifdef CONFIG_SYSFS
+
+static u64 zram_stat64_read(struct zram *zram, u64 *v)
+{
+ u64 val;
+
+ spin_lock(&zram->stat64_lock);
+ val = *v;
+ spin_unlock(&zram->stat64_lock);
+
+ return val;
+}
+
+static struct zram *dev_to_zram(struct device *dev)
+{
+ int i;
+ struct zram *zram = NULL;
+
+ for (i = 0; i < num_devices; i++) {
+ zram = &devices[i];
+ if (disk_to_dev(zram->disk) == dev)
+ break;
+ }
+
+ return zram;
+}
+
+static ssize_t disksize_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return sprintf(buf, "%llu\n", zram->disksize);
+}
+
+static ssize_t disksize_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ int ret;
+ struct zram *zram = dev_to_zram(dev);
+
+ if (zram->init_done)
+ return -EBUSY;
+
+ ret = strict_strtoull(buf, 10, &zram->disksize);
+ if (ret)
+ return ret;
+
+ zram->disksize &= PAGE_MASK;
+ set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
+
+ return len;
+}
+
+static ssize_t initstate_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return sprintf(buf, "%u\n", zram->init_done);
+}
+
+static ssize_t initstate_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ int ret;
+ unsigned long do_init;
+ struct zram *zram = dev_to_zram(dev);
+
+ ret = strict_strtoul(buf, 10, &do_init);
+ if (ret)
+ return ret;
+
+ if (!do_init)
+ return -EINVAL;
+
+ zram_init_device(zram);
+
+ return len;
+}
+
+static ssize_t reset_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ int ret;
+ unsigned long do_reset;
+ struct zram *zram;
+ struct block_device *bdev;
+
+ zram = dev_to_zram(dev);
+ bdev = bdget_disk(zram->disk, 0);
+
+ /* Do not reset an active device! */
+ if (bdev->bd_holders)
+ return -EBUSY;
+
+ ret = strict_strtoul(buf, 10, &do_reset);
+ if (ret)
+ return ret;
+
+ if (!do_reset)
+ return -EINVAL;
+
+ /* Make sure all pending I/O is finished */
+ if (bdev)
+ fsync_bdev(bdev);
+
+ if (zram->init_done)
+ zram_reset_device(zram);
+
+ return len;
+}
+
+static ssize_t num_reads_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return sprintf(buf, "%llu\n",
+ zram_stat64_read(zram, &zram->stats.num_reads));
+}
+
+static ssize_t num_writes_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return sprintf(buf, "%llu\n",
+ zram_stat64_read(zram, &zram->stats.num_writes));
+}
+
+static ssize_t invalid_io_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return sprintf(buf, "%llu\n",
+ zram_stat64_read(zram, &zram->stats.invalid_io));
+}
+
+static ssize_t notify_free_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return sprintf(buf, "%llu\n",
+ zram_stat64_read(zram, &zram->stats.notify_free));
+}
+
+static ssize_t zero_pages_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return sprintf(buf, "%u\n", zram->stats.pages_zero);
+}
+
+static ssize_t orig_data_size_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return sprintf(buf, "%llu\n",
+ (u64)(zram->stats.pages_stored) << PAGE_SHIFT);
+}
+
+static ssize_t compr_data_size_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return sprintf(buf, "%llu\n",
+ zram_stat64_read(zram, &zram->stats.compr_size));
+}
+
+static ssize_t mem_used_total_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ u64 val = 0;
+ struct zram *zram = dev_to_zram(dev);
+
+ if (zram->init_done) {
+ val = xv_get_total_size_bytes(zram->mem_pool) +
+ ((u64)(zram->stats.pages_expand) << PAGE_SHIFT);
+ }
+
+ return sprintf(buf, "%llu\n", val);
+}
+
+static DEVICE_ATTR(disksize, S_IRUGO | S_IWUGO,
+ disksize_show, disksize_store);
+static DEVICE_ATTR(initstate, S_IRUGO | S_IWUGO,
+ initstate_show, initstate_store);
+static DEVICE_ATTR(reset, S_IWUGO, NULL, reset_store);
+static DEVICE_ATTR(num_reads, S_IRUGO, num_reads_show, NULL);
+static DEVICE_ATTR(num_writes, S_IRUGO, num_writes_show, NULL);
+static DEVICE_ATTR(invalid_io, S_IRUGO, invalid_io_show, NULL);
+static DEVICE_ATTR(notify_free, S_IRUGO, notify_free_show, NULL);
+static DEVICE_ATTR(zero_pages, S_IRUGO, zero_pages_show, NULL);
+static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
+static DEVICE_ATTR(compr_data_size, S_IRUGO, compr_data_size_show, NULL);
+static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
+
+static struct attribute *zram_disk_attrs[] = {
+ &dev_attr_disksize.attr,
+ &dev_attr_initstate.attr,
+ &dev_attr_reset.attr,
+ &dev_attr_num_reads.attr,
+ &dev_attr_num_writes.attr,
+ &dev_attr_invalid_io.attr,
+ &dev_attr_notify_free.attr,
+ &dev_attr_zero_pages.attr,
+ &dev_attr_orig_data_size.attr,
+ &dev_attr_compr_data_size.attr,
+ &dev_attr_mem_used_total.attr,
+ NULL,
+};
+
+struct attribute_group zram_disk_attr_group = {
+ .attrs = zram_disk_attrs,
+};
+
+#endif /* CONFIG_SYSFS */
--
1.7.2.1

2010-08-09 17:26:57

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 02/10] Remove need for explicit device initialization

Currently, the user has to explicitly write a positive value to
initstate sysfs node before the device can be used. This event
triggers allocation of per-device metadata like memory pool,
table array and so on.

We do not pre-initialize all zram devices since the 'table' array,
mapping disk blocks to compressed chunks, takes considerable amount
of memory (8 bytes per page). So, pre-initializing all devices will
be quite wasteful if only few or none of the devices are actually
used.

This explicit device initialization from user is an odd requirement and
can be easily avoided. We now initialize the device when first write is
done to the device.

Signed-off-by: Nitin Gupta <[email protected]>
---
drivers/staging/zram/zram_drv.c | 35 +++++++++++++++++++++++------------
drivers/staging/zram/zram_drv.h | 2 ++
drivers/staging/zram/zram_sysfs.c | 26 ++++----------------------
3 files changed, 29 insertions(+), 34 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 3f698a5..c5f84ee 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -207,9 +207,15 @@ static int zram_read(struct zram *zram, struct bio *bio)
u32 index;
struct bio_vec *bvec;

- zram_stat64_inc(zram, &zram->stats.num_reads);
+ if (unlikely(!zram->init_done)) {
+ set_bit(BIO_UPTODATE, &bio->bi_flags);
+ bio_endio(bio, 0);
+ return 0;
+ }

+ zram_stat64_inc(zram, &zram->stats.num_reads);
index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;
+
bio_for_each_segment(bvec, bio, i) {
int ret;
size_t clen;
@@ -275,16 +281,20 @@ out:

static int zram_write(struct zram *zram, struct bio *bio)
{
- int i;
+ int i, ret;
u32 index;
struct bio_vec *bvec;

- zram_stat64_inc(zram, &zram->stats.num_writes);
+ if (unlikely(!zram->init_done)) {
+ ret = zram_init_device(zram);
+ if (ret)
+ goto out;
+ }

+ zram_stat64_inc(zram, &zram->stats.num_writes);
index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;

bio_for_each_segment(bvec, bio, i) {
- int ret;
u32 offset;
size_t clen;
struct zobj_header *zheader;
@@ -425,11 +435,6 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
int ret = 0;
struct zram *zram = queue->queuedata;

- if (unlikely(!zram->init_done)) {
- bio_io_error(bio);
- return 0;
- }
-
if (!valid_io_request(zram, bio)) {
zram_stat64_inc(zram, &zram->stats.invalid_io);
bio_io_error(bio);
@@ -453,7 +458,7 @@ void zram_reset_device(struct zram *zram)
{
size_t index;

- /* Do not accept any new I/O request */
+ mutex_lock(&zram->init_lock);
zram->init_done = 0;

/* Free various per-device buffers */
@@ -490,6 +495,7 @@ void zram_reset_device(struct zram *zram)
memset(&zram->stats, 0, sizeof(zram->stats));

zram->disksize = 0;
+ mutex_unlock(&zram->init_lock);
}

int zram_init_device(struct zram *zram)
@@ -497,9 +503,11 @@ int zram_init_device(struct zram *zram)
int ret;
size_t num_pages;

+ mutex_lock(&zram->init_lock);
+
if (zram->init_done) {
- pr_info("Device already initialized!\n");
- return -EBUSY;
+ mutex_unlock(&zram->init_lock);
+ return 0;
}

zram_set_disksize(zram, totalram_pages << PAGE_SHIFT);
@@ -542,11 +550,13 @@ int zram_init_device(struct zram *zram)
}

zram->init_done = 1;
+ mutex_unlock(&zram->init_lock);

pr_debug("Initialization done!\n");
return 0;

fail:
+ mutex_unlock(&zram->init_lock);
zram_reset_device(zram);

pr_err("Initialization failed: err=%d\n", ret);
@@ -572,6 +582,7 @@ static int create_device(struct zram *zram, int device_id)
int ret = 0;

mutex_init(&zram->lock);
+ mutex_init(&zram->init_lock);
spin_lock_init(&zram->stat64_lock);

zram->queue = blk_alloc_queue(GFP_KERNEL);
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 2ef93cc..a481551 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -108,6 +108,8 @@ struct zram {
struct request_queue *queue;
struct gendisk *disk;
int init_done;
+ /* Prevent concurrent execution of device init and reset */
+ struct mutex init_lock;
/*
* This is the limit on amount of *uncompressed* worth of data
* we can store in a disk.
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index b8dbaee..6c574a9 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -58,8 +58,10 @@ static ssize_t disksize_store(struct device *dev,
int ret;
struct zram *zram = dev_to_zram(dev);

- if (zram->init_done)
+ if (zram->init_done) {
+ pr_info("Cannot change disksize for initialized device\n");
return -EBUSY;
+ }

ret = strict_strtoull(buf, 10, &zram->disksize);
if (ret)
@@ -79,25 +81,6 @@ static ssize_t initstate_show(struct device *dev,
return sprintf(buf, "%u\n", zram->init_done);
}

-static ssize_t initstate_store(struct device *dev,
- struct device_attribute *attr, const char *buf, size_t len)
-{
- int ret;
- unsigned long do_init;
- struct zram *zram = dev_to_zram(dev);
-
- ret = strict_strtoul(buf, 10, &do_init);
- if (ret)
- return ret;
-
- if (!do_init)
- return -EINVAL;
-
- zram_init_device(zram);
-
- return len;
-}
-
static ssize_t reset_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
{
@@ -208,8 +191,7 @@ static ssize_t mem_used_total_show(struct device *dev,

static DEVICE_ATTR(disksize, S_IRUGO | S_IWUGO,
disksize_show, disksize_store);
-static DEVICE_ATTR(initstate, S_IRUGO | S_IWUGO,
- initstate_show, initstate_store);
+static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
static DEVICE_ATTR(reset, S_IWUGO, NULL, reset_store);
static DEVICE_ATTR(num_reads, S_IRUGO, num_reads_show, NULL);
static DEVICE_ATTR(num_writes, S_IRUGO, num_writes_show, NULL);
--
1.7.2.1

2010-08-09 17:27:08

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 03/10] Use percpu stats

Also remove references to removed stats (ex: good_comress).

Signed-off-by: Nitin Gupta <[email protected]>
---
drivers/staging/zram/zram_drv.c | 77 ++++++++++++++++--------------------
drivers/staging/zram/zram_drv.h | 33 +++++++++-------
drivers/staging/zram/zram_sysfs.c | 46 +++++++++++++++-------
3 files changed, 84 insertions(+), 72 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index c5f84ee..f111b86 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -38,33 +38,27 @@ struct zram *devices;
/* Module params (documentation at end) */
unsigned int num_devices;

-static void zram_stat_inc(u32 *v)
+static void zram_add_stat(struct zram *zram,
+ enum zram_stats_index idx, s64 val)
{
- *v = *v + 1;
+ struct zram_stats_cpu *stats;
+
+ preempt_disable();
+ stats = __this_cpu_ptr(zram->stats);
+ u64_stats_update_begin(&stats->syncp);
+ stats->count[idx] += val;
+ u64_stats_update_end(&stats->syncp);
+ preempt_enable();
}

-static void zram_stat_dec(u32 *v)
+static void zram_inc_stat(struct zram *zram, enum zram_stats_index idx)
{
- *v = *v - 1;
+ zram_add_stat(zram, idx, 1);
}

-static void zram_stat64_add(struct zram *zram, u64 *v, u64 inc)
+static void zram_dec_stat(struct zram *zram, enum zram_stats_index idx)
{
- spin_lock(&zram->stat64_lock);
- *v = *v + inc;
- spin_unlock(&zram->stat64_lock);
-}
-
-static void zram_stat64_sub(struct zram *zram, u64 *v, u64 dec)
-{
- spin_lock(&zram->stat64_lock);
- *v = *v - dec;
- spin_unlock(&zram->stat64_lock);
-}
-
-static void zram_stat64_inc(struct zram *zram, u64 *v)
-{
- zram_stat64_add(zram, v, 1);
+ zram_add_stat(zram, idx, -1);
}

static int zram_test_flag(struct zram *zram, u32 index,
@@ -131,7 +125,7 @@ static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)

static void zram_free_page(struct zram *zram, size_t index)
{
- u32 clen;
+ int clen;
void *obj;

struct page *page = zram->table[index].page;
@@ -144,7 +138,7 @@ static void zram_free_page(struct zram *zram, size_t index)
*/
if (zram_test_flag(zram, index, ZRAM_ZERO)) {
zram_clear_flag(zram, index, ZRAM_ZERO);
- zram_stat_dec(&zram->stats.pages_zero);
+ zram_dec_stat(zram, ZRAM_STAT_PAGES_ZERO);
}
return;
}
@@ -153,7 +147,7 @@ static void zram_free_page(struct zram *zram, size_t index)
clen = PAGE_SIZE;
__free_page(page);
zram_clear_flag(zram, index, ZRAM_UNCOMPRESSED);
- zram_stat_dec(&zram->stats.pages_expand);
+ zram_dec_stat(zram, ZRAM_STAT_PAGES_EXPAND);
goto out;
}

@@ -162,12 +156,10 @@ static void zram_free_page(struct zram *zram, size_t index)
kunmap_atomic(obj, KM_USER0);

xv_free(zram->mem_pool, page, offset);
- if (clen <= PAGE_SIZE / 2)
- zram_stat_dec(&zram->stats.good_compress);

out:
- zram_stat64_sub(zram, &zram->stats.compr_size, clen);
- zram_stat_dec(&zram->stats.pages_stored);
+ zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, -clen);
+ zram_dec_stat(zram, ZRAM_STAT_PAGES_STORED);

zram->table[index].page = NULL;
zram->table[index].offset = 0;
@@ -213,7 +205,7 @@ static int zram_read(struct zram *zram, struct bio *bio)
return 0;
}

- zram_stat64_inc(zram, &zram->stats.num_reads);
+ zram_inc_stat(zram, ZRAM_STAT_NUM_READS);
index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;

bio_for_each_segment(bvec, bio, i) {
@@ -262,7 +254,6 @@ static int zram_read(struct zram *zram, struct bio *bio)
if (unlikely(ret != LZO_E_OK)) {
pr_err("Decompression failed! err=%d, page=%u\n",
ret, index);
- zram_stat64_inc(zram, &zram->stats.failed_reads);
goto out;
}

@@ -291,7 +282,7 @@ static int zram_write(struct zram *zram, struct bio *bio)
goto out;
}

- zram_stat64_inc(zram, &zram->stats.num_writes);
+ zram_inc_stat(zram, ZRAM_STAT_NUM_WRITES);
index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;

bio_for_each_segment(bvec, bio, i) {
@@ -318,7 +309,7 @@ static int zram_write(struct zram *zram, struct bio *bio)
if (page_zero_filled(user_mem)) {
kunmap_atomic(user_mem, KM_USER0);
mutex_unlock(&zram->lock);
- zram_stat_inc(&zram->stats.pages_zero);
+ zram_inc_stat(zram, ZRAM_STAT_PAGES_ZERO);
zram_set_flag(zram, index, ZRAM_ZERO);
continue;
}
@@ -331,7 +322,6 @@ static int zram_write(struct zram *zram, struct bio *bio)
if (unlikely(ret != LZO_E_OK)) {
mutex_unlock(&zram->lock);
pr_err("Compression failed! err=%d\n", ret);
- zram_stat64_inc(zram, &zram->stats.failed_writes);
goto out;
}

@@ -347,14 +337,12 @@ static int zram_write(struct zram *zram, struct bio *bio)
mutex_unlock(&zram->lock);
pr_info("Error allocating memory for "
"incompressible page: %u\n", index);
- zram_stat64_inc(zram,
- &zram->stats.failed_writes);
goto out;
}

offset = 0;
zram_set_flag(zram, index, ZRAM_UNCOMPRESSED);
- zram_stat_inc(&zram->stats.pages_expand);
+ zram_inc_stat(zram, ZRAM_STAT_PAGES_EXPAND);
zram->table[index].page = page_store;
src = kmap_atomic(page, KM_USER0);
goto memstore;
@@ -366,7 +354,6 @@ static int zram_write(struct zram *zram, struct bio *bio)
mutex_unlock(&zram->lock);
pr_info("Error allocating memory for compressed "
"page: %u, size=%zu\n", index, clen);
- zram_stat64_inc(zram, &zram->stats.failed_writes);
goto out;
}

@@ -392,10 +379,8 @@ memstore:
kunmap_atomic(src, KM_USER0);

/* Update stats */
- zram_stat64_add(zram, &zram->stats.compr_size, clen);
- zram_stat_inc(&zram->stats.pages_stored);
- if (clen <= PAGE_SIZE / 2)
- zram_stat_inc(&zram->stats.good_compress);
+ zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, clen);
+ zram_inc_stat(zram, ZRAM_STAT_PAGES_STORED);

mutex_unlock(&zram->lock);
index++;
@@ -436,7 +421,7 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
struct zram *zram = queue->queuedata;

if (!valid_io_request(zram, bio)) {
- zram_stat64_inc(zram, &zram->stats.invalid_io);
+ zram_inc_stat(zram, ZRAM_STAT_INVALID_IO);
bio_io_error(bio);
return 0;
}
@@ -542,6 +527,13 @@ int zram_init_device(struct zram *zram)
/* zram devices sort of resembles non-rotational disks */
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue);

+ zram->stats = alloc_percpu(struct zram_stats_cpu);
+ if (!zram->stats) {
+ pr_err("Error allocating percpu stats\n");
+ ret = -ENOMEM;
+ goto fail;
+ }
+
zram->mem_pool = xv_create_pool();
if (!zram->mem_pool) {
pr_err("Error creating memory pool\n");
@@ -569,7 +561,7 @@ void zram_slot_free_notify(struct block_device *bdev, unsigned long index)

zram = bdev->bd_disk->private_data;
zram_free_page(zram, index);
- zram_stat64_inc(zram, &zram->stats.notify_free);
+ zram_inc_stat(zram, ZRAM_STAT_NOTIFY_FREE);
}

static const struct block_device_operations zram_devops = {
@@ -583,7 +575,6 @@ static int create_device(struct zram *zram, int device_id)

mutex_init(&zram->lock);
mutex_init(&zram->init_lock);
- spin_lock_init(&zram->stat64_lock);

zram->queue = blk_alloc_queue(GFP_KERNEL);
if (!zram->queue) {
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index a481551..88fddb4 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -15,8 +15,8 @@
#ifndef _ZRAM_DRV_H_
#define _ZRAM_DRV_H_

-#include <linux/spinlock.h>
#include <linux/mutex.h>
+#include <linux/u64_stats_sync.h>

#include "xvmalloc.h"

@@ -83,18 +83,22 @@ struct table {
u8 flags;
} __attribute__((aligned(4)));

-struct zram_stats {
- u64 compr_size; /* compressed size of pages stored */
- u64 num_reads; /* failed + successful */
- u64 num_writes; /* --do-- */
- u64 failed_reads; /* should NEVER! happen */
- u64 failed_writes; /* can happen when memory is too low */
- u64 invalid_io; /* non-page-aligned I/O requests */
- u64 notify_free; /* no. of swap slot free notifications */
- u32 pages_zero; /* no. of zero filled pages */
- u32 pages_stored; /* no. of pages currently stored */
- u32 good_compress; /* % of pages with compression ratio<=50% */
- u32 pages_expand; /* % of incompressible pages */
+enum zram_stats_index {
+ ZRAM_STAT_COMPR_SIZE, /* compressed size of pages stored */
+ ZRAM_STAT_NUM_READS, /* failed + successful */
+ ZRAM_STAT_NUM_WRITES, /* --do-- */
+ ZRAM_STAT_INVALID_IO, /* non-page-aligned I/O requests */
+ ZRAM_STAT_NOTIFY_FREE, /* no. of swap slot free notifications */
+ ZRAM_STAT_DISCARD, /* no. of block discard requests */
+ ZRAM_STAT_PAGES_ZERO, /* no. of zero filled pages */
+ ZRAM_STAT_PAGES_STORED, /* no. of pages currently stored */
+ ZRAM_STAT_PAGES_EXPAND, /* no. of incompressible pages */
+ ZRAM_STAT_NSTATS,
+};
+
+struct zram_stats_cpu {
+ s64 count[ZRAM_STAT_NSTATS];
+ struct u64_stats_sync syncp;
};

struct zram {
@@ -102,7 +106,6 @@ struct zram {
void *compress_workmem;
void *compress_buffer;
struct table *table;
- spinlock_t stat64_lock; /* protect 64-bit stats */
struct mutex lock; /* protect compression buffers against
* concurrent writes */
struct request_queue *queue;
@@ -116,7 +119,7 @@ struct zram {
*/
u64 disksize; /* bytes */

- struct zram_stats stats;
+ struct zram_stats_cpu *stats;
};

extern struct zram *devices;
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index 6c574a9..43bcdd4 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -19,14 +19,30 @@

#ifdef CONFIG_SYSFS

-static u64 zram_stat64_read(struct zram *zram, u64 *v)
+/*
+ * Individual percpu values can go negative but the sum across all CPUs
+ * must always be positive (we store various counts). So, return sum as
+ * unsigned value.
+ */
+static u64 zram_get_stat(struct zram *zram, enum zram_stats_index idx)
{
- u64 val;
-
- spin_lock(&zram->stat64_lock);
- val = *v;
- spin_unlock(&zram->stat64_lock);
+ int cpu;
+ s64 val = 0;
+
+ for_each_possible_cpu(cpu) {
+ s64 temp;
+ unsigned int start;
+ struct zram_stats_cpu *stats;
+
+ stats = per_cpu_ptr(zram->stats, cpu);
+ do {
+ start = u64_stats_fetch_begin(&stats->syncp);
+ temp = stats->count[idx];
+ } while (u64_stats_fetch_retry(&stats->syncp, start));
+ val += temp;
+ }

+ WARN_ON(val < 0);
return val;
}

@@ -119,7 +135,7 @@ static ssize_t num_reads_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);

return sprintf(buf, "%llu\n",
- zram_stat64_read(zram, &zram->stats.num_reads));
+ zram_get_stat(zram, ZRAM_STAT_NUM_READS));
}

static ssize_t num_writes_show(struct device *dev,
@@ -128,7 +144,7 @@ static ssize_t num_writes_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);

return sprintf(buf, "%llu\n",
- zram_stat64_read(zram, &zram->stats.num_writes));
+ zram_get_stat(zram, ZRAM_STAT_NUM_WRITES));
}

static ssize_t invalid_io_show(struct device *dev,
@@ -137,7 +153,7 @@ static ssize_t invalid_io_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);

return sprintf(buf, "%llu\n",
- zram_stat64_read(zram, &zram->stats.invalid_io));
+ zram_get_stat(zram, ZRAM_STAT_INVALID_IO));
}

static ssize_t notify_free_show(struct device *dev,
@@ -146,7 +162,7 @@ static ssize_t notify_free_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);

return sprintf(buf, "%llu\n",
- zram_stat64_read(zram, &zram->stats.notify_free));
+ zram_get_stat(zram, ZRAM_STAT_NOTIFY_FREE));
}

static ssize_t zero_pages_show(struct device *dev,
@@ -154,7 +170,8 @@ static ssize_t zero_pages_show(struct device *dev,
{
struct zram *zram = dev_to_zram(dev);

- return sprintf(buf, "%u\n", zram->stats.pages_zero);
+ return sprintf(buf, "%llu\n",
+ zram_get_stat(zram, ZRAM_STAT_PAGES_ZERO));
}

static ssize_t orig_data_size_show(struct device *dev,
@@ -163,7 +180,7 @@ static ssize_t orig_data_size_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);

return sprintf(buf, "%llu\n",
- (u64)(zram->stats.pages_stored) << PAGE_SHIFT);
+ zram_get_stat(zram, ZRAM_STAT_PAGES_STORED) << PAGE_SHIFT);
}

static ssize_t compr_data_size_show(struct device *dev,
@@ -172,7 +189,7 @@ static ssize_t compr_data_size_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);

return sprintf(buf, "%llu\n",
- zram_stat64_read(zram, &zram->stats.compr_size));
+ zram_get_stat(zram, ZRAM_STAT_COMPR_SIZE));
}

static ssize_t mem_used_total_show(struct device *dev,
@@ -183,7 +200,8 @@ static ssize_t mem_used_total_show(struct device *dev,

if (zram->init_done) {
val = xv_get_total_size_bytes(zram->mem_pool) +
- ((u64)(zram->stats.pages_expand) << PAGE_SHIFT);
+ (zram_get_stat(zram, ZRAM_STAT_PAGES_EXPAND)
+ << PAGE_SHIFT);
}

return sprintf(buf, "%llu\n", val);
--
1.7.2.1

2010-08-09 17:27:16

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 05/10] Reduce per table entry overhead by 4 bytes

Each zram device maintains an array (table) that maps
index within the device to the location of corresponding
compressed chunk. Currently we store 'struct page' pointer,
offset with page and various flags separately which takes
12 bytes per table entry. Now all these are encoded in a
single 'phys_add_t' value which results in savings of 4 bytes
per entry (except on PAE systems).

Unfortunately, cleanups related to some variable renames
were mixed in this patch. So, please bear some additional
noise.

Signed-off-by: Nitin Gupta <[email protected]>
---
drivers/staging/zram/zram_drv.c | 256 ++++++++++++++++++++------------------
drivers/staging/zram/zram_drv.h | 24 +---
2 files changed, 140 insertions(+), 140 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index c16e09a..efe9c93 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -42,6 +42,13 @@ struct zram *devices;
/* Module params (documentation at end) */
unsigned int num_devices;

+/*
+ * We do not allocate any memory for zero-filled pages.
+ * Rather, we simply mark them in corresponding table
+ * entry by setting this bit.
+ */
+#define ZRAM_ZERO_PAGE_MARK_BIT (1 << 0)
+
static void zram_add_stat(struct zram *zram,
enum zram_stats_index idx, s64 val)
{
@@ -65,37 +72,62 @@ static void zram_dec_stat(struct zram *zram, enum zram_stats_index idx)
zram_add_stat(zram, idx, -1);
}

-static int zram_test_flag(struct zram *zram, u32 index,
- enum zram_pageflags flag)
+static int page_zero_filled(void *ptr)
{
- return zram->table[index].flags & BIT(flag);
+ unsigned int pos;
+ unsigned long *page;
+
+ page = (unsigned long *)ptr;
+
+ for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
+ if (page[pos])
+ return 0;
+ }
+
+ return 1;
}

-static void zram_set_flag(struct zram *zram, u32 index,
- enum zram_pageflags flag)
+static int zram_is_zero_page(struct zram *zram, u32 index)
{
- zram->table[index].flags |= BIT(flag);
+ phys_addr_t addr = zram->table[index].addr;
+
+ return addr & ZRAM_ZERO_PAGE_MARK_BIT;
}

-static void zram_clear_flag(struct zram *zram, u32 index,
- enum zram_pageflags flag)
+static void zram_set_zero_page(struct zram *zram, u32 index)
{
- zram->table[index].flags &= ~BIT(flag);
+ zram->table[index].addr |= ZRAM_ZERO_PAGE_MARK_BIT;
}

-static int page_zero_filled(void *ptr)
+static void zram_clear_zero_page(struct zram *zram, u32 index)
{
- unsigned int pos;
- unsigned long *page;
+ zram->table[index].addr &= ~ZRAM_ZERO_PAGE_MARK_BIT;
+}

- page = (unsigned long *)ptr;
+static void zram_find_obj(struct zram *zram, u32 index, struct page **page,
+ u32 *offset)
+{
+ phys_addr_t addr = zram->table[index].addr;

- for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
- if (page[pos])
- return 0;
+ if (!addr) {
+ *page = NULL;
+ *offset = 0;
+ return;
}

- return 1;
+ *page = pfn_to_page(addr >> PAGE_SHIFT);
+ *offset = addr & ~PAGE_MASK;
+}
+
+static void zram_insert_obj(struct zram *zram, u32 index, struct page *page,
+ u32 offset)
+{
+ phys_addr_t addr;
+
+ addr = page_to_pfn(page) << PAGE_SHIFT;
+ addr |= (offset & ~PAGE_MASK);
+
+ zram->table[index].addr = addr;
}

static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
@@ -129,44 +161,44 @@ static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)

static void zram_free_page(struct zram *zram, size_t index)
{
- int clen;
+ int zlen;
void *obj;
+ u32 offset;
+ struct page *page;

- struct page *page = zram->table[index].page;
- u32 offset = zram->table[index].offset;
-
- if (unlikely(!page)) {
- /*
- * No memory is allocated for zero filled pages.
- * Simply clear zero page flag.
- */
- if (zram_test_flag(zram, index, ZRAM_ZERO)) {
- zram_clear_flag(zram, index, ZRAM_ZERO);
- zram_dec_stat(zram, ZRAM_STAT_PAGES_ZERO);
- }
+ /*
+ * No memory is allocated for zero filled pages.
+ * Simply clear corresponding table entry.
+ */
+ if (zram_is_zero_page(zram, index)) {
+ zram_clear_zero_page(zram, index);
+ zram_dec_stat(zram, ZRAM_STAT_PAGES_ZERO);
return;
}

- if (unlikely(zram_test_flag(zram, index, ZRAM_UNCOMPRESSED))) {
- clen = PAGE_SIZE;
+ zram_find_obj(zram, index, &page, &offset);
+ if (!page)
+ return;
+
+ /* Uncompressed pages cosume whole page, so offset is zero */
+ if (unlikely(!offset)) {
+ zlen = PAGE_SIZE;
__free_page(page);
- zram_clear_flag(zram, index, ZRAM_UNCOMPRESSED);
zram_dec_stat(zram, ZRAM_STAT_PAGES_EXPAND);
goto out;
}

obj = kmap_atomic(page, KM_USER0) + offset;
- clen = xv_get_object_size(obj) - sizeof(struct zobj_header);
+ zlen = xv_get_object_size(obj);
kunmap_atomic(obj, KM_USER0);

xv_free(zram->mem_pool, page, offset);

out:
- zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, -clen);
+ zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, -zlen);
zram_dec_stat(zram, ZRAM_STAT_PAGES_STORED);

- zram->table[index].page = NULL;
- zram->table[index].offset = 0;
+ zram->table[index].addr = 0;
}

static void handle_zero_page(struct page *page)
@@ -181,24 +213,27 @@ static void handle_zero_page(struct page *page)
}

static void handle_uncompressed_page(struct zram *zram,
- struct page *page, u32 index)
+ struct page *bio_page, u32 index)
{
- unsigned char *user_mem, *cmem;
+ u32 zoffset;
+ struct page *zpage;
+ unsigned char *bio_mem, *zmem;

- user_mem = kmap_atomic(page, KM_USER0);
- cmem = kmap_atomic(zram->table[index].page, KM_USER1) +
- zram->table[index].offset;
+ zram_find_obj(zram, index, &zpage, &zoffset);
+ BUG_ON(zoffset);

- memcpy(user_mem, cmem, PAGE_SIZE);
- kunmap_atomic(user_mem, KM_USER0);
- kunmap_atomic(cmem, KM_USER1);
+ bio_mem = kmap_atomic(bio_page, KM_USER0);
+ zmem = kmap_atomic(zpage, KM_USER1);

- flush_dcache_page(page);
+ memcpy(bio_mem, zmem, PAGE_SIZE);
+ kunmap_atomic(bio_mem, KM_USER0);
+ kunmap_atomic(zmem, KM_USER1);
+
+ flush_dcache_page(bio_page);
}

static int zram_read(struct zram *zram, struct bio *bio)
{
-
int i;
u32 index;
struct bio_vec *bvec;
@@ -214,54 +249,54 @@ static int zram_read(struct zram *zram, struct bio *bio)

bio_for_each_segment(bvec, bio, i) {
int ret;
- size_t clen;
- struct page *page;
- struct zobj_header *zheader;
- unsigned char *user_mem, *cmem;
+ size_t zlen;
+ u32 zoffset;
+ struct page *bio_page, *zpage;
+ unsigned char *bio_mem, *zmem;

- page = bvec->bv_page;
+ bio_page = bvec->bv_page;

- if (zram_test_flag(zram, index, ZRAM_ZERO)) {
- handle_zero_page(page);
+ if (zram_is_zero_page(zram, index)) {
+ handle_zero_page(bio_page);
continue;
}

+ zram_find_obj(zram, index, &zpage, &zoffset);
+
/* Requested page is not present in compressed area */
- if (unlikely(!zram->table[index].page)) {
- pr_debug("Read before write: sector=%lu, size=%u",
+ if (unlikely(!zpage)) {
+ pr_debug("Read before write on swap device: "
+ "sector=%lu, size=%u",
(ulong)(bio->bi_sector), bio->bi_size);
/* Do nothing */
continue;
}

/* Page is stored uncompressed since it's incompressible */
- if (unlikely(zram_test_flag(zram, index, ZRAM_UNCOMPRESSED))) {
- handle_uncompressed_page(zram, page, index);
+ if (unlikely(!zoffset)) {
+ handle_uncompressed_page(zram, bio_page, index);
continue;
}

- user_mem = kmap_atomic(page, KM_USER0);
- clen = PAGE_SIZE;
+ bio_mem = kmap_atomic(bio_page, KM_USER0);
+ zlen = PAGE_SIZE;

- cmem = kmap_atomic(zram->table[index].page, KM_USER1) +
- zram->table[index].offset;
+ zmem = kmap_atomic(zpage, KM_USER1) + zoffset;

- ret = lzo1x_decompress_safe(
- cmem + sizeof(*zheader),
- xv_get_object_size(cmem) - sizeof(*zheader),
- user_mem, &clen);
+ ret = lzo1x_decompress_safe(zmem, xv_get_object_size(zmem),
+ bio_mem, &zlen);

- kunmap_atomic(user_mem, KM_USER0);
- kunmap_atomic(cmem, KM_USER1);
+ kunmap_atomic(bio_mem, KM_USER0);
+ kunmap_atomic(zmem, KM_USER1);

- /* Should NEVER happen. Return bio error if it does. */
+ /* This should NEVER happen - return bio error if it does! */
if (unlikely(ret != LZO_E_OK)) {
pr_err("Decompression failed! err=%d, page=%u\n",
ret, index);
goto out;
}

- flush_dcache_page(page);
+ flush_dcache_page(bio_page);
index++;
}

@@ -290,22 +325,19 @@ static int zram_write(struct zram *zram, struct bio *bio)
index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;

bio_for_each_segment(bvec, bio, i) {
- u32 offset;
- size_t clen;
- struct zobj_header *zheader;
- struct page *page, *page_store;
+ size_t zlen;
+ u32 zoffset;
+ struct page *bio_page, *zpage;
unsigned char *zbuffer, *zworkmem;
- unsigned char *user_mem, *cmem, *src;
+ unsigned char *bio_mem, *zmem, *src;

- page = bvec->bv_page;
+ bio_page = bvec->bv_page;

/*
* System overwrites unused sectors. Free memory associated
- * with this sector now.
+ * with this sector now (if used).
*/
- if (zram->table[index].page ||
- zram_test_flag(zram, index, ZRAM_ZERO))
- zram_free_page(zram, index);
+ zram_free_page(zram, index);

preempt_disable();
zbuffer = __get_cpu_var(compress_buffer);
@@ -316,19 +348,19 @@ static int zram_write(struct zram *zram, struct bio *bio)
}

src = zbuffer;
- user_mem = kmap_atomic(page, KM_USER0);
- if (page_zero_filled(user_mem)) {
- kunmap_atomic(user_mem, KM_USER0);
+ bio_mem = kmap_atomic(bio_page, KM_USER0);
+ if (page_zero_filled(bio_mem)) {
+ kunmap_atomic(bio_mem, KM_USER0);
preempt_enable();
zram_inc_stat(zram, ZRAM_STAT_PAGES_ZERO);
- zram_set_flag(zram, index, ZRAM_ZERO);
+ zram_set_zero_page(zram, index);
continue;
}

- ret = lzo1x_1_compress(user_mem, PAGE_SIZE, src, &clen,
+ ret = lzo1x_1_compress(bio_mem, PAGE_SIZE, src, &zlen,
zworkmem);

- kunmap_atomic(user_mem, KM_USER0);
+ kunmap_atomic(bio_mem, KM_USER0);

if (unlikely(ret != LZO_E_OK)) {
preempt_enable();
@@ -337,50 +369,45 @@ static int zram_write(struct zram *zram, struct bio *bio)
}

/* Page is incompressible. Store it as-is (uncompressed) */
- if (unlikely(clen > max_zpage_size)) {
- clen = PAGE_SIZE;
- page_store = alloc_page(GFP_NOWAIT | __GFP_HIGHMEM);
- if (unlikely(!page_store)) {
+ if (unlikely(zlen > max_zpage_size)) {
+ zlen = PAGE_SIZE;
+ zpage = alloc_page(GFP_NOWAIT | __GFP_HIGHMEM);
+ if (unlikely(!zpage)) {
preempt_enable();
pr_info("Error allocating memory for "
"incompressible page: %u\n", index);
goto out;
}

- offset = 0;
- zram_set_flag(zram, index, ZRAM_UNCOMPRESSED);
+ zoffset = 0;
zram_inc_stat(zram, ZRAM_STAT_PAGES_EXPAND);
- zram->table[index].page = page_store;
- src = kmap_atomic(page, KM_USER0);
+ src = kmap_atomic(zpage, KM_USER0);
goto memstore;
}

- if (xv_malloc(zram->mem_pool, clen + sizeof(*zheader),
- &zram->table[index].page, &offset,
+ if (xv_malloc(zram->mem_pool, zlen, &zpage, &zoffset,
GFP_NOWAIT | __GFP_HIGHMEM)) {
preempt_enable();
pr_info("Error allocating memory for compressed "
- "page: %u, size=%zu\n", index, clen);
+ "page: %u, size=%zu\n", index, zlen);
goto out;
}

memstore:
- zram->table[index].offset = offset;
+ zmem = kmap_atomic(zpage, KM_USER1) + zoffset;

- cmem = kmap_atomic(zram->table[index].page, KM_USER1) +
- zram->table[index].offset;
-
- memcpy(cmem, src, clen);
- kunmap_atomic(cmem, KM_USER1);
+ memcpy(zmem, src, zlen);
+ kunmap_atomic(zmem, KM_USER1);
preempt_enable();

- if (unlikely(zram_test_flag(zram, index, ZRAM_UNCOMPRESSED)))
+ if (unlikely(!zoffset))
kunmap_atomic(src, KM_USER0);

/* Update stats */
- zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, clen);
+ zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, zlen);
zram_inc_stat(zram, ZRAM_STAT_PAGES_STORED);

+ zram_insert_obj(zram, index, zpage, zoffset);
index++;
}

@@ -445,21 +472,8 @@ void zram_reset_device(struct zram *zram)
zram->init_done = 0;

/* Free all pages that are still in this zram device */
- for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
- struct page *page;
- u16 offset;
-
- page = zram->table[index].page;
- offset = zram->table[index].offset;
-
- if (!page)
- continue;
-
- if (unlikely(zram_test_flag(zram, index, ZRAM_UNCOMPRESSED)))
- __free_page(page);
- else
- xv_free(zram->mem_pool, page, offset);
- }
+ for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++)
+ zram_free_page(zram, index);

vfree(zram->table);
zram->table = NULL;
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 21c97f6..65e512d 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -62,26 +62,12 @@ static const unsigned max_zpage_size = PAGE_SIZE / 4 * 3;
#define SECTORS_PER_PAGE_SHIFT (PAGE_SHIFT - SECTOR_SHIFT)
#define SECTORS_PER_PAGE (1 << SECTORS_PER_PAGE_SHIFT)

-/* Flags for zram pages (table[page_no].flags) */
-enum zram_pageflags {
- /* Page is stored uncompressed */
- ZRAM_UNCOMPRESSED,
-
- /* Page consists entirely of zeros */
- ZRAM_ZERO,
-
- __NR_ZRAM_PAGEFLAGS,
-};
-
-/*-- Data structures */
-
-/* Allocated for each disk page */
+/*
+ * Maintains swap slot to compressed object mapping.
+ */
struct table {
- struct page *page;
- u16 offset;
- u8 count; /* object ref count (not yet used) */
- u8 flags;
-} __attribute__((aligned(4)));
+ phys_addr_t addr; /* location of [compressed] object */
+};

enum zram_stats_index {
ZRAM_STAT_COMPR_SIZE, /* compressed size of pages stored */
--
1.7.2.1

2010-08-09 17:27:24

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 06/10] Block discard support

The 'discard' bio discard request provides information to
zram disks regarding blocks which are no longer in use by
filesystem. This allows freeing memory allocated for such
blocks.

When zram devices are used as swap disks, we already have
a callback (block_device_operations->swap_slot_free_notify).
So, the discard support is useful only when used as generic
(non-swap) disk.

Signed-off-by: Nitin Gupta <[email protected]>
---
drivers/staging/zram/zram_drv.c | 25 +++++++++++++++++++++++++
drivers/staging/zram/zram_sysfs.c | 11 +++++++++++
2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index efe9c93..0f9785f 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -420,6 +420,20 @@ out:
return 0;
}

+static void zram_discard(struct zram *zram, struct bio *bio)
+{
+ size_t bytes = bio->bi_size;
+ sector_t sector = bio->bi_sector;
+
+ while (bytes >= PAGE_SIZE) {
+ zram_free_page(zram, sector >> SECTORS_PER_PAGE_SHIFT);
+ sector += PAGE_SIZE >> SECTOR_SHIFT;
+ bytes -= PAGE_SIZE;
+ }
+
+ bio_endio(bio, 0);
+}
+
/*
* Check if request is within bounds and page aligned.
*/
@@ -451,6 +465,12 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
return 0;
}

+ if (unlikely(bio_rw_flagged(bio, BIO_RW_DISCARD))) {
+ zram_inc_stat(zram, ZRAM_STAT_DISCARD);
+ zram_discard(zram, bio);
+ return 0;
+ }
+
switch (bio_data_dir(bio)) {
case READ:
ret = zram_read(zram, bio);
@@ -606,6 +626,11 @@ static int create_device(struct zram *zram, int device_id)
blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);

+ zram->disk->queue->limits.discard_granularity = PAGE_SIZE;
+ zram->disk->queue->limits.max_discard_sectors = UINT_MAX;
+ zram->disk->queue->limits.discard_zeroes_data = 1;
+ queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->queue);
+
add_disk(zram->disk);

#ifdef CONFIG_SYSFS
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index 43bcdd4..74971c0 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -165,6 +165,15 @@ static ssize_t notify_free_show(struct device *dev,
zram_get_stat(zram, ZRAM_STAT_NOTIFY_FREE));
}

+static ssize_t discard_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return sprintf(buf, "%llu\n",
+ zram_get_stat(zram, ZRAM_STAT_DISCARD));
+}
+
static ssize_t zero_pages_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -215,6 +224,7 @@ static DEVICE_ATTR(num_reads, S_IRUGO, num_reads_show, NULL);
static DEVICE_ATTR(num_writes, S_IRUGO, num_writes_show, NULL);
static DEVICE_ATTR(invalid_io, S_IRUGO, invalid_io_show, NULL);
static DEVICE_ATTR(notify_free, S_IRUGO, notify_free_show, NULL);
+static DEVICE_ATTR(discard, S_IRUGO, discard_show, NULL);
static DEVICE_ATTR(zero_pages, S_IRUGO, zero_pages_show, NULL);
static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
static DEVICE_ATTR(compr_data_size, S_IRUGO, compr_data_size_show, NULL);
@@ -228,6 +238,7 @@ static struct attribute *zram_disk_attrs[] = {
&dev_attr_num_writes.attr,
&dev_attr_invalid_io.attr,
&dev_attr_notify_free.attr,
+ &dev_attr_discard.attr,
&dev_attr_zero_pages.attr,
&dev_attr_orig_data_size.attr,
&dev_attr_compr_data_size.attr,
--
1.7.2.1

2010-08-09 17:27:33

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 07/10] Increase compressed page size threshold

Compression takes much more time than decompression. So, its quite
wasteful in terms of both CPU cycles and memory usage to have a very
low compressed page size threshold and thereby storing such not-so-well
compressible pages as-is (uncompressed). So, increasing it from
PAGE_SIZE/2 to PAGE_SIZE/8*7. A low threshold was useful when we had
"backing swap" support where we could forward such pages to the backing
device (applicable only when zram was used as swap disk).

It is not yet configurable through sysfs but may be exported in future,
along with threshold for average compression ratio.

Signed-off-by: Nitin Gupta <[email protected]>
---
drivers/staging/zram/zram_drv.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 65e512d..bcc51ea 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -47,7 +47,7 @@ static const unsigned default_disksize_perc_ram = 25;
* Pages that compress to size greater than this are stored
* uncompressed in memory.
*/
-static const unsigned max_zpage_size = PAGE_SIZE / 4 * 3;
+static const unsigned max_zpage_size = PAGE_SIZE / 8 * 7;

/*
* NOTE: max_zpage_size must be less than or equal to:
--
1.7.2.1

2010-08-09 17:27:37

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 08/10] Some cleanups

- xvmalloc: Remove unnecessary stat_{inc,dec} and increment
pages_stored directly
- xvmalloc: Initialize pointers with NULL instead of 0
- zram: Remove verbose message when use sets insane disksize
- zram: Mark some messages as pr_debug
- zram: Refine some comments

Signed-off-by: Nitin Gupta <[email protected]>
---
drivers/staging/zram/xvmalloc.c | 22 ++++-------------
drivers/staging/zram/zram_drv.c | 49 +++++++++++++-------------------------
drivers/staging/zram/zram_drv.h | 26 ++++----------------
3 files changed, 28 insertions(+), 69 deletions(-)

diff --git a/drivers/staging/zram/xvmalloc.c b/drivers/staging/zram/xvmalloc.c
index 3fdbb8a..6268f65 100644
--- a/drivers/staging/zram/xvmalloc.c
+++ b/drivers/staging/zram/xvmalloc.c
@@ -20,16 +20,6 @@
#include "xvmalloc.h"
#include "xvmalloc_int.h"

-static void stat_inc(u64 *value)
-{
- *value = *value + 1;
-}
-
-static void stat_dec(u64 *value)
-{
- *value = *value - 1;
-}
-
static int test_flag(struct block_header *block, enum blockflags flag)
{
return block->prev & BIT(flag);
@@ -187,7 +177,7 @@ static void insert_block(struct xv_pool *pool, struct page *page, u32 offset,
slindex = get_index_for_insert(block->size);
flindex = slindex / BITS_PER_LONG;

- block->link.prev_page = 0;
+ block->link.prev_page = NULL;
block->link.prev_offset = 0;
block->link.next_page = pool->freelist[slindex].page;
block->link.next_offset = pool->freelist[slindex].offset;
@@ -217,7 +207,7 @@ static void remove_block_head(struct xv_pool *pool,

pool->freelist[slindex].page = block->link.next_page;
pool->freelist[slindex].offset = block->link.next_offset;
- block->link.prev_page = 0;
+ block->link.prev_page = NULL;
block->link.prev_offset = 0;

if (!pool->freelist[slindex].page) {
@@ -232,7 +222,7 @@ static void remove_block_head(struct xv_pool *pool,
*/
tmpblock = get_ptr_atomic(pool->freelist[slindex].page,
pool->freelist[slindex].offset, KM_USER1);
- tmpblock->link.prev_page = 0;
+ tmpblock->link.prev_page = NULL;
tmpblock->link.prev_offset = 0;
put_ptr_atomic(tmpblock, KM_USER1);
}
@@ -284,7 +274,7 @@ static int grow_pool(struct xv_pool *pool, gfp_t flags)
if (unlikely(!page))
return -ENOMEM;

- stat_inc(&pool->total_pages);
+ pool->total_pages++;

spin_lock(&pool->lock);
block = get_ptr_atomic(page, 0, KM_USER0);
@@ -361,8 +351,6 @@ int xv_malloc(struct xv_pool *pool, u32 size, struct page **page,

if (!*page) {
spin_unlock(&pool->lock);
- if (flags & GFP_NOWAIT)
- return -ENOMEM;
error = grow_pool(pool, flags);
if (unlikely(error))
return error;
@@ -472,7 +460,7 @@ void xv_free(struct xv_pool *pool, struct page *page, u32 offset)
spin_unlock(&pool->lock);

__free_page(page);
- stat_dec(&pool->total_pages);
+ pool->total_pages--;
return;
}

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 0f9785f..42aa271 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -9,7 +9,7 @@
* Released under the terms of 3-clause BSD License
* Released under the terms of GNU General Public License Version 2.0
*
- * Project home: http://compcache.googlecode.com
+ * Project home: http://compcache.googlecode.com/
*/

#define KMSG_COMPONENT "zram"
@@ -130,33 +130,15 @@ static void zram_insert_obj(struct zram *zram, u32 index, struct page *page,
zram->table[index].addr = addr;
}

-static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
+static u64 zram_default_disksize(void)
{
- if (!zram->disksize) {
- pr_info(
- "disk size not provided. You can use disksize_kb module "
- "param to specify size.\nUsing default: (%u%% of RAM).\n",
- default_disksize_perc_ram
- );
- zram->disksize = default_disksize_perc_ram *
- (totalram_bytes / 100);
- }
+ u64 disksize;

- if (zram->disksize > 2 * (totalram_bytes)) {
- pr_info(
- "There is little point creating a zram of greater than "
- "twice the size of memory since we expect a 2:1 compression "
- "ratio. Note that zram uses about 0.1%% of the size of "
- "the disk when not in use so a huge zram is "
- "wasteful.\n"
- "\tMemory Size: %zu kB\n"
- "\tSize you selected: %llu kB\n"
- "Continuing anyway ...\n",
- totalram_bytes >> 10, zram->disksize
- );
- }
+ disksize = default_disksize_perc_ram *
+ (totalram_pages / 100);
+ disksize = (disksize << PAGE_SHIFT) & PAGE_MASK;

- zram->disksize &= PAGE_MASK;
+ return disksize;
}

static void zram_free_page(struct zram *zram, size_t index)
@@ -459,7 +441,7 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
int ret = 0;
struct zram *zram = queue->queuedata;

- if (!valid_io_request(zram, bio)) {
+ if (unlikely(!valid_io_request(zram, bio))) {
zram_inc_stat(zram, ZRAM_STAT_INVALID_IO);
bio_io_error(bio);
return 0;
@@ -504,7 +486,7 @@ void zram_reset_device(struct zram *zram)
/* Reset stats */
memset(&zram->stats, 0, sizeof(zram->stats));

- zram->disksize = 0;
+ zram->disksize = zram_default_disksize();
mutex_unlock(&zram->init_lock);
}

@@ -520,7 +502,8 @@ int zram_init_device(struct zram *zram)
return 0;
}

- zram_set_disksize(zram, totalram_pages << PAGE_SHIFT);
+ if (!zram->disksize)
+ zram->disksize = zram_default_disksize();

num_pages = zram->disksize >> PAGE_SHIFT;
zram->table = vmalloc(num_pages * sizeof(*zram->table));
@@ -566,7 +549,8 @@ fail:
return ret;
}

-void zram_slot_free_notify(struct block_device *bdev, unsigned long index)
+static void zram_slot_free_notify(struct block_device *bdev,
+ unsigned long index)
{
struct zram *zram;

@@ -614,8 +598,9 @@ static int create_device(struct zram *zram, int device_id)
zram->disk->private_data = zram;
snprintf(zram->disk->disk_name, 16, "zram%d", device_id);

- /* Actual capacity set using syfs (/sys/block/zram<id>/disksize */
- set_capacity(zram->disk, 0);
+ /* Custom size can be set using syfs (/sys/block/zram<id>/disksize) */
+ zram->disksize = zram_default_disksize();
+ set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);

/*
* To ensure that we always get PAGE_SIZE aligned
@@ -739,7 +724,7 @@ static int __init zram_init(void)
}

/* Allocate the device array and initialize each one */
- pr_info("Creating %u devices ...\n", num_devices);
+ pr_debug("Creating %u devices ...\n", num_devices);
devices = kzalloc(num_devices * sizeof(struct zram), GFP_KERNEL);
if (!devices) {
ret = -ENOMEM;
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index bcc51ea..2cf36db 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -9,7 +9,7 @@
* Released under the terms of 3-clause BSD License
* Released under the terms of GNU General Public License Version 2.0
*
- * Project home: http://compcache.googlecode.com
+ * Project home: http://compcache.googlecode.com/
*/

#ifndef _ZRAM_DRV_H_
@@ -26,18 +26,6 @@
*/
static const unsigned max_num_devices = 32;

-/*
- * Stored at beginning of each compressed object.
- *
- * It stores back-reference to table entry which points to this
- * object. This is required to support memory defragmentation.
- */
-struct zobj_header {
-#if 0
- u32 table_idx;
-#endif
-};
-
/*-- Configurable parameters */

/* Default zram disk size: 25% of total RAM */
@@ -50,8 +38,7 @@ static const unsigned default_disksize_perc_ram = 25;
static const unsigned max_zpage_size = PAGE_SIZE / 8 * 7;

/*
- * NOTE: max_zpage_size must be less than or equal to:
- * XV_MAX_ALLOC_SIZE - sizeof(struct zobj_header)
+ * NOTE: max_zpage_size must be less than or equal to XV_MAX_ALLOC_SIZE
* otherwise, xv_malloc() would always return failure.
*/

@@ -92,16 +79,15 @@ struct zram {
struct table *table;
struct request_queue *queue;
struct gendisk *disk;
- int init_done;
+ unsigned int init_done;
/* Prevent concurrent execution of device init and reset */
struct mutex init_lock;
/*
* This is the limit on amount of *uncompressed* worth of data
- * we can store in a disk.
+ * we can store in a disk (in bytes).
*/
- u64 disksize; /* bytes */
-
- struct zram_stats_cpu *stats;
+ u64 disksize;
+ struct zram_stats_cpu *stats; /* percpu stats */
};

extern struct zram *devices;
--
1.7.2.1

2010-08-09 17:27:50

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 10/10] Document sysfs entries

Signed-off-by: Nitin Gupta <[email protected]>
---
Documentation/ABI/testing/sysfs-block-zram | 99 ++++++++++++++++++++++++++++
1 files changed, 99 insertions(+), 0 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-block-zram

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
new file mode 100644
index 0000000..c8b3b48
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -0,0 +1,99 @@
+What: /sys/block/zram<id>/disksize
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The disksize file is read-write and specifies the disk size
+ which represents the limit on the *uncompressed* worth of data
+ that can be stored in this disk.
+
+What: /sys/block/zram<id>/initstate
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The disksize file is read-only and shows the initialization
+ state of the device.
+
+What: /sys/block/zram<id>/reset
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The disksize file is write-only and allows resetting the
+ device. The reset operation frees all the memory assocaited
+ with this device.
+
+What: /sys/block/zram<id>/num_reads
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The num_reads file is read-only and specifies the number of
+ reads (failed or successful) done on this device.
+
+What: /sys/block/zram<id>/num_writes
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The num_writes file is read-only and specifies the number of
+ writes (failed or successful) done on this device.
+
+What: /sys/block/zram<id>/invalid_io
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The invalid_io file is read-only and specifies the number of
+ non-page-size-aligned I/O requests issued to this device.
+
+What: /sys/block/zram<id>/notify_free
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The notify_free file is read-only and specifies the number of
+ swap slot free notifications received by this device. These
+ notifications are send to a swap block device when a swap slot
+ is freed. This statistic is applicable only when this disk is
+ being used as a swap disk.
+
+What: /sys/block/zram<id>/discard
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The discard file is read-only and specifies the number of
+ discard requests received by this device. These requests
+ provide information to block device regarding blocks which are
+ no longer used by filesystem.
+
+What: /sys/block/zram<id>/zero_pages
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The zero_pages file is read-only and specifies number of zero
+ filled pages written to this disk. No memory is allocated for
+ such pages.
+
+What: /sys/block/zram<id>/orig_data_size
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The orig_data_size file is read-only and specifies uncompressed
+ size of data stored in this disk. This excludes zero-filled
+ pages (zero_pages) since no memory is allocated for them.
+ Unit: bytes
+
+What: /sys/block/zram<id>/compr_data_size
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The compr_data_size file is read-only and specifies compressed
+ size of data stored in this disk. So, compression ratio can be
+ calculated using orig_data_size and this statistic.
+ Unit: bytes
+
+What: /sys/block/zram<id>/mem_used_total
+Date: August 2010
+Contact: Nitin Gupta <[email protected]>
+Description:
+ The mem_used_total file is read-only and specifies the amount
+ of memory, including allocator fragmentation and metadata
+ overhead, allocated for this disk. So, allocator space
+ efficiency can be calculated using compr_data_size and this
+ statistic.
+ Unit: bytes
\ No newline at end of file
--
1.7.2.1

2010-08-09 17:27:52

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 09/10] Update zram documentation

Update zram documentation to reflect transition form
ioctl to sysfs interface.

Signed-off-by: Nitin Gupta <[email protected]>
---
drivers/staging/zram/zram.txt | 58 +++++++++++++++++++++++++---------------
1 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/drivers/staging/zram/zram.txt b/drivers/staging/zram/zram.txt
index 520edc1..5f75d29 100644
--- a/drivers/staging/zram/zram.txt
+++ b/drivers/staging/zram/zram.txt
@@ -5,33 +5,35 @@ Project home: http://compcache.googlecode.com/

* Introduction

-The zram module creates RAM based block devices: /dev/ramX (X = 0, 1, ...).
-Pages written to these disks are compressed and stored in memory itself.
-These disks allow very fast I/O and compression provides good amounts of
-memory savings.
+The zram module creates RAM based block devices named /dev/zram<id>
+(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
+in memory itself. These disks allow very fast I/O and compression provides
+good amounts of memory savings. Some of the usecases include /tmp storage,
+use as swap disks, various caches under /var and maybe many more :)

-See project home for use cases, performance numbers and a lot more.
-
-Individual zram devices are configured and initialized using zramconfig
-userspace utility as shown in examples below. See zramconfig man page for
-more details.
+Statistics for individual zram devices are exported through sysfs nodes at
+/sys/block/zram<id>/

* Usage

Following shows a typical sequence of steps for using zram.

-1) Load Modules:
+1) Load Module:
modprobe zram num_devices=4
- This creates 4 (uninitialized) devices: /dev/zram{0,1,2,3}
+ This creates 4 devices: /dev/zram{0,1,2,3}
(num_devices parameter is optional. Default: 1)

-2) Initialize:
- Use zramconfig utility to configure and initialize individual
- zram devices. For example:
- zramconfig /dev/zram0 --init # uses default value of disksize_kb
- zramconfig /dev/zram1 --disksize_kb=102400 # 100MB /dev/zram1
+2) Set Disksize (Optional):
+ Set disk size by writing the value to sysfs node 'disksize'
+ (in bytes). If disksize is not given, default value of 25%
+ of RAM is used.
+
+ # Initialize /dev/zram0 with 50MB disksize
+ echo $((50*1024*1024)) > /sys/block/zram0/disksize

- *See zramconfig man page for more details and examples*
+ NOTE: disksize cannot be changed if the disk contains any
+ data. So, for such a disk, you need to issue 'reset' (see below)
+ before you can change its disksize.

3) Activate:
mkswap /dev/zram0
@@ -41,17 +43,29 @@ Following shows a typical sequence of steps for using zram.
mount /dev/zram1 /tmp

4) Stats:
- zramconfig /dev/zram0 --stats
- zramconfig /dev/zram1 --stats
+ Per-device statistics are exported as various nodes under
+ /sys/block/zram<id>/
+ disksize
+ num_reads
+ num_writes
+ invalid_io
+ notify_free
+ discard
+ zero_pages
+ orig_data_size
+ compr_data_size
+ mem_used_total

5) Deactivate:
swapoff /dev/zram0
umount /dev/zram1

6) Reset:
- zramconfig /dev/zram0 --reset
- zramconfig /dev/zram1 --reset
- (This frees memory allocated for the given device).
+ Write any positive value to 'reset' sysfs node
+ echo 1 > /sys/block/zram0/reset
+ echo 1 > /sys/block/zram1/reset
+
+ (This frees all the memory allocated for the given device).


Please report any problems at:
--
1.7.2.1

2010-08-09 17:28:41

by Nitin Gupta

[permalink] [raw]
Subject: [PATCH 04/10] Use percpu buffers

This also removes the (mutex) lock which was used to
protect these buffers.

Tests with fio on dual core showed improvement of about
20% for write speed.

fio job description:

[global]
bs=4k
ioengine=libaio
iodepth=8
size=1g
direct=1
runtime=60
directory=/mnt/zdisk
verify=meta
verify=0x1234567890abcdef
numjobs=2

[seq-write]
rw=write

Speedup was negligible with iodepth of 4 or less. Maybe
gains will be more visible with higher number of cores.

Results with above job file:
Before:
WRITE: io=2,048MB, aggrb=70,725KB/s
WRITE: io=2,048MB, aggrb=71,938KB/s
WRITE: io=2,048MB, aggrb=70,461KB/s

After:
WRITE: io=2,048MB, aggrb=86,691KB/s
WRITE: io=2,048MB, aggrb=87,025KB/s
WRITE: io=2,048MB, aggrb=88,700KB/s

Speedup: ~20%

Read performance remained almost the same since the read
path was lockless earlier also (LZO decompressor does not
require any additional buffer).

Signed-off-by: Nitin Gupta <[email protected]>
---
drivers/staging/zram/zram_drv.c | 136 +++++++++++++++++++++++++--------------
drivers/staging/zram/zram_drv.h | 4 -
2 files changed, 88 insertions(+), 52 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index f111b86..c16e09a 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -17,13 +17,14 @@

#include <linux/module.h>
#include <linux/kernel.h>
-#include <linux/bio.h>
#include <linux/bitops.h>
#include <linux/blkdev.h>
#include <linux/buffer_head.h>
+#include <linux/cpu.h>
#include <linux/device.h>
#include <linux/genhd.h>
#include <linux/highmem.h>
+#include <linux/notifier.h>
#include <linux/slab.h>
#include <linux/lzo.h>
#include <linux/string.h>
@@ -31,6 +32,9 @@

#include "zram_drv.h"

+static DEFINE_PER_CPU(unsigned char *, compress_buffer);
+static DEFINE_PER_CPU(unsigned char *, compress_workmem);
+
/* Globals */
static int zram_major;
struct zram *devices;
@@ -290,10 +294,10 @@ static int zram_write(struct zram *zram, struct bio *bio)
size_t clen;
struct zobj_header *zheader;
struct page *page, *page_store;
+ unsigned char *zbuffer, *zworkmem;
unsigned char *user_mem, *cmem, *src;

page = bvec->bv_page;
- src = zram->compress_buffer;

/*
* System overwrites unused sectors. Free memory associated
@@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
zram_test_flag(zram, index, ZRAM_ZERO))
zram_free_page(zram, index);

- mutex_lock(&zram->lock);
+ preempt_disable();
+ zbuffer = __get_cpu_var(compress_buffer);
+ zworkmem = __get_cpu_var(compress_workmem);
+ if (unlikely(!zbuffer || !zworkmem)) {
+ preempt_enable();
+ goto out;
+ }

+ src = zbuffer;
user_mem = kmap_atomic(page, KM_USER0);
if (page_zero_filled(user_mem)) {
kunmap_atomic(user_mem, KM_USER0);
- mutex_unlock(&zram->lock);
+ preempt_enable();
zram_inc_stat(zram, ZRAM_STAT_PAGES_ZERO);
zram_set_flag(zram, index, ZRAM_ZERO);
continue;
}

ret = lzo1x_1_compress(user_mem, PAGE_SIZE, src, &clen,
- zram->compress_workmem);
+ zworkmem);

kunmap_atomic(user_mem, KM_USER0);

if (unlikely(ret != LZO_E_OK)) {
- mutex_unlock(&zram->lock);
+ preempt_enable();
pr_err("Compression failed! err=%d\n", ret);
goto out;
}

- /*
- * Page is incompressible. Store it as-is (uncompressed)
- * since we do not want to return too many disk write
- * errors which has side effect of hanging the system.
- */
+ /* Page is incompressible. Store it as-is (uncompressed) */
if (unlikely(clen > max_zpage_size)) {
clen = PAGE_SIZE;
- page_store = alloc_page(GFP_NOIO | __GFP_HIGHMEM);
+ page_store = alloc_page(GFP_NOWAIT | __GFP_HIGHMEM);
if (unlikely(!page_store)) {
- mutex_unlock(&zram->lock);
+ preempt_enable();
pr_info("Error allocating memory for "
"incompressible page: %u\n", index);
goto out;
@@ -350,8 +357,8 @@ static int zram_write(struct zram *zram, struct bio *bio)

if (xv_malloc(zram->mem_pool, clen + sizeof(*zheader),
&zram->table[index].page, &offset,
- GFP_NOIO | __GFP_HIGHMEM)) {
- mutex_unlock(&zram->lock);
+ GFP_NOWAIT | __GFP_HIGHMEM)) {
+ preempt_enable();
pr_info("Error allocating memory for compressed "
"page: %u, size=%zu\n", index, clen);
goto out;
@@ -363,18 +370,10 @@ memstore:
cmem = kmap_atomic(zram->table[index].page, KM_USER1) +
zram->table[index].offset;

-#if 0
- /* Back-reference needed for memory defragmentation */
- if (!zram_test_flag(zram, index, ZRAM_UNCOMPRESSED)) {
- zheader = (struct zobj_header *)cmem;
- zheader->table_idx = index;
- cmem += sizeof(*zheader);
- }
-#endif
-
memcpy(cmem, src, clen);
-
kunmap_atomic(cmem, KM_USER1);
+ preempt_enable();
+
if (unlikely(zram_test_flag(zram, index, ZRAM_UNCOMPRESSED)))
kunmap_atomic(src, KM_USER0);

@@ -382,7 +381,6 @@ memstore:
zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, clen);
zram_inc_stat(zram, ZRAM_STAT_PAGES_STORED);

- mutex_unlock(&zram->lock);
index++;
}

@@ -446,13 +444,6 @@ void zram_reset_device(struct zram *zram)
mutex_lock(&zram->init_lock);
zram->init_done = 0;

- /* Free various per-device buffers */
- kfree(zram->compress_workmem);
- free_pages((unsigned long)zram->compress_buffer, 1);
-
- zram->compress_workmem = NULL;
- zram->compress_buffer = NULL;
-
/* Free all pages that are still in this zram device */
for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
struct page *page;
@@ -497,20 +488,6 @@ int zram_init_device(struct zram *zram)

zram_set_disksize(zram, totalram_pages << PAGE_SHIFT);

- zram->compress_workmem = kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
- if (!zram->compress_workmem) {
- pr_err("Error allocating compressor working memory!\n");
- ret = -ENOMEM;
- goto fail;
- }
-
- zram->compress_buffer = (void *)__get_free_pages(__GFP_ZERO, 1);
- if (!zram->compress_buffer) {
- pr_err("Error allocating compressor buffer space\n");
- ret = -ENOMEM;
- goto fail;
- }
-
num_pages = zram->disksize >> PAGE_SHIFT;
zram->table = vmalloc(num_pages * sizeof(*zram->table));
if (!zram->table) {
@@ -573,7 +550,6 @@ static int create_device(struct zram *zram, int device_id)
{
int ret = 0;

- mutex_init(&zram->lock);
mutex_init(&zram->init_lock);

zram->queue = blk_alloc_queue(GFP_KERNEL);
@@ -649,9 +625,46 @@ static void destroy_device(struct zram *zram)
blk_cleanup_queue(zram->queue);
}

+/*
+ * Callback for CPU hotplug events. Allocates percpu compression buffers.
+ */
+static int zram_cpu_notify(struct notifier_block *nb, unsigned long action,
+ void *pcpu)
+{
+ int cpu = (long)pcpu;
+
+ switch (action) {
+ case CPU_UP_PREPARE:
+ per_cpu(compress_buffer, cpu) = (void *)__get_free_pages(
+ GFP_KERNEL | __GFP_ZERO, 1);
+ per_cpu(compress_workmem, cpu) = kzalloc(
+ LZO1X_MEM_COMPRESS, GFP_KERNEL);
+
+ break;
+ case CPU_DEAD:
+ case CPU_UP_CANCELED:
+ free_pages((unsigned long)(per_cpu(compress_buffer, cpu)), 1);
+ per_cpu(compress_buffer, cpu) = NULL;
+
+ kfree(per_cpu(compress_buffer, cpu));
+ per_cpu(compress_buffer, cpu) = NULL;
+
+ break;
+ default:
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block zram_cpu_nb = {
+ .notifier_call = zram_cpu_notify
+};
+
static int __init zram_init(void)
{
int ret, dev_id;
+ unsigned int cpu;

if (num_devices > max_num_devices) {
pr_warning("Invalid value for num_devices: %u\n",
@@ -660,6 +673,20 @@ static int __init zram_init(void)
goto out;
}

+ ret = register_cpu_notifier(&zram_cpu_nb);
+ if (ret)
+ goto out;
+
+ for_each_online_cpu(cpu) {
+ void *pcpu = (void *)(long)cpu;
+ zram_cpu_notify(&zram_cpu_nb, CPU_UP_PREPARE, pcpu);
+ if (!per_cpu(compress_buffer, cpu) ||
+ !per_cpu(compress_workmem, cpu)) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ }
+
zram_major = register_blkdev(0, "zram");
if (zram_major <= 0) {
pr_warning("Unable to get major number\n");
@@ -694,12 +721,18 @@ free_devices:
unregister:
unregister_blkdev(zram_major, "zram");
out:
+ for_each_online_cpu(cpu) {
+ void *pcpu = (void *)(long)cpu;
+ zram_cpu_notify(&zram_cpu_nb, CPU_UP_CANCELED, pcpu);
+ }
+
return ret;
}

static void __exit zram_exit(void)
{
int i;
+ unsigned int cpu;
struct zram *zram;

for (i = 0; i < num_devices; i++) {
@@ -712,6 +745,13 @@ static void __exit zram_exit(void)

unregister_blkdev(zram_major, "zram");

+ for_each_online_cpu(cpu) {
+ void *pcpu = (void *)(long)cpu;
+ zram_cpu_notify(&zram_cpu_nb, CPU_UP_CANCELED, pcpu);
+ }
+
+ unregister_cpu_notifier(&zram_cpu_nb);
+
kfree(devices);
pr_debug("Cleanup done!\n");
}
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 88fddb4..21c97f6 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -103,11 +103,7 @@ struct zram_stats_cpu {

struct zram {
struct xv_pool *mem_pool;
- void *compress_workmem;
- void *compress_buffer;
struct table *table;
- struct mutex lock; /* protect compression buffers against
- * concurrent writes */
struct request_queue *queue;
struct gendisk *disk;
int init_done;
--
1.7.2.1

2010-08-09 18:32:13

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 07/10] Increase compressed page size threshold

Hi Nitin,

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
> Compression takes much more time than decompression. So, its quite
> wasteful in terms of both CPU cycles and memory usage to have a very
> low compressed page size threshold and thereby storing such not-so-well
> compressible pages as-is (uncompressed). So, increasing it from
> PAGE_SIZE/2 to PAGE_SIZE/8*7. A low threshold was useful when we had
> "backing swap" support where we could forward such pages to the backing
> device (applicable only when zram was used as swap disk).
>
> It is not yet configurable through sysfs but may be exported in future,
> along with threshold for average compression ratio.
>
> Signed-off-by: Nitin Gupta <[email protected]>

The description makes sense but lacks any real data. What kind of
workloads did you test this with? Where does it help most? How much?

Pekka

2010-08-09 18:34:51

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 01/10] Replace ioctls with sysfs interface

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
> Creates per-device sysfs nodes in /sys/block/zram<id>/
> Currently following stats are exported:
> ?- disksize
> ?- num_reads
> ?- num_writes
> ?- invalid_io
> ?- zero_pages
> ?- orig_data_size
> ?- compr_data_size
> ?- mem_used_total
>
> By default, disksize is set to 0. So, to start using
> a zram device, fist write a disksize value and then
> initialize device by writing any positive value to
> initstate. For example:
>
> ? ? ? ?# initialize /dev/zram0 with 50MB disksize
> ? ? ? ?echo 50*1024*1024 | bc > /sys/block/zram0/disksize
> ? ? ? ?echo 1 > /sys/block/zram0/initstate
>
> When done using a disk, issue reset to free its memory
> by writing any positive value to reset node:
>
> ? ? ? ?echo 1 > /sys/block/zram0/reset
>
> This change also obviates the need for 'rzscontrol' utility.
>
> Signed-off-by: Nitin Gupta <[email protected]>

Looks good to me (but I'm not a sysfs guy).

Acked-by: Pekka Enberg <[email protected]>

> ?/* Module params (documentation at end) */
> -static unsigned int num_devices;
> +unsigned int num_devices;
> +
> +static void zram_stat_inc(u32 *v)
> +{
> + ? ? ? *v = *v + 1;
> +}
> +
> +static void zram_stat_dec(u32 *v)
> +{
> + ? ? ? *v = *v - 1;
> +}
> +
> +static void zram_stat64_add(struct zram *zram, u64 *v, u64 inc)
> +{
> + ? ? ? spin_lock(&zram->stat64_lock);
> + ? ? ? *v = *v + inc;
> + ? ? ? spin_unlock(&zram->stat64_lock);
> +}
> +
> +static void zram_stat64_sub(struct zram *zram, u64 *v, u64 dec)
> +{
> + ? ? ? spin_lock(&zram->stat64_lock);
> + ? ? ? *v = *v - dec;
> + ? ? ? spin_unlock(&zram->stat64_lock);
> +}
> +
> +static void zram_stat64_inc(struct zram *zram, u64 *v)
> +{
> + ? ? ? zram_stat64_add(zram, v, 1);
> +}

These could probably use atomic_inc(), atomic64_inc(), and friends, no?

2010-08-09 18:36:55

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 02/10] Remove need for explicit device initialization

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
> Currently, the user has to explicitly write a positive value to
> initstate sysfs node before the device can be used. This event
> triggers allocation of per-device metadata like memory pool,
> table array and so on.
>
> We do not pre-initialize all zram devices since the 'table' array,
> mapping disk blocks to compressed chunks, takes considerable amount
> of memory (8 bytes per page). So, pre-initializing all devices will
> be quite wasteful if only few or none of the devices are actually
> used.
>
> This explicit device initialization from user is an odd requirement and
> can be easily avoided. We now initialize the device when first write is
> done to the device.
>
> Signed-off-by: Nitin Gupta <[email protected]>

AFAICT, most hardware block device drivers do things like this in the
probe function. Why can't we do that for zram as well and drop the
->init_done and ->init_lock parts?

2010-08-09 18:44:42

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 03/10] Use percpu stats

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
> Also remove references to removed stats (ex: good_comress).
>
> Signed-off-by: Nitin Gupta <[email protected]>

Acked-by: Pekka Enberg <[email protected]>

2010-08-09 18:57:45

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 04/10] Use percpu buffers

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?zram_test_flag(zram, index, ZRAM_ZERO))
> ? ? ? ? ? ? ? ? ? ? ? ?zram_free_page(zram, index);
>
> - ? ? ? ? ? ? ? mutex_lock(&zram->lock);
> + ? ? ? ? ? ? ? preempt_disable();
> + ? ? ? ? ? ? ? zbuffer = __get_cpu_var(compress_buffer);
> + ? ? ? ? ? ? ? zworkmem = __get_cpu_var(compress_workmem);
> + ? ? ? ? ? ? ? if (unlikely(!zbuffer || !zworkmem)) {
> + ? ? ? ? ? ? ? ? ? ? ? preempt_enable();
> + ? ? ? ? ? ? ? ? ? ? ? goto out;
> + ? ? ? ? ? ? ? }

The per-CPU buffer thing with this preempt_disable() trickery looks
overkill to me. Most block device drivers seem to use mempool_alloc()
for this sort of thing. Is there some reason you can't use that here?

2010-08-09 18:59:38

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 05/10] Reduce per table entry overhead by 4 bytes

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
> Each zram device maintains an array (table) that maps
> index within the device to the location of corresponding
> compressed chunk. Currently we store 'struct page' pointer,
> offset with page and various flags separately which takes
> 12 bytes per table entry. Now all these are encoded in a
> single 'phys_add_t' value which results in savings of 4 bytes
> per entry (except on PAE systems).
>
> Unfortunately, cleanups related to some variable renames
> were mixed in this patch. So, please bear some additional
> noise.

The noise makes this patch pretty difficult to review properly. Care
to spilt the patch into two pieces?

2010-08-09 19:02:11

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 08/10] Some cleanups

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
> ?- xvmalloc: Remove unnecessary stat_{inc,dec} and increment
> ? pages_stored directly
> ?- xvmalloc: Initialize pointers with NULL instead of 0
> ?- zram: Remove verbose message when use sets insane disksize
> ?- zram: Mark some messages as pr_debug
> ?- zram: Refine some comments
>
> Signed-off-by: Nitin Gupta <[email protected]>

Acked-by: Pekka Enberg <[email protected]>

2010-08-09 19:02:52

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 10/10] Document sysfs entries

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
> Signed-off-by: Nitin Gupta <[email protected]>

Acked-by: Pekka Enberg <[email protected]>

2010-08-09 19:03:53

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 06/10] Block discard support

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
> The 'discard' bio discard request provides information to
> zram disks regarding blocks which are no longer in use by
> filesystem. This allows freeing memory allocated for such
> blocks.
>
> When zram devices are used as swap disks, we already have
> a callback (block_device_operations->swap_slot_free_notify).
> So, the discard support is useful only when used as generic
> (non-swap) disk.
>
> Signed-off-by: Nitin Gupta <[email protected]>

Lets CC fsdevel and Jens for this.

> ---
> ?drivers/staging/zram/zram_drv.c ? | ? 25 +++++++++++++++++++++++++
> ?drivers/staging/zram/zram_sysfs.c | ? 11 +++++++++++
> ?2 files changed, 36 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index efe9c93..0f9785f 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -420,6 +420,20 @@ out:
> ? ? ? ?return 0;
> ?}
>
> +static void zram_discard(struct zram *zram, struct bio *bio)
> +{
> + ? ? ? size_t bytes = bio->bi_size;
> + ? ? ? sector_t sector = bio->bi_sector;
> +
> + ? ? ? while (bytes >= PAGE_SIZE) {
> + ? ? ? ? ? ? ? zram_free_page(zram, sector >> SECTORS_PER_PAGE_SHIFT);
> + ? ? ? ? ? ? ? sector += PAGE_SIZE >> SECTOR_SHIFT;
> + ? ? ? ? ? ? ? bytes -= PAGE_SIZE;
> + ? ? ? }
> +
> + ? ? ? bio_endio(bio, 0);
> +}
> +
> ?/*
> ?* Check if request is within bounds and page aligned.
> ?*/
> @@ -451,6 +465,12 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
> ? ? ? ? ? ? ? ?return 0;
> ? ? ? ?}
>
> + ? ? ? if (unlikely(bio_rw_flagged(bio, BIO_RW_DISCARD))) {
> + ? ? ? ? ? ? ? zram_inc_stat(zram, ZRAM_STAT_DISCARD);
> + ? ? ? ? ? ? ? zram_discard(zram, bio);
> + ? ? ? ? ? ? ? return 0;
> + ? ? ? }
> +
> ? ? ? ?switch (bio_data_dir(bio)) {
> ? ? ? ?case READ:
> ? ? ? ? ? ? ? ?ret = zram_read(zram, bio);
> @@ -606,6 +626,11 @@ static int create_device(struct zram *zram, int device_id)
> ? ? ? ?blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
> ? ? ? ?blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);
>
> + ? ? ? zram->disk->queue->limits.discard_granularity = PAGE_SIZE;
> + ? ? ? zram->disk->queue->limits.max_discard_sectors = UINT_MAX;
> + ? ? ? zram->disk->queue->limits.discard_zeroes_data = 1;
> + ? ? ? queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->queue);
> +
> ? ? ? ?add_disk(zram->disk);
>
> ?#ifdef CONFIG_SYSFS
> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
> index 43bcdd4..74971c0 100644
> --- a/drivers/staging/zram/zram_sysfs.c
> +++ b/drivers/staging/zram/zram_sysfs.c
> @@ -165,6 +165,15 @@ static ssize_t notify_free_show(struct device *dev,
> ? ? ? ? ? ? ? ?zram_get_stat(zram, ZRAM_STAT_NOTIFY_FREE));
> ?}
>
> +static ssize_t discard_show(struct device *dev,
> + ? ? ? ? ? ? ? struct device_attribute *attr, char *buf)
> +{
> + ? ? ? struct zram *zram = dev_to_zram(dev);
> +
> + ? ? ? return sprintf(buf, "%llu\n",
> + ? ? ? ? ? ? ? zram_get_stat(zram, ZRAM_STAT_DISCARD));
> +}
> +
> ?static ssize_t zero_pages_show(struct device *dev,
> ? ? ? ? ? ? ? ?struct device_attribute *attr, char *buf)
> ?{
> @@ -215,6 +224,7 @@ static DEVICE_ATTR(num_reads, S_IRUGO, num_reads_show, NULL);
> ?static DEVICE_ATTR(num_writes, S_IRUGO, num_writes_show, NULL);
> ?static DEVICE_ATTR(invalid_io, S_IRUGO, invalid_io_show, NULL);
> ?static DEVICE_ATTR(notify_free, S_IRUGO, notify_free_show, NULL);
> +static DEVICE_ATTR(discard, S_IRUGO, discard_show, NULL);
> ?static DEVICE_ATTR(zero_pages, S_IRUGO, zero_pages_show, NULL);
> ?static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
> ?static DEVICE_ATTR(compr_data_size, S_IRUGO, compr_data_size_show, NULL);
> @@ -228,6 +238,7 @@ static struct attribute *zram_disk_attrs[] = {
> ? ? ? ?&dev_attr_num_writes.attr,
> ? ? ? ?&dev_attr_invalid_io.attr,
> ? ? ? ?&dev_attr_notify_free.attr,
> + ? ? ? &dev_attr_discard.attr,
> ? ? ? ?&dev_attr_zero_pages.attr,
> ? ? ? ?&dev_attr_orig_data_size.attr,
> ? ? ? ?&dev_attr_compr_data_size.attr,
> --
> 1.7.2.1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. ?For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>

2010-08-10 02:23:44

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 06/10] Block discard support

On 08/09/2010 03:03 PM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
>> The 'discard' bio discard request provides information to
>> zram disks regarding blocks which are no longer in use by
>> filesystem. This allows freeing memory allocated for such
>> blocks.
>>
>> When zram devices are used as swap disks, we already have
>> a callback (block_device_operations->swap_slot_free_notify).
>> So, the discard support is useful only when used as generic
>> (non-swap) disk.
>>
>> Signed-off-by: Nitin Gupta <[email protected]>
>
> Lets CC fsdevel and Jens for this.

Looks OK from a quick look. One comment, though:

>> +static void zram_discard(struct zram *zram, struct bio *bio)
>> +{
>> + size_t bytes = bio->bi_size;
>> + sector_t sector = bio->bi_sector;
>> +
>> + while (bytes >= PAGE_SIZE) {
>> + zram_free_page(zram, sector >> SECTORS_PER_PAGE_SHIFT);
>> + sector += PAGE_SIZE >> SECTOR_SHIFT;
>> + bytes -= PAGE_SIZE;
>> + }
>> +
>> + bio_endio(bio, 0);
>> +}
>> +

So freeing the page here will guarantee zeroed return on read?
And since you set PAGE_SIZE as the discard granularity, the above loop
could be coded more readable with the knowledge that ->bi_size is always
a multiple of the page size.

--
Jens Axboe


Confidentiality Notice: This e-mail message, its contents and any attachments to it are confidential to the intended recipient, and may contain information that is privileged and/or exempt from disclosure under applicable law. If you are not the intended recipient, please immediately notify the sender and destroy the original e-mail message and any attachments (and any copies that may have been made) from your system or otherwise. Any unauthorized use, copying, disclosure or distribution of this information is strictly prohibited.

2010-08-10 03:06:16

by Nitin Gupta

[permalink] [raw]
Subject: Re: [PATCH 01/10] Replace ioctls with sysfs interface

On 08/10/2010 12:04 AM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
>> Creates per-device sysfs nodes in /sys/block/zram<id>/
>> Currently following stats are exported:
>> - disksize
>> - num_reads
>> - num_writes
>> - invalid_io
>> - zero_pages
>> - orig_data_size
>> - compr_data_size
>> - mem_used_total
>>
<snip>
>>
>> Signed-off-by: Nitin Gupta <[email protected]>
>
> Looks good to me (but I'm not a sysfs guy).
>
> Acked-by: Pekka Enberg <[email protected]>
>

Thanks!

>> /* Module params (documentation at end) */
>> -static unsigned int num_devices;
>> +unsigned int num_devices;
>> +
>> +static void zram_stat_inc(u32 *v)
>> +{
>> + *v = *v + 1;
>> +}
>> +
>> +static void zram_stat_dec(u32 *v)
>> +{
>> + *v = *v - 1;
>> +}
>> +
>> +static void zram_stat64_add(struct zram *zram, u64 *v, u64 inc)
>> +{
>> + spin_lock(&zram->stat64_lock);
>> + *v = *v + inc;
>> + spin_unlock(&zram->stat64_lock);
>> +}
>> +
>> +static void zram_stat64_sub(struct zram *zram, u64 *v, u64 dec)
>> +{
>> + spin_lock(&zram->stat64_lock);
>> + *v = *v - dec;
>> + spin_unlock(&zram->stat64_lock);
>> +}
>> +
>> +static void zram_stat64_inc(struct zram *zram, u64 *v)
>> +{
>> + zram_stat64_add(zram, v, 1);
>> +}
>
> These could probably use atomic_inc(), atomic64_inc(), and friends, no?
>

Yes, I think we could use them. Anyways, they are replaced by percpu stats in
patch 3, so probably this can be left as-is.

Thanks,
Nitin

2010-08-10 03:38:23

by Nitin Gupta

[permalink] [raw]
Subject: Re: [PATCH 02/10] Remove need for explicit device initialization

On 08/10/2010 12:06 AM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
>> Currently, the user has to explicitly write a positive value to
>> initstate sysfs node before the device can be used. This event
>> triggers allocation of per-device metadata like memory pool,
>> table array and so on.
>>
>> We do not pre-initialize all zram devices since the 'table' array,
>> mapping disk blocks to compressed chunks, takes considerable amount
>> of memory (8 bytes per page). So, pre-initializing all devices will
>> be quite wasteful if only few or none of the devices are actually
>> used.
>>
>> This explicit device initialization from user is an odd requirement and
>> can be easily avoided. We now initialize the device when first write is
>> done to the device.
>>
>> Signed-off-by: Nitin Gupta <[email protected]>
>
> AFAICT, most hardware block device drivers do things like this in the
> probe function. Why can't we do that for zram as well and drop the
> ->init_done and ->init_lock parts?
>

I think probe is only for PCI devices? Maybe should hook into open function
in struct block_device_operations. That way, we can also drop init_done and
init_lock parts.

Thanks,
Nitin

2010-08-10 04:33:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 03/10] Use percpu stats

On Mon, 9 Aug 2010 22:56:49 +0530 Nitin Gupta <[email protected]> wrote:

> +/*
> + * Individual percpu values can go negative but the sum across all CPUs
> + * must always be positive (we store various counts). So, return sum as
> + * unsigned value.
> + */
> +static u64 zram_get_stat(struct zram *zram, enum zram_stats_index idx)
> {
> - u64 val;
> -
> - spin_lock(&zram->stat64_lock);
> - val = *v;
> - spin_unlock(&zram->stat64_lock);
> + int cpu;
> + s64 val = 0;
> +
> + for_each_possible_cpu(cpu) {
> + s64 temp;
> + unsigned int start;
> + struct zram_stats_cpu *stats;
> +
> + stats = per_cpu_ptr(zram->stats, cpu);
> + do {
> + start = u64_stats_fetch_begin(&stats->syncp);
> + temp = stats->count[idx];
> + } while (u64_stats_fetch_retry(&stats->syncp, start));
> + val += temp;
> + }
>
> + WARN_ON(val < 0);
> return val;
> }

That reimplements include/linux/percpu_counter.h, poorly.

Please see the June discussion "[PATCH v2 1/2] tmpfs: Quick token
library to allow scalable retrieval of tokens from token jar" for some
discussion.

2010-08-10 04:47:16

by Nitin Gupta

[permalink] [raw]
Subject: Re: [PATCH 04/10] Use percpu buffers

On 08/10/2010 12:27 AM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
>> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>> zram_test_flag(zram, index, ZRAM_ZERO))
>> zram_free_page(zram, index);
>>
>> - mutex_lock(&zram->lock);
>> + preempt_disable();
>> + zbuffer = __get_cpu_var(compress_buffer);
>> + zworkmem = __get_cpu_var(compress_workmem);
>> + if (unlikely(!zbuffer || !zworkmem)) {
>> + preempt_enable();
>> + goto out;
>> + }
>
> The per-CPU buffer thing with this preempt_disable() trickery looks
> overkill to me. Most block device drivers seem to use mempool_alloc()
> for this sort of thing. Is there some reason you can't use that here?
>

Other block drivers are allocating relatively small structs using
mempool_alloc(). However, in case of zram, these buffers are quite
large (compress_workmem is 64K!). So, allocating them on every write
would probably be much slower than using a pre-allocated per-cpu buffer.

Thanks,
Nitin

2010-08-10 04:54:39

by Nitin Gupta

[permalink] [raw]
Subject: Re: [PATCH 06/10] Block discard support

On 08/10/2010 07:53 AM, Jens Axboe wrote:
> On 08/09/2010 03:03 PM, Pekka Enberg wrote:
>> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
>>> The 'discard' bio discard request provides information to
>>> zram disks regarding blocks which are no longer in use by
>>> filesystem. This allows freeing memory allocated for such
>>> blocks.
>>>
>>> When zram devices are used as swap disks, we already have
>>> a callback (block_device_operations->swap_slot_free_notify).
>>> So, the discard support is useful only when used as generic
>>> (non-swap) disk.
>>>
>>> Signed-off-by: Nitin Gupta <[email protected]>
>>
>> Lets CC fsdevel and Jens for this.
>
> Looks OK from a quick look. One comment, though:
>
>>> +static void zram_discard(struct zram *zram, struct bio *bio)
>>> +{
>>> + size_t bytes = bio->bi_size;
>>> + sector_t sector = bio->bi_sector;
>>> +
>>> + while (bytes >= PAGE_SIZE) {
>>> + zram_free_page(zram, sector >> SECTORS_PER_PAGE_SHIFT);
>>> + sector += PAGE_SIZE >> SECTOR_SHIFT;
>>> + bytes -= PAGE_SIZE;
>>> + }
>>> +
>>> + bio_endio(bio, 0);
>>> +}
>>> +
>
> So freeing the page here will guarantee zeroed return on read?

For reads on freed/unwritten sectors, it simply returns success and
does not touch the bio page. Is it better to zero the page in such
cases?

> And since you set PAGE_SIZE as the discard granularity, the above loop
> could be coded more readable with the knowledge that ->bi_size is always
> a multiple of the page size.
>

Ok, I will cleanup it up.

Thanks for comments.
Nitin

2010-08-10 04:55:29

by Nitin Gupta

[permalink] [raw]
Subject: Re: [PATCH 05/10] Reduce per table entry overhead by 4 bytes

On 08/10/2010 12:29 AM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
>> Each zram device maintains an array (table) that maps
>> index within the device to the location of corresponding
>> compressed chunk. Currently we store 'struct page' pointer,
>> offset with page and various flags separately which takes
>> 12 bytes per table entry. Now all these are encoded in a
>> single 'phys_add_t' value which results in savings of 4 bytes
>> per entry (except on PAE systems).
>>
>> Unfortunately, cleanups related to some variable renames
>> were mixed in this patch. So, please bear some additional
>> noise.
>
> The noise makes this patch pretty difficult to review properly. Care
> to spilt the patch into two pieces?
>

Ok, I will split them as separate patches.

Thanks for all the reviews and Acks.
Nitin

2010-08-10 05:15:54

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 04/10] Use percpu buffers

Hi Nitin,

On 10.8.2010 7.47, Nitin Gupta wrote:
> On 08/10/2010 12:27 AM, Pekka Enberg wrote:
>> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta<[email protected]> wrote:
>>> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>>> zram_test_flag(zram, index, ZRAM_ZERO))
>>> zram_free_page(zram, index);
>>>
>>> - mutex_lock(&zram->lock);
>>> + preempt_disable();
>>> + zbuffer = __get_cpu_var(compress_buffer);
>>> + zworkmem = __get_cpu_var(compress_workmem);
>>> + if (unlikely(!zbuffer || !zworkmem)) {
>>> + preempt_enable();
>>> + goto out;
>>> + }
>> The per-CPU buffer thing with this preempt_disable() trickery looks
>> overkill to me. Most block device drivers seem to use mempool_alloc()
>> for this sort of thing. Is there some reason you can't use that here?
>>
> Other block drivers are allocating relatively small structs using
> mempool_alloc(). However, in case of zram, these buffers are quite
> large (compress_workmem is 64K!). So, allocating them on every write
> would probably be much slower than using a pre-allocated per-cpu buffer.
The mempool API is precisely for that - using pre-allocated buffers
instead of allocating every time. The preempt_disable() games make the
code complex and have the downside of higher scheduling latencies so why
not give mempools a try?

Pekka

2010-08-10 05:32:36

by Nitin Gupta

[permalink] [raw]
Subject: Re: [PATCH 04/10] Use percpu buffers

Hi,

On 08/10/2010 10:35 AM, Pekka Enberg wrote:
> On 10.8.2010 7.47, Nitin Gupta wrote:
>> On 08/10/2010 12:27 AM, Pekka Enberg wrote:
>>> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta<[email protected]> wrote:
>>>> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>>>> zram_test_flag(zram, index, ZRAM_ZERO))
>>>> zram_free_page(zram, index);
>>>>
>>>> - mutex_lock(&zram->lock);
>>>> + preempt_disable();
>>>> + zbuffer = __get_cpu_var(compress_buffer);
>>>> + zworkmem = __get_cpu_var(compress_workmem);
>>>> + if (unlikely(!zbuffer || !zworkmem)) {
>>>> + preempt_enable();
>>>> + goto out;
>>>> + }
>>> The per-CPU buffer thing with this preempt_disable() trickery looks
>>> overkill to me. Most block device drivers seem to use mempool_alloc()
>>> for this sort of thing. Is there some reason you can't use that here?
>>>
>> Other block drivers are allocating relatively small structs using
>> mempool_alloc(). However, in case of zram, these buffers are quite
>> large (compress_workmem is 64K!). So, allocating them on every write
>> would probably be much slower than using a pre-allocated per-cpu buffer.
> The mempool API is precisely for that - using pre-allocated buffers instead of allocating every time. The preempt_disable() games make the code complex and have the downside of higher scheduling latencies so why not give mempools a try?
>

mempool_alloc() first calls alloc_fn with ~(__GFP_WAIT | __GFP_IO)
and *then* falls down to pre-allocated buffers. So, it will always
be slower than directly using pre-allocated buffers as is done
currently.

One trick we can use is to have alloc_fn such that it always returns
failure with ~__GFP_WAIT and do actual allocation otherwise. But still
it seems like unnecessary cost.

Thanks,
Nitin


2010-08-10 07:36:44

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 04/10] Use percpu buffers

Hi Nitin,

On 8/10/10 8:32 AM, Nitin Gupta wrote:
> Other block drivers are allocating relatively small structs using
>>> mempool_alloc(). However, in case of zram, these buffers are quite
>>> large (compress_workmem is 64K!). So, allocating them on every write
>>> would probably be much slower than using a pre-allocated per-cpu buffer.
>>>
>> The mempool API is precisely for that - using pre-allocated buffers instead of allocating every time. The preempt_disable() games make the code complex and have the downside of higher scheduling latencies so why not give mempools a try?
>>
> mempool_alloc() first calls alloc_fn with ~(__GFP_WAIT | __GFP_IO)
> and *then* falls down to pre-allocated buffers. So, it will always
> be slower than directly using pre-allocated buffers as is done
> currently.
>
> One trick we can use is to have alloc_fn such that it always returns
> failure with ~__GFP_WAIT and do actual allocation otherwise. But still
> it seems like unnecessary cost.
>
We can always extend the mempool API with mempool_prealloc() function if
that turns out to be a problem. The per-CPU buffer with
preempt_disable() trickery isn't really the proper thing to do here. It
doesn't make much sense to disable preemption for compression that's
purely CPU bound.

Pekka

2010-08-10 15:54:37

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 06/10] Block discard support

On 08/10/2010 12:54 AM, Nitin Gupta wrote:
> On 08/10/2010 07:53 AM, Jens Axboe wrote:
>> On 08/09/2010 03:03 PM, Pekka Enberg wrote:
>>> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <[email protected]> wrote:
>>>> The 'discard' bio discard request provides information to
>>>> zram disks regarding blocks which are no longer in use by
>>>> filesystem. This allows freeing memory allocated for such
>>>> blocks.
>>>>
>>>> When zram devices are used as swap disks, we already have
>>>> a callback (block_device_operations->swap_slot_free_notify).
>>>> So, the discard support is useful only when used as generic
>>>> (non-swap) disk.
>>>>
>>>> Signed-off-by: Nitin Gupta <[email protected]>
>>>
>>> Lets CC fsdevel and Jens for this.
>>
>> Looks OK from a quick look. One comment, though:
>>
>>>> +static void zram_discard(struct zram *zram, struct bio *bio)
>>>> +{
>>>> + size_t bytes = bio->bi_size;
>>>> + sector_t sector = bio->bi_sector;
>>>> +
>>>> + while (bytes >= PAGE_SIZE) {
>>>> + zram_free_page(zram, sector >> SECTORS_PER_PAGE_SHIFT);
>>>> + sector += PAGE_SIZE >> SECTOR_SHIFT;
>>>> + bytes -= PAGE_SIZE;
>>>> + }
>>>> +
>>>> + bio_endio(bio, 0);
>>>> +}
>>>> +
>>
>> So freeing the page here will guarantee zeroed return on read?
>
> For reads on freed/unwritten sectors, it simply returns success and
> does not touch the bio page. Is it better to zero the page in such
> cases?

Well, you told the kernel that you return zeroes on discarded ranges:

zram->disk->queue->limits.discard_zeroes_data = 1;

So yes, if you intend to keep that, then you need to zero the
incoming pages that have been explicitly trimmed by a discard.

--
Jens Axboe


Confidentiality Notice: This e-mail message, its contents and any attachments to it are confidential to the intended recipient, and may contain information that is privileged and/or exempt from disclosure under applicable law. If you are not the intended recipient, please immediately notify the sender and destroy the original e-mail message and any attachments (and any copies that may have been made) from your system or otherwise. Any unauthorized use, copying, disclosure or distribution of this information is strictly prohibited.

2010-08-11 16:39:08

by Nitin Gupta

[permalink] [raw]
Subject: Re: [PATCH 03/10] Use percpu stats

On 08/10/2010 10:04 AM, Andrew Morton wrote:
> On Mon, 9 Aug 2010 22:56:49 +0530 Nitin Gupta <[email protected]> wrote:
>
>> +/*
>> + * Individual percpu values can go negative but the sum across all CPUs
>> + * must always be positive (we store various counts). So, return sum as
>> + * unsigned value.
>> + */
>> +static u64 zram_get_stat(struct zram *zram, enum zram_stats_index idx)
>> {
>> - u64 val;
>> -
>> - spin_lock(&zram->stat64_lock);
>> - val = *v;
>> - spin_unlock(&zram->stat64_lock);
>> + int cpu;
>> + s64 val = 0;
>> +
>> + for_each_possible_cpu(cpu) {
>> + s64 temp;
>> + unsigned int start;
>> + struct zram_stats_cpu *stats;
>> +
>> + stats = per_cpu_ptr(zram->stats, cpu);
>> + do {
>> + start = u64_stats_fetch_begin(&stats->syncp);
>> + temp = stats->count[idx];
>> + } while (u64_stats_fetch_retry(&stats->syncp, start));
>> + val += temp;
>> + }
>>
>> + WARN_ON(val < 0);
>> return val;
>> }
>
> That reimplements include/linux/percpu_counter.h, poorly.
>
> Please see the June discussion "[PATCH v2 1/2] tmpfs: Quick token
> library to allow scalable retrieval of tokens from token jar" for some
> discussion.
>
>

I read the discussion you pointed out but still fail to see how percpu_counters,
with all their overhead, are better than simple pcpu variable used in current
version. What is the advantage?

Thanks,
Nitin

2010-08-11 17:17:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 03/10] Use percpu stats

On Wed, 11 Aug 2010 22:09:29 +0530 Nitin Gupta <[email protected]> wrote:

> On 08/10/2010 10:04 AM, Andrew Morton wrote:
> > On Mon, 9 Aug 2010 22:56:49 +0530 Nitin Gupta <[email protected]> wrote:
> >
> >> +/*
> >> + * Individual percpu values can go negative but the sum across all CPUs
> >> + * must always be positive (we store various counts). So, return sum as
> >> + * unsigned value.
> >> + */
> >> +static u64 zram_get_stat(struct zram *zram, enum zram_stats_index idx)
> >> {
> >> - u64 val;
> >> -
> >> - spin_lock(&zram->stat64_lock);
> >> - val = *v;
> >> - spin_unlock(&zram->stat64_lock);
> >> + int cpu;
> >> + s64 val = 0;
> >> +
> >> + for_each_possible_cpu(cpu) {
> >> + s64 temp;
> >> + unsigned int start;
> >> + struct zram_stats_cpu *stats;
> >> +
> >> + stats = per_cpu_ptr(zram->stats, cpu);
> >> + do {
> >> + start = u64_stats_fetch_begin(&stats->syncp);
> >> + temp = stats->count[idx];
> >> + } while (u64_stats_fetch_retry(&stats->syncp, start));
> >> + val += temp;
> >> + }
> >>
> >> + WARN_ON(val < 0);
> >> return val;
> >> }
> >
> > That reimplements include/linux/percpu_counter.h, poorly.
> >
> > Please see the June discussion "[PATCH v2 1/2] tmpfs: Quick token
> > library to allow scalable retrieval of tokens from token jar" for some
> > discussion.
> >
> >
>
> I read the discussion you pointed out but still fail to see how percpu_counters,
> with all their overhead,

What overhead? Send numbers. Then extrapolate those numbers to a
machine which has 128 possible CPUs and 4 present CPUs.

> are better than simple pcpu variable used in current
> version. What is the advantage?

Firstly, they'd have saved all the time you spent duplicating them.

Secondly, getting additional users of the standard facility results in
more testing and perhaps enhancement of that facility, thus benefiting
other users too.

Thirdly, using the standard facility permits your code to leverage
enhancements which others add.

Fourthly, they would result in a smaller kernel.

You didn't really need me to teach you the benefits of code reuse did
you?


Please do not merge this code unless there is a good reason to do so
and it has been shown that the standard facility cannot be suitably
fixed or enhanced to address the deficiency.

Subject: Re: [PATCH 03/10] Use percpu stats

On Mon, 9 Aug 2010, Nitin Gupta wrote:

> -static void zram_stat_inc(u32 *v)
> +static void zram_add_stat(struct zram *zram,
> + enum zram_stats_index idx, s64 val)
> {
> - *v = *v + 1;
> + struct zram_stats_cpu *stats;
> +
> + preempt_disable();
> + stats = __this_cpu_ptr(zram->stats);
> + u64_stats_update_begin(&stats->syncp);
> + stats->count[idx] += val;
> + u64_stats_update_end(&stats->syncp);
> + preempt_enable();

Maybe do

#define zram_add_stat(zram, index, val)
this_cpu_add(zram->stats->count[index], val)

instead? It creates an add in a single "atomic" per cpu instruction and
deals with the fallback scenarios for processors that cannot handle 64
bit adds.

2010-08-31 05:39:53

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH 03/10] Use percpu stats


Hi,

> + zram->stats = alloc_percpu(struct zram_stats_cpu);
> + if (!zram->stats) {
> + pr_err("Error allocating percpu stats\n");
> + ret = -ENOMEM;
> + goto fail;
> + }

There doesn't seem to be a free_percpu() in the module exit path. Something
like this perhaps?

Anton
--

zram: Free percpu data on module exit.

Signed-off-by: Anton Blanchard <[email protected]>
---

Index: powerpc.git/drivers/staging/zram/zram_drv.c
===================================================================
--- powerpc.git.orig/drivers/staging/zram/zram_drv.c 2010-08-31 15:15:59.344290847 +1000
+++ powerpc.git/drivers/staging/zram/zram_drv.c 2010-08-31 15:17:00.383045836 +1000
@@ -483,8 +483,7 @@ void zram_reset_device(struct zram *zram
xv_destroy_pool(zram->mem_pool);
zram->mem_pool = NULL;

- /* Reset stats */
- memset(&zram->stats, 0, sizeof(zram->stats));
+ free_percpu(&zram->stats);

zram->disksize = zram_default_disksize();
mutex_unlock(&zram->init_lock);

2010-08-31 20:31:11

by Nitin Gupta

[permalink] [raw]
Subject: Re: [PATCH 03/10] Use percpu stats

On Mon, Aug 30, 2010 at 12:20 PM, Christoph Lameter <[email protected]> wrote:
> On Mon, 9 Aug 2010, Nitin Gupta wrote:
>
>> -static void zram_stat_inc(u32 *v)
>> +static void zram_add_stat(struct zram *zram,
>> + ? ? ? ? ? ? ? ? ? ? enum zram_stats_index idx, s64 val)
>> ?{
>> - ? ? *v = *v + 1;
>> + ? ? struct zram_stats_cpu *stats;
>> +
>> + ? ? preempt_disable();
>> + ? ? stats = __this_cpu_ptr(zram->stats);
>> + ? ? u64_stats_update_begin(&stats->syncp);
>> + ? ? stats->count[idx] += val;
>> + ? ? u64_stats_update_end(&stats->syncp);
>> + ? ? preempt_enable();
>
> Maybe do
>
> #define zram_add_stat(zram, index, val)
> ? ? ? ? ? ? ? ?this_cpu_add(zram->stats->count[index], val)
>
> instead? It creates an add in a single "atomic" per cpu instruction and
> deals with the fallback scenarios for processors that cannot handle 64
> bit adds.
>
>

Yes, this_cpu_add() seems sufficient. I can't recall why I used u64_stats_*
but if it's not required for atomic access to 64-bit then why was it added to
the mainline in the first place?

Thanks,
Nitin

2010-08-31 21:28:33

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 03/10] Use percpu stats

Le mardi 31 août 2010 à 16:31 -0400, Nitin Gupta a écrit :
> On Mon, Aug 30, 2010 at 12:20 PM, Christoph Lameter <[email protected]> wrote:
> > On Mon, 9 Aug 2010, Nitin Gupta wrote:
> >
> >> -static void zram_stat_inc(u32 *v)
> >> +static void zram_add_stat(struct zram *zram,
> >> + enum zram_stats_index idx, s64 val)
> >> {
> >> - *v = *v + 1;
> >> + struct zram_stats_cpu *stats;
> >> +
> >> + preempt_disable();
> >> + stats = __this_cpu_ptr(zram->stats);
> >> + u64_stats_update_begin(&stats->syncp);
> >> + stats->count[idx] += val;
> >> + u64_stats_update_end(&stats->syncp);
> >> + preempt_enable();
> >
> > Maybe do
> >
> > #define zram_add_stat(zram, index, val)
> > this_cpu_add(zram->stats->count[index], val)
> >
> > instead? It creates an add in a single "atomic" per cpu instruction and
> > deals with the fallback scenarios for processors that cannot handle 64
> > bit adds.
> >
> >
>
> Yes, this_cpu_add() seems sufficient. I can't recall why I used u64_stats_*
> but if it's not required for atomic access to 64-bit then why was it added to
> the mainline in the first place?

Because we wanted to have fast 64bit counters, even on 32bit arches, and
this has litle to do with 'atomic' on one entity, but a group of
counters. (check drivers/net/loopback.c, lines 91-94). No lock prefix
used in fast path.

We also wanted readers to read correct values, not a value being changed
by a writer, with inconsistent 32bit halves. SNMP applications want
monotonically increasing counters.

this_cpu_add()/this_cpu_read() doesnt fit.

Even for single counter, this_cpu_read(64bit) is not using an RMW
(cmpxchg8) instruction, so you can get very strange results when low
order 32bit wraps.


Subject: Re: [PATCH 03/10] Use percpu stats

On Tue, 31 Aug 2010, Eric Dumazet wrote:

> > Yes, this_cpu_add() seems sufficient. I can't recall why I used u64_stats_*
> > but if it's not required for atomic access to 64-bit then why was it added to
> > the mainline in the first place?
>
> Because we wanted to have fast 64bit counters, even on 32bit arches, and
> this has litle to do with 'atomic' on one entity, but a group of
> counters. (check drivers/net/loopback.c, lines 91-94). No lock prefix
> used in fast path.
>
> We also wanted readers to read correct values, not a value being changed
> by a writer, with inconsistent 32bit halves. SNMP applications want
> monotonically increasing counters.
>
> this_cpu_add()/this_cpu_read() doesnt fit.
>
> Even for single counter, this_cpu_read(64bit) is not using an RMW
> (cmpxchg8) instruction, so you can get very strange results when low
> order 32bit wraps.

How about fixing it so that everyone benefits?

2010-08-31 21:41:25

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 03/10] Use percpu stats

Le mardi 31 août 2010 à 16:35 -0500, Christoph Lameter a écrit :
> On Tue, 31 Aug 2010, Eric Dumazet wrote:
>
> > > Yes, this_cpu_add() seems sufficient. I can't recall why I used u64_stats_*
> > > but if it's not required for atomic access to 64-bit then why was it added to
> > > the mainline in the first place?
> >
> > Because we wanted to have fast 64bit counters, even on 32bit arches, and
> > this has litle to do with 'atomic' on one entity, but a group of
> > counters. (check drivers/net/loopback.c, lines 91-94). No lock prefix
> > used in fast path.
> >
> > We also wanted readers to read correct values, not a value being changed
> > by a writer, with inconsistent 32bit halves. SNMP applications want
> > monotonically increasing counters.
> >
> > this_cpu_add()/this_cpu_read() doesnt fit.
> >
> > Even for single counter, this_cpu_read(64bit) is not using an RMW
> > (cmpxchg8) instruction, so you can get very strange results when low
> > order 32bit wraps.
>
> How about fixing it so that everyone benefits?
>

IMHO, this_cpu_read() is fine as is : a _read_ operation.

Dont pretend it can be used in every context, its not true.


2010-08-31 22:38:42

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 00/10] zram: various improvements and cleanups

On Mon, Aug 09, 2010 at 10:56:46PM +0530, Nitin Gupta wrote:
> The zram module creates RAM based block devices named /dev/zram<id>
> (<id> = 0, 1, ...). Pages written to these disks are compressed and stored
> in memory itself.
>
> One of the major changes done is the replacement of ioctls with sysfs
> interface. One of the advantages of this approach is we no longer depend on the
> userspace tool (rzscontrol) which was used to set various parameters and check
> statistics. Maintaining updated version of rzscontrol as changes were done to
> ioctls, statistics exported etc. was a major pain.
>
> Another significant change is the introduction of percpu stats and compression
> buffers. Earlier, we had per-device buffers protected by a mutex. This was a
> major bottleneck on multi-core systems. With these changes, benchmarks with
> fio[1] showed a speedup of about 20% for write performance on dual-core
> system (see patch 4/10 description for details).
>
>
> For easier testing, a single patch against 2.6.35-git8 has been uploaded at:
> http://compcache.googlecode.com/hg/sub-projects/mainline/zram_2.6.36-rc0.patch

I applied the first 2 and last 2 of these patches and they will show up
in the linux-next tree tomorrow. I stopped there due to Andrew's
complaints about the per-cpu variable stuff. Please resolve this and
redo the patch set and I will be glad to apply them.

thanks,

greg k-h

2010-08-31 23:06:30

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 01/10] Replace ioctls with sysfs interface

On Mon, 2010-08-09 at 22:56 +0530, Nitin Gupta wrote:
> Creates per-device sysfs nodes in /sys/block/zram<id>/
> Currently following stats are exported:
> - disksize
> - num_reads
> - num_writes
> - invalid_io
> - zero_pages
> - orig_data_size
> - compr_data_size
> - mem_used_total
>
> By default, disksize is set to 0. So, to start using
> a zram device, fist write a disksize value and then
> initialize device by writing any positive value to
> initstate. For example:
>
> # initialize /dev/zram0 with 50MB disksize
> echo 50*1024*1024 | bc > /sys/block/zram0/disksize
> echo 1 > /sys/block/zram0/initstate
>
> When done using a disk, issue reset to free its memory
> by writing any positive value to reset node:
>
> echo 1 > /sys/block/zram0/reset

Maybe I'm just a weirdo, but I don't really use modules much. That
effectively means that I'm stuck at boot with one zram device.

Making it a read-only module param also means that someone can't add a
second at runtime while the first is still in use.

It doesn't seem to be used very pervasively, but there is a
module_param_cb() function so you can register callbacks when the param
gets updated. Might come in handy for this.

-- Dave

2010-09-01 03:37:06

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH 00/10] zram: various improvements and cleanups


Hi Nitin,

I gave zram a try on a ppc64 box with a 64kB PAGE_SIZE. It looks like the
xvmalloc allocator fails when we add in a large enough block (in this case
65532 bytes).

flindex ends up as 127 which is larger than BITS_PER_LONG. We continually call
grow_block inside find_block and fail:

zram: Error allocating memory for compressed page: 0, size=467

Anton

2010-09-01 03:42:10

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH 03/10] Use percpu stats


Hi Nitin,

> There doesn't seem to be a free_percpu() in the module exit path. Something
> like this perhaps?

Sorry, was having other zram issues on ppc64 that prevented me from testing
this. Stupid bug, removed the &.

Anton

zram: Free percpu data on module exit.

Signed-off-by: Anton Blanchard <[email protected]>
---

Index: powerpc.git/drivers/staging/zram/zram_drv.c
===================================================================
--- powerpc.git.orig/drivers/staging/zram/zram_drv.c 2010-08-31 15:15:59.344290847 +1000
+++ powerpc.git/drivers/staging/zram/zram_drv.c 2010-09-01 12:35:02.964893575 +1000
@@ -483,8 +483,7 @@ void zram_reset_device(struct zram *zram
xv_destroy_pool(zram->mem_pool);
zram->mem_pool = NULL;

- /* Reset stats */
- memset(&zram->stats, 0, sizeof(zram->stats));
+ free_percpu(zram->stats);

zram->disksize = zram_default_disksize();
mutex_unlock(&zram->init_lock);

2010-09-01 03:52:13

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH 03/10] Use percpu stats


Hi,

> Also remove references to removed stats (ex: good_comress).

I'm getting an oops when running mkfs on zram:

NIP [d0000000030e0340] .zram_inc_stat+0x58/0x84 [zram]
[c00000006d58f720] [d0000000030e091c] .zram_make_request+0xa8/0x6a0 [zram]
[c00000006d58f840] [c00000000035795c] .generic_make_request+0x390/0x434
[c00000006d58f950] [c000000000357b14] .submit_bio+0x114/0x140
[c00000006d58fa20] [c000000000361778] .blkdev_issue_discard+0x1ac/0x250
[c00000006d58fb10] [c000000000361f68] .blkdev_ioctl+0x358/0x7fc
[c00000006d58fbd0] [c0000000001c1c1c] .block_ioctl+0x6c/0x90
[c00000006d58fc70] [c0000000001984c4] .do_vfs_ioctl+0x660/0x6d4
[c00000006d58fd70] [c0000000001985a0] .SyS_ioctl+0x68/0xb0

Since disksize no longer starts as 0 it looks like we can call
zram_make_request before the device has been initialised. The patch below
fixes the immediate problem but this would go away if we move the
initialisation function elsewhere (as suggested in another thread).

Signed-off-by: Anton Blanchard <[email protected]>
---

Index: powerpc.git/drivers/staging/zram/zram_drv.c
===================================================================
--- powerpc.git.orig/drivers/staging/zram/zram_drv.c 2010-09-01 12:35:14.286515175 +1000
+++ powerpc.git/drivers/staging/zram/zram_drv.c 2010-09-01 12:35:24.167930504 +1000
@@ -441,6 +441,12 @@ static int zram_make_request(struct requ
int ret = 0;
struct zram *zram = queue->queuedata;

+ if (unlikely(!zram->init_done)) {
+ set_bit(BIO_UPTODATE, &bio->bi_flags);
+ bio_endio(bio, 0);
+ return 0;
+ }
+
if (unlikely(!valid_io_request(zram, bio))) {
zram_inc_stat(zram, ZRAM_STAT_INVALID_IO);
bio_io_error(bio);