We found some issues in zram by code inspection, so generate this set
of patches for bugfixes and minor code improvements.
The first six patches are bugfixes, and should target the mainline
and even stable trees. The last 4 patches are code cleanup and
performance optimization, but not sure whether Greg is kind enough
to accept them:)
Great thanks to Minchan and Jerome for code review!
V2->V3:
1) reorder patches so bugfixes go first
2) rewrite valid_io_request()
3) enhance comments and commit messages
Jiang Liu (10):
zram: avoid invalid memory access in zram_exit()
zram: use zram->lock to protect zram_free_page() in swap free notify
path
zram: destroy all devices on error recovery path in zram_init()
zram: avoid double free in function zram_bvec_write()
zram: avoid access beyond the zram device
zram: protect sysfs handler from invalid memory access
zram: simplify and optimize dev_to_zram()
zram: kill unused zram_get_num_devices()
zram: optimize memory operations with clear_page()/copy_page()
zram: use atomic64_xxx() to replace zram_stat64_xxx()
drivers/staging/zram/zram_drv.c | 109 +++++++++++++++++---------------------
drivers/staging/zram/zram_drv.h | 26 ++++-----
drivers/staging/zram/zram_sysfs.c | 36 ++++---------
3 files changed, 73 insertions(+), 98 deletions(-)
--
1.8.1.2
Memory for zram->disk object may have already been freed after returning
from destroy_device(zram), then it's unsafe for zram_reset_device(zram)
to access zram->disk again.
We can't solve this bug by flipping the order of destroy_device(zram)
and zram_reset_device(zram), that will cause deadlock issues to the
zram sysfs handler.
So fix it by holding an extra reference to zram->disk before calling
destroy_device(zram).
Signed-off-by: Jiang Liu <[email protected]>
Cc: [email protected]
---
drivers/staging/zram/zram_drv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index e34e3fe..ee6b67d 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -727,8 +727,10 @@ static void __exit zram_exit(void)
for (i = 0; i < num_devices; i++) {
zram = &zram_devices[i];
+ get_disk(zram->disk);
destroy_device(zram);
zram_reset_device(zram);
+ put_disk(zram->disk);
}
unregister_blkdev(zram_major, "zram");
--
1.8.1.2
On error recovery path of zram_init(), it leaks the zram device object
causing the failure. So change create_device() to free allocated
resources on error path.
Signed-off-by: Jiang Liu <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Jerome Marchand <[email protected]>
Cc: [email protected]
---
drivers/staging/zram/zram_drv.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 0738f6c..2ca6dc9 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -595,7 +595,7 @@ static const struct block_device_operations zram_devops = {
static int create_device(struct zram *zram, int device_id)
{
- int ret = 0;
+ int ret = -ENOMEM;
init_rwsem(&zram->lock);
init_rwsem(&zram->init_lock);
@@ -605,7 +605,6 @@ static int create_device(struct zram *zram, int device_id)
if (!zram->queue) {
pr_err("Error allocating disk queue for device %d\n",
device_id);
- ret = -ENOMEM;
goto out;
}
@@ -615,11 +614,9 @@ static int create_device(struct zram *zram, int device_id)
/* gendisk structure */
zram->disk = alloc_disk(1);
if (!zram->disk) {
- blk_cleanup_queue(zram->queue);
pr_warn("Error allocating disk structure for device %d\n",
device_id);
- ret = -ENOMEM;
- goto out;
+ goto out_free_queue;
}
zram->disk->major = zram_major;
@@ -648,11 +645,17 @@ static int create_device(struct zram *zram, int device_id)
&zram_disk_attr_group);
if (ret < 0) {
pr_warn("Error creating sysfs group");
- goto out;
+ goto out_free_disk;
}
zram->init_done = 0;
+ return 0;
+out_free_disk:
+ del_gendisk(zram->disk);
+ put_disk(zram->disk);
+out_free_queue:
+ blk_cleanup_queue(zram->queue);
out:
return ret;
}
--
1.8.1.2
Use zram->init_lock to protect access to zram->meta, otherwise it
may cause invalid memory access if zram->meta has been freed by
zram_reset_device().
This issue may be triggered by:
Thread 1:
while true; do cat mem_used_total; done
Thread 2:
while true; do echo 8M > disksize; echo 1 > reset; done
Signed-off-by: Jiang Liu <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: [email protected]
---
drivers/staging/zram/zram_sysfs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index e6a929d..dc76a3d 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -188,8 +188,10 @@ static ssize_t mem_used_total_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);
struct zram_meta *meta = zram->meta;
+ down_read(&zram->init_lock);
if (zram->init_done)
val = zs_get_total_size_bytes(meta->mem_pool);
+ up_read(&zram->init_lock);
return sprintf(buf, "%llu\n", val);
}
--
1.8.1.2
Now there's no caller of zram_get_num_devices(), so kill it.
And change zram_devices to static because it's only used in zram_drv.c.
Signed-off-by: Jiang Liu <[email protected]>
---
drivers/staging/zram/zram_drv.c | 7 +------
drivers/staging/zram/zram_drv.h | 2 --
2 files changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 9289217..5e6545a 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -37,7 +37,7 @@
/* Globals */
static int zram_major;
-struct zram *zram_devices;
+static struct zram *zram_devices;
/* Module params (documentation at end) */
static unsigned int num_devices = 1;
@@ -679,11 +679,6 @@ static void destroy_device(struct zram *zram)
blk_cleanup_queue(zram->queue);
}
-unsigned int zram_get_num_devices(void)
-{
- return num_devices;
-}
-
static int __init zram_init(void)
{
int ret, dev_id;
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index d542eee..b3a315d 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -110,8 +110,6 @@ struct zram {
struct zram_stats stats;
};
-extern struct zram *zram_devices;
-unsigned int zram_get_num_devices(void);
#ifdef CONFIG_SYSFS
extern struct attribute_group zram_disk_attr_group;
#endif
--
1.8.1.2
Simplify and optimize dev_to_zram() without walking the zram_devices
array.
Signed-off-by: Jiang Liu <[email protected]>
---
drivers/staging/zram/zram_sysfs.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index dc76a3d..e239d94 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -30,18 +30,9 @@ static u64 zram_stat64_read(struct zram *zram, u64 *v)
return val;
}
-static struct zram *dev_to_zram(struct device *dev)
+static inline struct zram *dev_to_zram(struct device *dev)
{
- int i;
- struct zram *zram = NULL;
-
- for (i = 0; i < zram_get_num_devices(); i++) {
- zram = &zram_devices[i];
- if (disk_to_dev(zram->disk) == dev)
- break;
- }
-
- return zram;
+ return (struct zram *)dev_to_disk(dev)->private_data;
}
static ssize_t disksize_show(struct device *dev,
--
1.8.1.2
Some architectures provides architecture-specific, optimized version of
clear_page()/copy_page(), which may have better performance than
memset()/memcpy(). So use clear_page()/copy_page() to optimize zram
performance if possible.
Signed-off-by: Jiang Liu <[email protected]>
---
drivers/staging/zram/zram_drv.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 5e6545a..b4f84b6b 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -128,23 +128,26 @@ static void zram_free_page(struct zram *zram, size_t index)
meta->table[index].size = 0;
}
+static inline int is_partial_io(struct bio_vec *bvec)
+{
+ return bvec->bv_len != PAGE_SIZE;
+}
+
static void handle_zero_page(struct bio_vec *bvec)
{
struct page *page = bvec->bv_page;
void *user_mem;
user_mem = kmap_atomic(page);
- memset(user_mem + bvec->bv_offset, 0, bvec->bv_len);
+ if (is_partial_io(bvec))
+ memset(user_mem + bvec->bv_offset, 0, bvec->bv_len);
+ else
+ clear_page(user_mem);
kunmap_atomic(user_mem);
flush_dcache_page(page);
}
-static inline int is_partial_io(struct bio_vec *bvec)
-{
- return bvec->bv_len != PAGE_SIZE;
-}
-
static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
{
int ret = LZO_E_OK;
@@ -154,13 +157,13 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
unsigned long handle = meta->table[index].handle;
if (!handle || zram_test_flag(meta, index, ZRAM_ZERO)) {
- memset(mem, 0, PAGE_SIZE);
+ clear_page(mem);
return 0;
}
cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
if (meta->table[index].size == PAGE_SIZE)
- memcpy(mem, cmem, PAGE_SIZE);
+ copy_page(mem, cmem);
else
ret = lzo1x_decompress_safe(cmem, meta->table[index].size,
mem, &clen);
@@ -309,11 +312,13 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
}
cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
- if ((clen == PAGE_SIZE) && !is_partial_io(bvec))
+ if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
src = kmap_atomic(page);
- memcpy(cmem, src, clen);
- if ((clen == PAGE_SIZE) && !is_partial_io(bvec))
+ copy_page(cmem, src);
kunmap_atomic(src);
+ } else {
+ memcpy(cmem, src, clen);
+ }
zs_unmap_object(meta->mem_pool, handle);
--
1.8.1.2
Use atomic64_xxx() to replace open-coded zram_stat64_xxx().
Some architectures have native support of atomic64 operations,
so we can get rid of the spin_lock() in zram_stat64_xxx().
On the other hand, for platforms use generic version of atomic64
implement, it may cause an extra save/restore of the interrupt
flag. So it's a tradeoff.
Signed-off-by: Jiang Liu <[email protected]>
---
drivers/staging/zram/zram_drv.c | 37 ++++++++-----------------------------
drivers/staging/zram/zram_drv.h | 19 +++++++++++--------
drivers/staging/zram/zram_sysfs.c | 21 +++++----------------
3 files changed, 24 insertions(+), 53 deletions(-)
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index b4f84b6b..518e87d 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -42,25 +42,6 @@ static struct zram *zram_devices;
/* Module params (documentation at end) */
static unsigned int num_devices = 1;
-static void zram_stat64_add(struct zram *zram, u64 *v, u64 inc)
-{
- spin_lock(&zram->stat64_lock);
- *v = *v + inc;
- spin_unlock(&zram->stat64_lock);
-}
-
-static void zram_stat64_sub(struct zram *zram, u64 *v, u64 dec)
-{
- spin_lock(&zram->stat64_lock);
- *v = *v - dec;
- spin_unlock(&zram->stat64_lock);
-}
-
-static void zram_stat64_inc(struct zram *zram, u64 *v)
-{
- zram_stat64_add(zram, v, 1);
-}
-
static int zram_test_flag(struct zram_meta *meta, u32 index,
enum zram_pageflags flag)
{
@@ -120,8 +101,7 @@ static void zram_free_page(struct zram *zram, size_t index)
if (size <= PAGE_SIZE / 2)
zram->stats.good_compress--;
- zram_stat64_sub(zram, &zram->stats.compr_size,
- meta->table[index].size);
+ atomic64_sub(meta->table[index].size, &zram->stats.compr_size);
zram->stats.pages_stored--;
meta->table[index].handle = 0;
@@ -172,7 +152,7 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
/* Should NEVER happen. Return bio error if it does. */
if (unlikely(ret != LZO_E_OK)) {
pr_err("Decompression failed! err=%d, page=%u\n", ret, index);
- zram_stat64_inc(zram, &zram->stats.failed_reads);
+ atomic64_inc(&zram->stats.failed_reads);
return ret;
}
@@ -326,7 +306,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
meta->table[index].size = clen;
/* Update stats */
- zram_stat64_add(zram, &zram->stats.compr_size, clen);
+ atomic64_add(clen, &zram->stats.compr_size);
zram->stats.pages_stored++;
if (clen <= PAGE_SIZE / 2)
zram->stats.good_compress++;
@@ -336,7 +316,7 @@ out:
kfree(uncmem);
if (ret)
- zram_stat64_inc(zram, &zram->stats.failed_writes);
+ atomic64_inc(&zram->stats.failed_writes);
return ret;
}
@@ -373,10 +353,10 @@ static void __zram_make_request(struct zram *zram, struct bio *bio, int rw)
switch (rw) {
case READ:
- zram_stat64_inc(zram, &zram->stats.num_reads);
+ atomic64_inc(&zram->stats.num_reads);
break;
case WRITE:
- zram_stat64_inc(zram, &zram->stats.num_writes);
+ atomic64_inc(&zram->stats.num_writes);
break;
}
@@ -456,7 +436,7 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
goto error;
if (!valid_io_request(zram, bio)) {
- zram_stat64_inc(zram, &zram->stats.invalid_io);
+ atomic64_inc(&zram->stats.invalid_io);
goto error;
}
@@ -595,7 +575,7 @@ static void zram_slot_free_notify(struct block_device *bdev,
down_write(&zram->lock);
zram_free_page(zram, index);
up_write(&zram->lock);
- zram_stat64_inc(zram, &zram->stats.notify_free);
+ atomic64_inc(&zram->stats.notify_free);
}
static const struct block_device_operations zram_devops = {
@@ -609,7 +589,6 @@ static int create_device(struct zram *zram, int device_id)
init_rwsem(&zram->lock);
init_rwsem(&zram->init_lock);
- spin_lock_init(&zram->stat64_lock);
zram->queue = blk_alloc_queue(GFP_KERNEL);
if (!zram->queue) {
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index b3a315d..11b09fc 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -69,14 +69,18 @@ struct table {
u8 flags;
} __aligned(4);
+/*
+ * All 64bit fields should only be manipulated by 64bit atomic accessors.
+ * All modifications to 32bit counter should be protected by zram->lock.
+ */
struct zram_stats {
- u64 compr_size; /* compressed size of pages stored */
- u64 num_reads; /* failed + successful */
- u64 num_writes; /* --do-- */
- u64 failed_reads; /* should NEVER! happen */
- u64 failed_writes; /* can happen when memory is too low */
- u64 invalid_io; /* non-page-aligned I/O requests */
- u64 notify_free; /* no. of swap slot free notifications */
+ atomic64_t compr_size; /* compressed size of pages stored */
+ atomic64_t num_reads; /* failed + successful */
+ atomic64_t num_writes; /* --do-- */
+ atomic64_t failed_reads; /* should NEVER! happen */
+ atomic64_t failed_writes; /* can happen when memory is too low */
+ atomic64_t invalid_io; /* non-page-aligned I/O requests */
+ atomic64_t notify_free; /* no. of swap slot free notifications */
u32 pages_zero; /* no. of zero filled pages */
u32 pages_stored; /* no. of pages currently stored */
u32 good_compress; /* % of pages with compression ratio<=50% */
@@ -92,7 +96,6 @@ struct zram_meta {
struct zram {
struct zram_meta *meta;
- spinlock_t stat64_lock; /* protect 64-bit stats */
struct rw_semaphore lock; /* protect compression buffers, table,
* 32bit stat counters against concurrent
* notifications, reads and writes */
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index e239d94..93a2f9c 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -19,17 +19,6 @@
#include "zram_drv.h"
-static u64 zram_stat64_read(struct zram *zram, u64 *v)
-{
- u64 val;
-
- spin_lock(&zram->stat64_lock);
- val = *v;
- spin_unlock(&zram->stat64_lock);
-
- return val;
-}
-
static inline struct zram *dev_to_zram(struct device *dev)
{
return (struct zram *)dev_to_disk(dev)->private_data;
@@ -116,7 +105,7 @@ static ssize_t num_reads_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);
return sprintf(buf, "%llu\n",
- zram_stat64_read(zram, &zram->stats.num_reads));
+ (u64)atomic64_read(&zram->stats.num_reads));
}
static ssize_t num_writes_show(struct device *dev,
@@ -125,7 +114,7 @@ static ssize_t num_writes_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);
return sprintf(buf, "%llu\n",
- zram_stat64_read(zram, &zram->stats.num_writes));
+ (u64)atomic64_read(&zram->stats.num_writes));
}
static ssize_t invalid_io_show(struct device *dev,
@@ -134,7 +123,7 @@ static ssize_t invalid_io_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);
return sprintf(buf, "%llu\n",
- zram_stat64_read(zram, &zram->stats.invalid_io));
+ (u64)atomic64_read(&zram->stats.invalid_io));
}
static ssize_t notify_free_show(struct device *dev,
@@ -143,7 +132,7 @@ static ssize_t notify_free_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);
return sprintf(buf, "%llu\n",
- zram_stat64_read(zram, &zram->stats.notify_free));
+ (u64)atomic64_read(&zram->stats.notify_free));
}
static ssize_t zero_pages_show(struct device *dev,
@@ -169,7 +158,7 @@ static ssize_t compr_data_size_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);
return sprintf(buf, "%llu\n",
- zram_stat64_read(zram, &zram->stats.compr_size));
+ (u64)atomic64_read(&zram->stats.compr_size));
}
static ssize_t mem_used_total_show(struct device *dev,
--
1.8.1.2
Function valid_io_request() should verify the entire request are within
the zram device address range. Otherwise it may cause invalid memory
access when accessing/modifying zram->meta->table[index] because the
'index' is out of range. Then it may access non-exist memory, randomly
modify memory belong to other subsystems, which is hard to track down.
Signed-off-by: Jiang Liu <[email protected]>
Cc: [email protected]
---
drivers/staging/zram/zram_drv.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 27ab824..9289217 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -420,13 +420,20 @@ out:
*/
static inline int valid_io_request(struct zram *zram, struct bio *bio)
{
- if (unlikely(
- (bio->bi_sector >= (zram->disksize >> SECTOR_SHIFT)) ||
- (bio->bi_sector & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)) ||
- (bio->bi_size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))) {
+ u64 start, end, bound;
+
+ /* unaligned request */
+ if (unlikely(bio->bi_sector & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
+ return 0;
+ if (unlikely(bio->bi_size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
+ return 0;
+ start = bio->bi_sector;
+ end = start + (bio->bi_size >> SECTOR_SHIFT);
+ bound = zram->disksize >> SECTOR_SHIFT;
+ /* out of range range */
+ if (unlikely(start >= bound || end >= bound || start > end))
return 0;
- }
/* I/O request is valid */
return 1;
--
1.8.1.2
When doing a patial write and the whole page is filled with zero,
zram_bvec_write() will free uncmem twice.
Signed-off-by: Jiang Liu <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: [email protected]
---
drivers/staging/zram/zram_drv.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 2ca6dc9..27ab824 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -272,8 +272,6 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
if (page_zero_filled(uncmem)) {
kunmap_atomic(user_mem);
- if (is_partial_io(bvec))
- kfree(uncmem);
zram->stats.pages_zero++;
zram_set_flag(meta, index, ZRAM_ZERO);
ret = 0;
--
1.8.1.2
zram_slot_free_notify() is free-running without any protection from
concurrent operations. So there are race conditions between
zram_bvec_read()/zram_bvec_write() and zram_slot_free_notify(),
and possible consequences include:
1) Trigger BUG_ON(!handle) on zram_bvec_write() side.
2) Access to freed pages on zram_bvec_read() side.
3) Break some fields (bad_compress, good_compress, pages_stored)
in zram->stats if the swap layer makes concurrently call to
zram_slot_free_notify().
So enhance zram_slot_free_notify() to acquire writer lock on zram->lock
before calling zram_free_page().
Signed-off-by: Jiang Liu <[email protected]>
Cc: [email protected]
---
drivers/staging/zram/zram_drv.c | 2 ++
drivers/staging/zram/zram_drv.h | 5 +++--
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index ee6b67d..0738f6c 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -582,7 +582,9 @@ static void zram_slot_free_notify(struct block_device *bdev,
struct zram *zram;
zram = bdev->bd_disk->private_data;
+ down_write(&zram->lock);
zram_free_page(zram, index);
+ up_write(&zram->lock);
zram_stat64_inc(zram, &zram->stats.notify_free);
}
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 2d1a3f1..d542eee 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -93,8 +93,9 @@ struct zram_meta {
struct zram {
struct zram_meta *meta;
spinlock_t stat64_lock; /* protect 64-bit stats */
- struct rw_semaphore lock; /* protect compression buffers and table
- * against concurrent read and writes */
+ struct rw_semaphore lock; /* protect compression buffers, table,
+ * 32bit stat counters against concurrent
+ * notifications, reads and writes */
struct request_queue *queue;
struct gendisk *disk;
int init_done;
--
1.8.1.2
Hello Jiang,
On Fri, Jun 07, 2013 at 12:07:22AM +0800, Jiang Liu wrote:
> Memory for zram->disk object may have already been freed after returning
> from destroy_device(zram), then it's unsafe for zram_reset_device(zram)
> to access zram->disk again.
>
> We can't solve this bug by flipping the order of destroy_device(zram)
> and zram_reset_device(zram), that will cause deadlock issues to the
> zram sysfs handler.
Sorry for bothering you with description nitpick.
I agree your approach is so simple that I'd like to give Ack
but your description is not clear.
If you really want to say deadlock issue with flipping approach,
please add enough explain how the deadlock happens.(But not sure
it is worth that we should keep the problem of deadlock issue of
flipping approach in changelog)
Otherwise, it's enough with first paragraph because this bug is
very simple and plain. I prefer latter because I want that
other developers don't waste their time to understand a deadlock issue
of flipping approach)
>
> So fix it by holding an extra reference to zram->disk before calling
> destroy_device(zram).
>
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: [email protected]
Acked-by: Minchan Kim <[email protected]>
But please rewrite the description.
--
Kind regards,
Minchan Kim
On Fri, Jun 07, 2013 at 12:07:23AM +0800, Jiang Liu wrote:
> zram_slot_free_notify() is free-running without any protection from
> concurrent operations. So there are race conditions between
> zram_bvec_read()/zram_bvec_write() and zram_slot_free_notify(),
> and possible consequences include:
> 1) Trigger BUG_ON(!handle) on zram_bvec_write() side.
> 2) Access to freed pages on zram_bvec_read() side.
> 3) Break some fields (bad_compress, good_compress, pages_stored)
> in zram->stats if the swap layer makes concurrently call to
> zram_slot_free_notify().
>
> So enhance zram_slot_free_notify() to acquire writer lock on zram->lock
> before calling zram_free_page().
>
If someone try to read/write *active* swap device via opening
block device file(it's not sane but we couldn't prevent it),
the race between zram_slot_free_notify and zram_bvec_[read|write] can happen.
In such case, following problem for example can happen.
1. xxx
2. xxx
3. xxx
So this patch closes the race with zram->lock write-side lock.
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: [email protected]
Acked-by: Minchan Kim <[email protected]>
But please rewrite the description.
--
Kind regards,
Minchan Kim
On Fri, Jun 07, 2013 at 12:07:25AM +0800, Jiang Liu wrote:
> When doing a patial write and the whole page is filled with zero,
^
partial
> zram_bvec_write() will free uncmem twice.
>
> Signed-off-by: Jiang Liu <[email protected]>
> Acked-by: Minchan Kim <[email protected]>
> Cc: [email protected]
> ---
> drivers/staging/zram/zram_drv.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index 2ca6dc9..27ab824 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -272,8 +272,6 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>
> if (page_zero_filled(uncmem)) {
> kunmap_atomic(user_mem);
> - if (is_partial_io(bvec))
> - kfree(uncmem);
> zram->stats.pages_zero++;
> zram_set_flag(meta, index, ZRAM_ZERO);
> ret = 0;
> --
> 1.8.1.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Kind regards,
Minchan Kim
On Fri, Jun 07, 2013 at 12:07:26AM +0800, Jiang Liu wrote:
> Function valid_io_request() should verify the entire request are within
> the zram device address range. Otherwise it may cause invalid memory
> access when accessing/modifying zram->meta->table[index] because the
> 'index' is out of range. Then it may access non-exist memory, randomly
> modify memory belong to other subsystems, which is hard to track down.
>
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: [email protected]
> ---
> drivers/staging/zram/zram_drv.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index 27ab824..9289217 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -420,13 +420,20 @@ out:
> */
> static inline int valid_io_request(struct zram *zram, struct bio *bio)
> {
> - if (unlikely(
> - (bio->bi_sector >= (zram->disksize >> SECTOR_SHIFT)) ||
> - (bio->bi_sector & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)) ||
> - (bio->bi_size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))) {
> + u64 start, end, bound;
> +
> + /* unaligned request */
> + if (unlikely(bio->bi_sector & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
> + return 0;
> + if (unlikely(bio->bi_size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
> + return 0;
>
> + start = bio->bi_sector;
> + end = start + (bio->bi_size >> SECTOR_SHIFT);
> + bound = zram->disksize >> SECTOR_SHIFT;
> + /* out of range range */
> + if (unlikely(start >= bound || end >= bound || start > end))
if (end >= bound || start > end) isn't enough?
> return 0;
> - }
>
> /* I/O request is valid */
> return 1;
> --
> 1.8.1.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Kind regards,
Minchan Kim
On 06/06/2013 06:07 PM, Jiang Liu wrote:
> Memory for zram->disk object may have already been freed after returning
> from destroy_device(zram), then it's unsafe for zram_reset_device(zram)
> to access zram->disk again.
>
> We can't solve this bug by flipping the order of destroy_device(zram)
> and zram_reset_device(zram), that will cause deadlock issues to the
> zram sysfs handler.
>
> So fix it by holding an extra reference to zram->disk before calling
> destroy_device(zram).
>
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: [email protected]
Acked-by: Jerome Marchand <[email protected]>
On 06/06/2013 06:07 PM, Jiang Liu wrote:
> zram_slot_free_notify() is free-running without any protection from
> concurrent operations. So there are race conditions between
> zram_bvec_read()/zram_bvec_write() and zram_slot_free_notify(),
> and possible consequences include:
> 1) Trigger BUG_ON(!handle) on zram_bvec_write() side.
> 2) Access to freed pages on zram_bvec_read() side.
> 3) Break some fields (bad_compress, good_compress, pages_stored)
> in zram->stats if the swap layer makes concurrently call to
> zram_slot_free_notify().
>
> So enhance zram_slot_free_notify() to acquire writer lock on zram->lock
> before calling zram_free_page().
>
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: [email protected]
Acked-by: Jerome Marchand <[email protected]>
On 06/07/2013 10:09 AM, Minchan Kim wrote:
> On Fri, Jun 07, 2013 at 12:07:26AM +0800, Jiang Liu wrote:
>> Function valid_io_request() should verify the entire request are within
>> the zram device address range. Otherwise it may cause invalid memory
>> access when accessing/modifying zram->meta->table[index] because the
>> 'index' is out of range. Then it may access non-exist memory, randomly
>> modify memory belong to other subsystems, which is hard to track down.
>>
>> Signed-off-by: Jiang Liu <[email protected]>
>> Cc: [email protected]
>> ---
>> drivers/staging/zram/zram_drv.c | 17 ++++++++++++-----
>> 1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
>> index 27ab824..9289217 100644
>> --- a/drivers/staging/zram/zram_drv.c
>> +++ b/drivers/staging/zram/zram_drv.c
>> @@ -420,13 +420,20 @@ out:
>> */
>> static inline int valid_io_request(struct zram *zram, struct bio *bio)
>> {
>> - if (unlikely(
>> - (bio->bi_sector >= (zram->disksize >> SECTOR_SHIFT)) ||
>> - (bio->bi_sector & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)) ||
>> - (bio->bi_size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))) {
>> + u64 start, end, bound;
>> +
>> + /* unaligned request */
>> + if (unlikely(bio->bi_sector & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
>> + return 0;
>> + if (unlikely(bio->bi_size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
>> + return 0;
>>
>> + start = bio->bi_sector;
>> + end = start + (bio->bi_size >> SECTOR_SHIFT);
>> + bound = zram->disksize >> SECTOR_SHIFT;
>> + /* out of range range */
>> + if (unlikely(start >= bound || end >= bound || start > end))
>
> if (end >= bound || start > end) isn't enough?
I shall think so.
Jerome
>
>> return 0;
>> - }
>>
>> /* I/O request is valid */
>> return 1;
>> --
>> 1.8.1.2
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>
On 06/06/2013 06:07 PM, Jiang Liu wrote:
> Use zram->init_lock to protect access to zram->meta, otherwise it
> may cause invalid memory access if zram->meta has been freed by
> zram_reset_device().
>
> This issue may be triggered by:
> Thread 1:
> while true; do cat mem_used_total; done
> Thread 2:
> while true; do echo 8M > disksize; echo 1 > reset; done
>
> Signed-off-by: Jiang Liu <[email protected]>
> Acked-by: Minchan Kim <[email protected]>
> Cc: [email protected]
Acked-by: Jerome Marchand <[email protected]>
On 06/07/2013 05:40 PM, Jerome Marchand wrote:
> On 06/07/2013 10:09 AM, Minchan Kim wrote:
>> On Fri, Jun 07, 2013 at 12:07:26AM +0800, Jiang Liu wrote:
>>> Function valid_io_request() should verify the entire request are within
...
>>>
>>> + start = bio->bi_sector;
>>> + end = start + (bio->bi_size >> SECTOR_SHIFT);
>>> + bound = zram->disksize >> SECTOR_SHIFT;
>>> + /* out of range range */
>>> + if (unlikely(start >= bound || end >= bound || start > end))
>>
>> if (end >= bound || start > end) isn't enough?
>
> I shall think so.
>
> Jerome
Me too! But I realized this just after sending out the patchset
last night.
>
>>
>>> return 0;
>>> - }
>>>
>>> /* I/O request is valid */
>>> return 1;
>>> --
>>> 1.8.1.2
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at http://www.tux.org/lkml/
>>
>