2012-08-08 06:10:53

by Minchan Kim

Subject: [PATCH 0/7] zram/zsmalloc promotion

This patchset promotes zram/zsmalloc out of staging.
Both are very clean, and zram has been used by many embedded
products for a long time.

[1-3] are patches not yet merged into linux-next but needed
as a base for [4-5], which promote zsmalloc.
Greg, if you have already merged [1-3], skip them.

Seth Jennings (5):
1. zsmalloc: s/firstpage/page in new copy map funcs
2. zsmalloc: prevent mapping in interrupt context
3. zsmalloc: add page table mapping method
4. zsmalloc: collapse internal .h into .c
5. zsmalloc: promote to mm/

Minchan Kim (2):
6. zram: promote zram from staging
7. zram: select ZSMALLOC when ZRAM is configured

drivers/block/Kconfig | 1 +
drivers/block/Makefile | 1 +
drivers/{staging => block}/zram/Kconfig | 3 +-
drivers/{staging => block}/zram/Makefile | 0
drivers/{staging => block}/zram/zram.txt | 0
drivers/{staging => block}/zram/zram_drv.c | 0
drivers/{staging => block}/zram/zram_drv.h | 3 +-
drivers/{staging => block}/zram/zram_sysfs.c | 0
drivers/staging/Kconfig | 4 -
drivers/staging/Makefile | 2 -
drivers/staging/zcache/zcache-main.c | 4 +-
drivers/staging/zsmalloc/Kconfig | 10 -
drivers/staging/zsmalloc/Makefile | 3 -
drivers/staging/zsmalloc/zsmalloc_int.h | 155 ----------
.../staging/zsmalloc => include/linux}/zsmalloc.h | 0
mm/Kconfig | 18 ++
mm/Makefile | 1 +
.../zsmalloc/zsmalloc-main.c => mm/zsmalloc.c | 323 +++++++++++++++++---
18 files changed, 299 insertions(+), 229 deletions(-)
rename drivers/{staging => block}/zram/Kconfig (94%)
rename drivers/{staging => block}/zram/Makefile (100%)
rename drivers/{staging => block}/zram/zram.txt (100%)
rename drivers/{staging => block}/zram/zram_drv.c (100%)
rename drivers/{staging => block}/zram/zram_drv.h (98%)
rename drivers/{staging => block}/zram/zram_sysfs.c (100%)
delete mode 100644 drivers/staging/zsmalloc/Kconfig
delete mode 100644 drivers/staging/zsmalloc/Makefile
delete mode 100644 drivers/staging/zsmalloc/zsmalloc_int.h
rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
rename drivers/staging/zsmalloc/zsmalloc-main.c => mm/zsmalloc.c (73%)

--
1.7.9.5


2012-08-08 06:10:54

by Minchan Kim

Subject: [PATCH 1/7] zsmalloc: s/firstpage/page in new copy map funcs

From: Seth Jennings <[email protected]>

firstpage already has a precedent and means the first page
of a zspage. In the case of the copy mapping functions,
it is the first of a pair of pages that need to be mapped.

This patch just renames the firstpage argument to "page" to
avoid confusion.

Signed-off-by: Seth Jennings <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
drivers/staging/zsmalloc/zsmalloc-main.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index 8b0bcb6..3c83c65 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -470,15 +470,15 @@ static struct page *find_get_zspage(struct size_class *class)
return page;
}

-static void zs_copy_map_object(char *buf, struct page *firstpage,
+static void zs_copy_map_object(char *buf, struct page *page,
int off, int size)
{
struct page *pages[2];
int sizes[2];
void *addr;

- pages[0] = firstpage;
- pages[1] = get_next_page(firstpage);
+ pages[0] = page;
+ pages[1] = get_next_page(page);
BUG_ON(!pages[1]);

sizes[0] = PAGE_SIZE - off;
@@ -493,15 +493,15 @@ static void zs_copy_map_object(char *buf, struct page *firstpage,
kunmap_atomic(addr);
}

-static void zs_copy_unmap_object(char *buf, struct page *firstpage,
+static void zs_copy_unmap_object(char *buf, struct page *page,
int off, int size)
{
struct page *pages[2];
int sizes[2];
void *addr;

- pages[0] = firstpage;
- pages[1] = get_next_page(firstpage);
+ pages[0] = page;
+ pages[1] = get_next_page(page);
BUG_ON(!pages[1]);

sizes[0] = PAGE_SIZE - off;
--
1.7.9.5

2012-08-08 06:10:58

by Minchan Kim

Subject: [PATCH 7/7] zram: select ZSMALLOC when ZRAM is configured

At the moment, zram in drivers/block can only be configured once
zsmalloc in the mm menu has been configured first. That is not
convenient.

With this patch, users can configure zram in drivers/block regardless
of whether zsmalloc is enabled: "select ZSMALLOC" turns the allocator
on automatically whenever ZRAM is chosen, instead of hiding the zram
entry behind a "depends on".

Signed-off-by: Minchan Kim <[email protected]>
---
drivers/block/zram/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index be5abe8..ee23a86 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -1,6 +1,7 @@
config ZRAM
tristate "Compressed RAM block device support"
- depends on BLOCK && SYSFS && ZSMALLOC
+ depends on BLOCK && SYSFS
+ select ZSMALLOC
select LZO_COMPRESS
select LZO_DECOMPRESS
default n
--
1.7.9.5

2012-08-08 06:10:59

by Minchan Kim

Subject: [PATCH 6/7] zram: promote zram from staging

It's time to promote zram out of staging: zram has been in staging
for a long time and has been improved by many contributors, so the
code is very clean. Most importantly, zram's dependency on x86 has
been solved by making zsmalloc portable. In addition, many embedded
products use zram in real practice, so I think there is no reason
to prevent promotion now.

Cc: Seth Jennings <[email protected]>
Cc: Nitin Gupta <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
drivers/block/Kconfig | 1 +
drivers/block/Makefile | 1 +
drivers/{staging => block}/zram/Kconfig | 0
drivers/{staging => block}/zram/Makefile | 0
drivers/{staging => block}/zram/zram.txt | 0
drivers/{staging => block}/zram/zram_drv.c | 0
drivers/{staging => block}/zram/zram_drv.h | 0
drivers/{staging => block}/zram/zram_sysfs.c | 0
drivers/staging/Kconfig | 2 --
drivers/staging/Makefile | 1 -
10 files changed, 2 insertions(+), 3 deletions(-)
rename drivers/{staging => block}/zram/Kconfig (100%)
rename drivers/{staging => block}/zram/Makefile (100%)
rename drivers/{staging => block}/zram/zram.txt (100%)
rename drivers/{staging => block}/zram/zram_drv.c (100%)
rename drivers/{staging => block}/zram/zram_drv.h (100%)
rename drivers/{staging => block}/zram/zram_sysfs.c (100%)

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index a796407..4277454 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -289,6 +289,7 @@ config BLK_DEV_CRYPTOLOOP
cryptoloop device.

source "drivers/block/drbd/Kconfig"
+source "drivers/block/zram/Kconfig"

config BLK_DEV_NBD
tristate "Network block device support"
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 5b79505..951ba69 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -30,6 +30,7 @@ obj-$(CONFIG_BLK_DEV_UMEM) += umem.o
obj-$(CONFIG_BLK_DEV_NBD) += nbd.o
obj-$(CONFIG_BLK_DEV_CRYPTOLOOP) += cryptoloop.o
obj-$(CONFIG_VIRTIO_BLK) += virtio_blk.o
+obj-$(CONFIG_ZRAM) += zram/

obj-$(CONFIG_VIODASD) += viodasd.o
obj-$(CONFIG_BLK_DEV_SX8) += sx8.o
diff --git a/drivers/staging/zram/Kconfig b/drivers/block/zram/Kconfig
similarity index 100%
rename from drivers/staging/zram/Kconfig
rename to drivers/block/zram/Kconfig
diff --git a/drivers/staging/zram/Makefile b/drivers/block/zram/Makefile
similarity index 100%
rename from drivers/staging/zram/Makefile
rename to drivers/block/zram/Makefile
diff --git a/drivers/staging/zram/zram.txt b/drivers/block/zram/zram.txt
similarity index 100%
rename from drivers/staging/zram/zram.txt
rename to drivers/block/zram/zram.txt
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
similarity index 100%
rename from drivers/staging/zram/zram_drv.c
rename to drivers/block/zram/zram_drv.c
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
similarity index 100%
rename from drivers/staging/zram/zram_drv.h
rename to drivers/block/zram/zram_drv.h
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/block/zram/zram_sysfs.c
similarity index 100%
rename from drivers/staging/zram/zram_sysfs.c
rename to drivers/block/zram/zram_sysfs.c
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index b7f7bc7..1a628eb 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -74,8 +74,6 @@ source "drivers/staging/sep/Kconfig"

source "drivers/staging/iio/Kconfig"

-source "drivers/staging/zram/Kconfig"
-
source "drivers/staging/zcache/Kconfig"

source "drivers/staging/wlags49_h2/Kconfig"
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index ad74bee..ed8889f 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -32,7 +32,6 @@ obj-$(CONFIG_VME_BUS) += vme/
obj-$(CONFIG_IPACK_BUS) += ipack/
obj-$(CONFIG_DX_SEP) += sep/
obj-$(CONFIG_IIO) += iio/
-obj-$(CONFIG_ZRAM) += zram/
obj-$(CONFIG_ZCACHE) += zcache/
obj-$(CONFIG_WLAGS49_H2) += wlags49_h2/
obj-$(CONFIG_WLAGS49_H25) += wlags49_h25/
--
1.7.9.5

2012-08-08 06:11:01

by Minchan Kim

Subject: [PATCH 5/7] zsmalloc: promote to mm/

From: Seth Jennings <[email protected]>

This patch promotes the slab-based zsmalloc memory allocator
from the staging tree to mm/.

zcache/zram depend on this allocator for storing compressed RAM pages
efficiently under system-wide memory pressure, where high-order
(greater than 0) page allocations are very likely to fail.

For more information on zsmalloc and its internals, read the
documentation at the top of the zsmalloc .c file.
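
Since the allocator hands back opaque handles rather than pointers, a
minimal usage sketch may help readers of this thread (illustrative only:
the pool name, object size, and error handling are invented here; the
zs_* entry points and the map/unmap contract are those of the staging
code being moved):

	#include <linux/errno.h>
	#include <linux/gfp.h>
	#include <linux/string.h>
	#include <linux/zsmalloc.h>

	static int zsmalloc_usage_sketch(void)
	{
		struct zs_pool *pool;
		unsigned long handle;
		void *mem;

		pool = zs_create_pool("demo", GFP_KERNEL);
		if (!pool)
			return -ENOMEM;

		/* zs_malloc() returns a handle, not a dereferenceable pointer */
		handle = zs_malloc(pool, 128);
		if (!handle) {
			zs_destroy_pool(pool);
			return -ENOMEM;
		}

		/*
		 * The handle must be mapped before the memory can be touched;
		 * the mapping is short-lived and must not be held while sleeping.
		 */
		mem = zs_map_object(pool, handle, ZS_MM_WO);
		memset(mem, 0, 128);
		zs_unmap_object(pool, handle);

		zs_free(pool, handle);
		zs_destroy_pool(pool);
		return 0;
	}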

[minchan: changed description slightly to include zram]

Signed-off-by: Seth Jennings <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
drivers/staging/Kconfig | 2 --
drivers/staging/Makefile | 1 -
drivers/staging/zcache/zcache-main.c | 4 ++--
drivers/staging/zram/zram_drv.h | 3 +--
drivers/staging/zsmalloc/Kconfig | 10 ----------
drivers/staging/zsmalloc/Makefile | 3 ---
.../staging/zsmalloc => include/linux}/zsmalloc.h | 0
mm/Kconfig | 18 ++++++++++++++++++
mm/Makefile | 1 +
.../zsmalloc/zsmalloc-main.c => mm/zsmalloc.c | 3 +--
10 files changed, 23 insertions(+), 22 deletions(-)
delete mode 100644 drivers/staging/zsmalloc/Kconfig
delete mode 100644 drivers/staging/zsmalloc/Makefile
rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
rename drivers/staging/zsmalloc/zsmalloc-main.c => mm/zsmalloc.c (99%)

diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index e3402d5..b7f7bc7 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -78,8 +78,6 @@ source "drivers/staging/zram/Kconfig"

source "drivers/staging/zcache/Kconfig"

-source "drivers/staging/zsmalloc/Kconfig"
-
source "drivers/staging/wlags49_h2/Kconfig"

source "drivers/staging/wlags49_h25/Kconfig"
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index 3be59d0..ad74bee 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -34,7 +34,6 @@ obj-$(CONFIG_DX_SEP) += sep/
obj-$(CONFIG_IIO) += iio/
obj-$(CONFIG_ZRAM) += zram/
obj-$(CONFIG_ZCACHE) += zcache/
-obj-$(CONFIG_ZSMALLOC) += zsmalloc/
obj-$(CONFIG_WLAGS49_H2) += wlags49_h2/
obj-$(CONFIG_WLAGS49_H25) += wlags49_h25/
obj-$(CONFIG_FB_SM7XX) += sm7xxfb/
diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index c214977..06ce28f 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -32,9 +32,9 @@
#include <linux/crypto.h>
#include <linux/string.h>
#include <linux/idr.h>
-#include "tmem.h"
+#include <linux/zsmalloc.h>

-#include "../zsmalloc/zsmalloc.h"
+#include "tmem.h"

#ifdef CONFIG_CLEANCACHE
#include <linux/cleancache.h>
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 572c0b1..f6d0925 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -17,8 +17,7 @@

#include <linux/spinlock.h>
#include <linux/mutex.h>
-
-#include "../zsmalloc/zsmalloc.h"
+#include <linux/zsmalloc.h>

/*
* Some arbitrary value. This is just to catch
diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
deleted file mode 100644
index 9084565..0000000
--- a/drivers/staging/zsmalloc/Kconfig
+++ /dev/null
@@ -1,10 +0,0 @@
-config ZSMALLOC
- tristate "Memory allocator for compressed pages"
- default n
- help
- zsmalloc is a slab-based memory allocator designed to store
- compressed RAM pages. zsmalloc uses virtual memory mapping
- in order to reduce fragmentation. However, this results in a
- non-standard allocator interface where a handle, not a pointer, is
- returned by an alloc(). This handle must be mapped in order to
- access the allocated space.
diff --git a/drivers/staging/zsmalloc/Makefile b/drivers/staging/zsmalloc/Makefile
deleted file mode 100644
index b134848..0000000
--- a/drivers/staging/zsmalloc/Makefile
+++ /dev/null
@@ -1,3 +0,0 @@
-zsmalloc-y := zsmalloc-main.o
-
-obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
diff --git a/drivers/staging/zsmalloc/zsmalloc.h b/include/linux/zsmalloc.h
similarity index 100%
rename from drivers/staging/zsmalloc/zsmalloc.h
rename to include/linux/zsmalloc.h
diff --git a/mm/Kconfig b/mm/Kconfig
index d5c8019..2586b66 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -411,3 +411,21 @@ config FRONTSWAP
and swap data is stored as normal on the matching swap device.

If unsure, say Y to enable frontswap.
+
+config ZSMALLOC
+ tristate "Memory allocator for compressed pages"
+ default n
+ help
+ zsmalloc is a slab-based memory allocator designed to store
+ compressed RAM pages. zsmalloc uses a memory pool that combines
+ single pages into higher order pages by linking them together
+ using the fields of the struct page. Allocations are then
+ mapped through copy buffers or VM mapping, in order to reduce
+ memory pool fragmentation and increase allocation success rate under
+ memory pressure.
+
+ This results in a non-standard allocator interface where
+ a handle, not a pointer, is returned by the allocation function.
+ This handle must be mapped in order to access the allocated space.
+
+ If unsure, say N.
diff --git a/mm/Makefile b/mm/Makefile
index 92753e2..8a3d7bea 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -57,3 +57,4 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
obj-$(CONFIG_CLEANCACHE) += cleancache.o
obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
+obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/mm/zsmalloc.c
similarity index 99%
rename from drivers/staging/zsmalloc/zsmalloc-main.c
rename to mm/zsmalloc.c
index 09a9d35..6b20429 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/mm/zsmalloc.c
@@ -78,8 +78,7 @@
#include <linux/hardirq.h>
#include <linux/spinlock.h>
#include <linux/types.h>
-
-#include "zsmalloc.h"
+#include <linux/zsmalloc.h>

/*
 * This must be a power of 2 and greater than or equal to sizeof(link_free).
--
1.7.9.5

2012-08-08 06:12:08

by Minchan Kim

Subject: [PATCH 4/7] zsmalloc: collapse internal .h into .c

From: Seth Jennings <[email protected]>

This patch collapses the internal zsmalloc_int.h into
the zsmalloc-main.c file.

This is done in preparation for the promotion to mm/ where
separate internal headers are discouraged.
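
Among the definitions moved is the <PFN, obj_idx> handle encoding. As a
quick illustration (a simplified sketch: the real helpers in this file,
obj_location_to_handle() and obj_handle_to_location(), work on a struct
page rather than a raw PFN), the two values pack into a single word
using the OBJ_INDEX_BITS/OBJ_INDEX_MASK constants moved here:

	/* high bits hold the PFN; the low OBJ_INDEX_BITS bits hold the index */
	static unsigned long encode_handle(unsigned long pfn, unsigned long obj_idx)
	{
		return (pfn << OBJ_INDEX_BITS) | (obj_idx & OBJ_INDEX_MASK);
	}

	static void decode_handle(unsigned long handle,
				  unsigned long *pfn, unsigned long *obj_idx)
	{
		*pfn = handle >> OBJ_INDEX_BITS;
		*obj_idx = handle & OBJ_INDEX_MASK;
	}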

Signed-off-by: Seth Jennings <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
drivers/staging/zsmalloc/zsmalloc-main.c | 132 +++++++++++++++++++++++++-
drivers/staging/zsmalloc/zsmalloc_int.h | 149 ------------------------------
2 files changed, 131 insertions(+), 150 deletions(-)
delete mode 100644 drivers/staging/zsmalloc/zsmalloc_int.h

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index defe350..09a9d35 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -76,9 +76,139 @@
#include <linux/cpu.h>
#include <linux/vmalloc.h>
#include <linux/hardirq.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>

#include "zsmalloc.h"
-#include "zsmalloc_int.h"
+
+/*
+ * This must be a power of 2 and greater than or equal to sizeof(link_free).
+ * These two conditions ensure that any 'struct link_free' itself doesn't
+ * span more than 1 page which avoids complex case of mapping 2 pages simply
+ * to restore link_free pointer values.
+ */
+#define ZS_ALIGN 8
+
+/*
+ * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
+ * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
+ */
+#define ZS_MAX_ZSPAGE_ORDER 2
+#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
+
+/*
+ * Object location (<PFN>, <obj_idx>) is encoded as
+ * a single (void *) handle value.
+ *
+ * Note that object index <obj_idx> is relative to system
+ * page <PFN> it is stored in, so for each sub-page belonging
+ * to a zspage, obj_idx starts with 0.
+ *
+ * This is made more complicated by various memory models and PAE.
+ */
+
+#ifndef MAX_PHYSMEM_BITS
+#ifdef CONFIG_HIGHMEM64G
+#define MAX_PHYSMEM_BITS 36
+#else /* !CONFIG_HIGHMEM64G */
+/*
+ * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
+ * be PAGE_SHIFT
+ */
+#define MAX_PHYSMEM_BITS BITS_PER_LONG
+#endif
+#endif
+#define _PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT)
+#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS)
+#define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
+
+#define MAX(a, b) ((a) >= (b) ? (a) : (b))
+/* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
+#define ZS_MIN_ALLOC_SIZE \
+ MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+#define ZS_MAX_ALLOC_SIZE PAGE_SIZE
+
+/*
+ * On systems with 4K page size, this gives 254 size classes! There is a
+ * trade-off here:
+ * - A large number of size classes is potentially wasteful as free pages are
+ * spread across these classes
+ * - A small number of size classes causes large internal fragmentation
+ * - Probably it's better to use specific size classes (empirically
+ * determined). NOTE: all those class sizes must be set as multiple of
+ * ZS_ALIGN to make sure link_free itself never has to span 2 pages.
+ *
+ * ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
+ * (reason above)
+ */
+#define ZS_SIZE_CLASS_DELTA 16
+#define ZS_SIZE_CLASSES ((ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) / \
+ ZS_SIZE_CLASS_DELTA + 1)
+
+/*
+ * We do not maintain any list for completely empty or full pages
+ */
+enum fullness_group {
+ ZS_ALMOST_FULL,
+ ZS_ALMOST_EMPTY,
+ _ZS_NR_FULLNESS_GROUPS,
+
+ ZS_EMPTY,
+ ZS_FULL
+};
+
+/*
+ * We assign a page to ZS_ALMOST_EMPTY fullness group when:
+ * n <= N / f, where
+ * n = number of allocated objects
+ * N = total number of objects zspage can store
+ * f = 1/fullness_threshold_frac
+ *
+ * Similarly, we assign zspage to:
+ * ZS_ALMOST_FULL when n > N / f
+ * ZS_EMPTY when n == 0
+ * ZS_FULL when n == N
+ *
+ * (see: fix_fullness_group())
+ */
+static const int fullness_threshold_frac = 4;
+
+struct size_class {
+ /*
+ * Size of objects stored in this class. Must be multiple
+ * of ZS_ALIGN.
+ */
+ int size;
+ unsigned int index;
+
+ /* Number of PAGE_SIZE sized pages to combine to form a 'zspage' */
+ int pages_per_zspage;
+
+ spinlock_t lock;
+
+ /* stats */
+ u64 pages_allocated;
+
+ struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
+};
+
+/*
+ * Placed within free objects to form a singly linked list.
+ * For every zspage, first_page->freelist gives head of this list.
+ *
+ * This must be power of 2 and less than or equal to ZS_ALIGN
+ */
+struct link_free {
+ /* Handle of next free chunk (encodes <PFN, obj_idx>) */
+ void *next;
+};
+
+struct zs_pool {
+ struct size_class size_class[ZS_SIZE_CLASSES];
+
+ gfp_t flags; /* allocation flags used when growing pool */
+ const char *name;
+};

/*
* A zspage's class index and fullness group
diff --git a/drivers/staging/zsmalloc/zsmalloc_int.h b/drivers/staging/zsmalloc/zsmalloc_int.h
deleted file mode 100644
index 8c0b344..0000000
--- a/drivers/staging/zsmalloc/zsmalloc_int.h
+++ /dev/null
@@ -1,149 +0,0 @@
-/*
- * zsmalloc memory allocator
- *
- * Copyright (C) 2011 Nitin Gupta
- *
- * This code is released using a dual license strategy: BSD/GPL
- * You can choose the license that better fits your requirements.
- *
- * Released under the terms of 3-clause BSD License
- * Released under the terms of GNU General Public License Version 2.0
- */
-
-#ifndef _ZS_MALLOC_INT_H_
-#define _ZS_MALLOC_INT_H_
-
-#include <linux/kernel.h>
-#include <linux/spinlock.h>
-#include <linux/types.h>
-
-/*
- * This must be a power of 2 and greater than or equal to sizeof(link_free).
- * These two conditions ensure that any 'struct link_free' itself doesn't
- * span more than 1 page which avoids complex case of mapping 2 pages simply
- * to restore link_free pointer values.
- */
-#define ZS_ALIGN 8
-
-/*
- * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
- * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
- */
-#define ZS_MAX_ZSPAGE_ORDER 2
-#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
-
-/*
- * Object location (<PFN>, <obj_idx>) is encoded as
- * a single (void *) handle value.
- *
- * Note that object index <obj_idx> is relative to system
- * page <PFN> it is stored in, so for each sub-page belonging
- * to a zspage, obj_idx starts with 0.
- *
- * This is made more complicated by various memory models and PAE.
- */
-
-#ifndef MAX_PHYSMEM_BITS
-#ifdef CONFIG_HIGHMEM64G
-#define MAX_PHYSMEM_BITS 36
-#else /* !CONFIG_HIGHMEM64G */
-/*
- * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
- * be PAGE_SHIFT
- */
-#define MAX_PHYSMEM_BITS BITS_PER_LONG
-#endif
-#endif
-#define _PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT)
-#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS)
-#define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
-
-#define MAX(a, b) ((a) >= (b) ? (a) : (b))
-/* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
-#define ZS_MIN_ALLOC_SIZE \
- MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
-#define ZS_MAX_ALLOC_SIZE PAGE_SIZE
-
-/*
- * On systems with 4K page size, this gives 254 size classes! There is a
- * trade-off here:
- * - A large number of size classes is potentially wasteful as free pages are
- * spread across these classes
- * - A small number of size classes causes large internal fragmentation
- * - Probably it's better to use specific size classes (empirically
- * determined). NOTE: all those class sizes must be set as multiple of
- * ZS_ALIGN to make sure link_free itself never has to span 2 pages.
- *
- * ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
- * (reason above)
- */
-#define ZS_SIZE_CLASS_DELTA 16
-#define ZS_SIZE_CLASSES ((ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) / \
- ZS_SIZE_CLASS_DELTA + 1)
-
-/*
- * We do not maintain any list for completely empty or full pages
- */
-enum fullness_group {
- ZS_ALMOST_FULL,
- ZS_ALMOST_EMPTY,
- _ZS_NR_FULLNESS_GROUPS,
-
- ZS_EMPTY,
- ZS_FULL
-};
-
-/*
- * We assign a page to ZS_ALMOST_EMPTY fullness group when:
- * n <= N / f, where
- * n = number of allocated objects
- * N = total number of objects zspage can store
- * f = 1/fullness_threshold_frac
- *
- * Similarly, we assign zspage to:
- * ZS_ALMOST_FULL when n > N / f
- * ZS_EMPTY when n == 0
- * ZS_FULL when n == N
- *
- * (see: fix_fullness_group())
- */
-static const int fullness_threshold_frac = 4;
-
-struct size_class {
- /*
- * Size of objects stored in this class. Must be multiple
- * of ZS_ALIGN.
- */
- int size;
- unsigned int index;
-
- /* Number of PAGE_SIZE sized pages to combine to form a 'zspage' */
- int pages_per_zspage;
-
- spinlock_t lock;
-
- /* stats */
- u64 pages_allocated;
-
- struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
-};
-
-/*
- * Placed within free objects to form a singly linked list.
- * For every zspage, first_page->freelist gives head of this list.
- *
- * This must be power of 2 and less than or equal to ZS_ALIGN
- */
-struct link_free {
- /* Handle of next free chunk (encodes <PFN, obj_idx>) */
- void *next;
-};
-
-struct zs_pool {
- struct size_class size_class[ZS_SIZE_CLASSES];
-
- gfp_t flags; /* allocation flags used when growing pool */
- const char *name;
-};
-
-#endif
--
1.7.9.5

2012-08-08 06:12:31

by Minchan Kim

Subject: [PATCH 3/7] zsmalloc: add page table mapping method

From: Seth Jennings <[email protected]>

This patch provides page mapping via the page table.
On some archs, most notably ARM, this method has been
demonstrated to be faster than copying.

The method selection (copy vs page table) is controlled
by the definition of USE_PGTABLE_MAPPING, which can be
defined for any arch that performs better with page
table mapping; an example opt-in is sketched below.
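
For example, a hypothetical opt-in for another architecture (sparc is
only a placeholder here, not part of this patch) would extend the
USE_PGTABLE_MAPPING selection like this:

	#if defined(CONFIG_ARM) || defined(CONFIG_SPARC)
	#define USE_PGTABLE_MAPPING
	#endif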

Signed-off-by: Seth Jennings <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
drivers/staging/zsmalloc/zsmalloc-main.c | 182 ++++++++++++++++++++++--------
drivers/staging/zsmalloc/zsmalloc_int.h | 6 -
2 files changed, 134 insertions(+), 54 deletions(-)

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index b86133f..defe350 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -89,6 +89,30 @@
#define CLASS_IDX_MASK ((1 << CLASS_IDX_BITS) - 1)
#define FULLNESS_MASK ((1 << FULLNESS_BITS) - 1)

+/*
+ * By default, zsmalloc uses a copy-based object mapping method to access
+ * allocations that span two pages. However, if a particular architecture
+ * 1) Implements local_flush_tlb_kernel_range() and 2) Performs VM mapping
+ * faster than copying, then it should be added here so that
+ * USE_PGTABLE_MAPPING is defined. This causes zsmalloc to use page table
+ * mapping rather than copying
+ * for object mapping.
+ */
+#if defined(CONFIG_ARM)
+#define USE_PGTABLE_MAPPING
+#endif
+
+struct mapping_area {
+#ifdef USE_PGTABLE_MAPPING
+ struct vm_struct *vm; /* vm area for mapping objects that span pages */
+#else
+ char *vm_buf; /* copy buffer for objects that span pages */
+#endif
+ char *vm_addr; /* address of kmap_atomic()'ed pages */
+ enum zs_mapmode vm_mm; /* mapping mode */
+};
+
+
/* per-cpu VM mapping areas for zspage accesses that cross page boundaries */
static DEFINE_PER_CPU(struct mapping_area, zs_map_area);

@@ -471,16 +495,83 @@ static struct page *find_get_zspage(struct size_class *class)
return page;
}

-static void zs_copy_map_object(char *buf, struct page *page,
- int off, int size)
+#ifdef USE_PGTABLE_MAPPING
+static inline int __zs_cpu_up(struct mapping_area *area)
+{
+ /*
+ * Make sure we don't leak memory if a cpu UP notification
+ * and zs_init() race and both call zs_cpu_up() on the same cpu
+ */
+ if (area->vm)
+ return 0;
+ area->vm = alloc_vm_area(PAGE_SIZE * 2, NULL);
+ if (!area->vm)
+ return -ENOMEM;
+ return 0;
+}
+
+static inline void __zs_cpu_down(struct mapping_area *area)
+{
+ if (area->vm)
+ free_vm_area(area->vm);
+ area->vm = NULL;
+}
+
+static inline void *__zs_map_object(struct mapping_area *area,
+ struct page *pages[2], int off, int size)
+{
+ BUG_ON(map_vm_area(area->vm, PAGE_KERNEL, &pages));
+ area->vm_addr = area->vm->addr;
+ return area->vm_addr + off;
+}
+
+static inline void __zs_unmap_object(struct mapping_area *area,
+ struct page *pages[2], int off, int size)
+{
+ unsigned long addr = (unsigned long)area->vm_addr;
+ unsigned long end = addr + (PAGE_SIZE * 2);
+
+ flush_cache_vunmap(addr, end);
+ unmap_kernel_range_noflush(addr, PAGE_SIZE * 2);
+ local_flush_tlb_kernel_range(addr, end);
+}
+
+#else /* USE_PGTABLE_MAPPING */
+
+static inline int __zs_cpu_up(struct mapping_area *area)
+{
+ /*
+ * Make sure we don't leak memory if a cpu UP notification
+ * and zs_init() race and both call zs_cpu_up() on the same cpu
+ */
+ if (area->vm_buf)
+ return 0;
+ area->vm_buf = (char *)__get_free_page(GFP_KERNEL);
+ if (!area->vm_buf)
+ return -ENOMEM;
+ return 0;
+}
+
+static inline void __zs_cpu_down(struct mapping_area *area)
+{
+ if (area->vm_buf)
+ free_page((unsigned long)area->vm_buf);
+ area->vm_buf = NULL;
+}
+
+static void *__zs_map_object(struct mapping_area *area,
+ struct page *pages[2], int off, int size)
{
- struct page *pages[2];
int sizes[2];
void *addr;
+ char *buf = area->vm_buf;

- pages[0] = page;
- pages[1] = get_next_page(page);
- BUG_ON(!pages[1]);
+ /* disable page faults to match kmap_atomic() return conditions */
+ pagefault_disable();
+
+ /* no read fastpath */
+ if (area->vm_mm == ZS_MM_WO)
+ goto out;

sizes[0] = PAGE_SIZE - off;
sizes[1] = size - sizes[0];
@@ -492,18 +583,20 @@ static void zs_copy_map_object(char *buf, struct page *page,
addr = kmap_atomic(pages[1]);
memcpy(buf + sizes[0], addr, sizes[1]);
kunmap_atomic(addr);
+out:
+ return area->vm_buf;
}

-static void zs_copy_unmap_object(char *buf, struct page *page,
- int off, int size)
+static void __zs_unmap_object(struct mapping_area *area,
+ struct page *pages[2], int off, int size)
{
- struct page *pages[2];
int sizes[2];
void *addr;
+ char *buf = area->vm_buf;

- pages[0] = page;
- pages[1] = get_next_page(page);
- BUG_ON(!pages[1]);
+ /* no write fastpath */
+ if (area->vm_mm == ZS_MM_RO)
+ goto out;

sizes[0] = PAGE_SIZE - off;
sizes[1] = size - sizes[0];
@@ -515,34 +608,31 @@ static void zs_copy_unmap_object(char *buf, struct page *page,
addr = kmap_atomic(pages[1]);
memcpy(addr, buf + sizes[0], sizes[1]);
kunmap_atomic(addr);
+
+out:
+ /* enable page faults to match kunmap_atomic() return conditions */
+ pagefault_enable();
}

+#endif /* USE_PGTABLE_MAPPING */
+
static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
void *pcpu)
{
- int cpu = (long)pcpu;
+ int ret, cpu = (long)pcpu;
struct mapping_area *area;

switch (action) {
case CPU_UP_PREPARE:
area = &per_cpu(zs_map_area, cpu);
- /*
- * Make sure we don't leak memory if a cpu UP notification
- * and zs_init() race and both call zs_cpu_up() on the same cpu
- */
- if (area->vm_buf)
- return 0;
- area->vm_buf = (char *)__get_free_page(GFP_KERNEL);
- if (!area->vm_buf)
- return -ENOMEM;
- return 0;
+ ret = __zs_cpu_up(area);
+ if (ret)
+ return notifier_from_errno(ret);
break;
case CPU_DEAD:
case CPU_UP_CANCELED:
area = &per_cpu(zs_map_area, cpu);
- if (area->vm_buf)
- free_page((unsigned long)area->vm_buf);
- area->vm_buf = NULL;
+ __zs_cpu_down(area);
break;
}

@@ -759,6 +849,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
enum fullness_group fg;
struct size_class *class;
struct mapping_area *area;
+ struct page *pages[2];

BUG_ON(!handle);

@@ -775,19 +866,19 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
off = obj_idx_to_offset(page, obj_idx, class->size);

area = &get_cpu_var(zs_map_area);
+ area->vm_mm = mm;
if (off + class->size <= PAGE_SIZE) {
/* this object is contained entirely within a page */
area->vm_addr = kmap_atomic(page);
return area->vm_addr + off;
}

- /* disable page faults to match kmap_atomic() return conditions */
- pagefault_disable();
+ /* this object spans two pages */
+ pages[0] = page;
+ pages[1] = get_next_page(page);
+ BUG_ON(!pages[1]);

- if (mm != ZS_MM_WO)
- zs_copy_map_object(area->vm_buf, page, off, class->size);
- area->vm_addr = NULL;
- return area->vm_buf;
+ return __zs_map_object(area, pages, off, class->size);
}
EXPORT_SYMBOL_GPL(zs_map_object);

@@ -801,17 +892,6 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
struct size_class *class;
struct mapping_area *area;

- area = &__get_cpu_var(zs_map_area);
- /* single-page object fastpath */
- if (area->vm_addr) {
- kunmap_atomic(area->vm_addr);
- goto out;
- }
-
- /* no write fastpath */
- if (area->vm_mm == ZS_MM_RO)
- goto pfenable;
-
BUG_ON(!handle);

obj_handle_to_location(handle, &page, &obj_idx);
@@ -819,12 +899,18 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
class = &pool->size_class[class_idx];
off = obj_idx_to_offset(page, obj_idx, class->size);

- zs_copy_unmap_object(area->vm_buf, page, off, class->size);
+ area = &__get_cpu_var(zs_map_area);
+ if (off + class->size <= PAGE_SIZE)
+ kunmap_atomic(area->vm_addr);
+ else {
+ struct page *pages[2];
+
+ pages[0] = page;
+ pages[1] = get_next_page(page);
+ BUG_ON(!pages[1]);

-pfenable:
- /* enable page faults to match kunmap_atomic() return conditions */
- pagefault_enable();
-out:
+ __zs_unmap_object(area, pages, off, class->size);
+ }
put_cpu_var(zs_map_area);
}
EXPORT_SYMBOL_GPL(zs_unmap_object);
diff --git a/drivers/staging/zsmalloc/zsmalloc_int.h b/drivers/staging/zsmalloc/zsmalloc_int.h
index 52805176..8c0b344 100644
--- a/drivers/staging/zsmalloc/zsmalloc_int.h
+++ b/drivers/staging/zsmalloc/zsmalloc_int.h
@@ -109,12 +109,6 @@ enum fullness_group {
*/
static const int fullness_threshold_frac = 4;

-struct mapping_area {
- char *vm_buf; /* copy buffer for objects that span pages */
- char *vm_addr; /* address of kmap_atomic()'ed pages */
- enum zs_mapmode vm_mm; /* mapping mode */
-};
-
struct size_class {
/*
* Size of objects stored in this class. Must be multiple
--
1.7.9.5

2012-08-08 06:12:48

by Minchan Kim

Subject: [PATCH 2/7] zsmalloc: prevent mapping in interrupt context

From: Seth Jennings <[email protected]>

Because we use per-cpu mapping areas shared among the
pools/users, we can't allow mapping in interrupt context
because it can corrupt another user's mappings.
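
A sketch of the hazard this guards against (hypothetical interleaving,
not code from the patch): a task and an interrupt handler on the same
CPU would both land on the one per-cpu mapping area:

	/* task context on CPU 0 */
	void task_path(struct zs_pool *pool, unsigned long h1)
	{
		void *buf = zs_map_object(pool, h1, ZS_MM_RW); /* claims zs_map_area */
		/* ... an interrupt firing here could run irq_path() ... */
		zs_unmap_object(pool, h1);
	}

	/* interrupt context on the same CPU: now rejected by the BUG_ON */
	void irq_path(struct zs_pool *pool, unsigned long h2)
	{
		void *buf = zs_map_object(pool, h2, ZS_MM_RO); /* would reuse the same
								  per-cpu area and clobber
								  the task's mapping */
		zs_unmap_object(pool, h2);
	}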

Signed-off-by: Seth Jennings <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
drivers/staging/zsmalloc/zsmalloc-main.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index 3c83c65..b86133f 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -75,6 +75,7 @@
#include <linux/cpumask.h>
#include <linux/cpu.h>
#include <linux/vmalloc.h>
+#include <linux/hardirq.h>

#include "zsmalloc.h"
#include "zsmalloc_int.h"
@@ -761,6 +762,13 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,

BUG_ON(!handle);

+ /*
+ * Because we use per-cpu mapping areas shared among the
+ * pools/users, we can't allow mapping in interrupt context
+ * because it can corrupt another user's mappings.
+ */
+ BUG_ON(in_interrupt());
+
obj_handle_to_location(handle, &page, &obj_idx);
get_zspage_mapping(get_first_page(page), &class_idx, &fg);
class = &pool->size_class[class_idx];
--
1.7.9.5

2012-08-08 17:35:42

by Nitin Gupta

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

On 08/07/2012 11:12 PM, Minchan Kim wrote:
> This patchset promotes zram/zsmalloc out of staging.
> Both are very clean, and zram has been used by many embedded
> products for a long time.
>
> [1-3] are patches not yet merged into linux-next but needed
> as a base for [4-5], which promote zsmalloc.
> Greg, if you have already merged [1-3], skip them.
>
> Seth Jennings (5):
> 1. zsmalloc: s/firstpage/page in new copy map funcs
> 2. zsmalloc: prevent mapping in interrupt context
> 3. zsmalloc: add page table mapping method
> 4. zsmalloc: collapse internal .h into .c
> 5. zsmalloc: promote to mm/
>
> Minchan Kim (2):
> 6. zram: promote zram from staging
> 7. zram: select ZSMALLOC when ZRAM is configured
>

All the changes look good to me. FWIW, for the entire series:
Acked-by: Nitin Gupta <[email protected]>

Thanks for all the work.
Nitin

2012-08-10 01:15:52

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

On Wed, Aug 08, 2012 at 10:35:37AM -0700, Nitin Gupta wrote:
> On 08/07/2012 11:12 PM, Minchan Kim wrote:
> > This patchset promotes zram/zsmalloc out of staging.
> > Both are very clean, and zram has been used by many embedded
> > products for a long time.
> >
> > [1-3] are patches not yet merged into linux-next but needed
> > as a base for [4-5], which promote zsmalloc.
> > Greg, if you have already merged [1-3], skip them.
> >
> > Seth Jennings (5):
> > 1. zsmalloc: s/firstpage/page in new copy map funcs
> > 2. zsmalloc: prevent mapping in interrupt context
> > 3. zsmalloc: add page table mapping method
> > 4. zsmalloc: collapse internal .h into .c
> > 5. zsmalloc: promote to mm/
> >
> > Minchan Kim (2):
> > 6. zram: promote zram from staging
> > 7. zram: select ZSMALLOC when ZRAM is configured
> >
>
> All the changes look good to me. FWIW, for the entire series:
> Acked-by: Nitin Gupta <[email protected]>

And Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
as well. Thanks!
>
> Thanks for all the work.
> Nitin
>
>

2012-08-14 02:35:37

by Greg Kroah-Hartman

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

On Wed, Aug 08, 2012 at 03:12:13PM +0900, Minchan Kim wrote:
> This patchset promotes zram/zsmalloc out of staging.
> Both are very clean, and zram has been used by many embedded
> products for a long time.
>
> [1-3] are patches not yet merged into linux-next but needed
> as a base for [4-5], which promote zsmalloc.
> Greg, if you have already merged [1-3], skip them.

I've applied 1-3 and now 4, but that's it, I can't apply the rest
without getting acks from the -mm maintainers, sorry. Please work with
them to get those acks, and then I will be glad to apply the rest (after
you resend them of course...)

thanks,

greg k-h

2012-08-14 05:36:52

by Nitin Gupta

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

On 08/13/2012 07:35 PM, Greg Kroah-Hartman wrote:
> On Wed, Aug 08, 2012 at 03:12:13PM +0900, Minchan Kim wrote:
>> This patchset promotes zram/zsmalloc out of staging.
>> Both are very clean, and zram has been used by many embedded
>> products for a long time.
>>
>> [1-3] are patches not yet merged into linux-next but needed
>> as a base for [4-5], which promote zsmalloc.
>> Greg, if you have already merged [1-3], skip them.
>
> I've applied 1-3 and now 4, but that's it, I can't apply the rest
> without getting acks from the -mm maintainers, sorry. Please work with
> them to get those acks, and then I will be glad to apply the rest (after
> you resend them of course...)
>

On second thought, I think zsmalloc should stay in drivers/block/zram,
since zram is now the only user of zsmalloc now that zcache and ramster
are moving to another allocator. Secondly, zsmalloc does not provide a
standard slab-like interface, so it should not be part of mm/. At best,
it could be moved to lib/ with its header in include/linux, just like
genalloc.

Thanks,
Nitin

2012-08-14 06:18:18

by Minchan Kim

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

Hi Nitin,

On Mon, Aug 13, 2012 at 10:36:47PM -0700, Nitin Gupta wrote:
> On 08/13/2012 07:35 PM, Greg Kroah-Hartman wrote:
> > On Wed, Aug 08, 2012 at 03:12:13PM +0900, Minchan Kim wrote:
> >> This patchset promotes zram/zsmalloc out of staging.
> >> Both are very clean, and zram has been used by many embedded
> >> products for a long time.
> >>
> >> [1-3] are patches not yet merged into linux-next but needed
> >> as a base for [4-5], which promote zsmalloc.
> >> Greg, if you have already merged [1-3], skip them.
> >
> > I've applied 1-3 and now 4, but that's it, I can't apply the rest
> > without getting acks from the -mm maintainers, sorry. Please work with
> > them to get those acks, and then I will be glad to apply the rest (after
> > you resend them of course...)
> >
>
> On a second thought, I think zsmalloc should stay in drivers/block/zram
> since zram is now the only user of zsmalloc since zcache and ramster are
> moving to another allocator. Secondly, zsmalloc does not provide
> standard slab like interface, so should not be part of mm/. At the best,
> it could be moved to lib/ with header in include/linux just like genalloc.

I don't care whether it's located in /mm or elsewhere.
But if we move it out of /mm, I would like to confirm that with akpm.
AFAIR, he had a concern about that because zsmalloc freely uses a few
fields of struct page, so he wanted to keep it in /mm.

Andrew, any thoughts?

--
Kind regards,
Minchan Kim

2012-08-14 06:20:49

by Minchan Kim

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

Hi Greg,

On Mon, Aug 13, 2012 at 07:35:30PM -0700, Greg Kroah-Hartman wrote:
> On Wed, Aug 08, 2012 at 03:12:13PM +0900, Minchan Kim wrote:
> > This patchset promotes zram/zsmalloc out of staging.
> > Both are very clean, and zram has been used by many embedded
> > products for a long time.
> >
> > [1-3] are patches not yet merged into linux-next but needed
> > as a base for [4-5], which promote zsmalloc.
> > Greg, if you have already merged [1-3], skip them.
>
> I've applied 1-3 and now 4, but that's it, I can't apply the rest

Thanks!

> without getting acks from the -mm maintainers, sorry. Please work with

Nitin suggested zsmalloc could live in /lib or /zram instead of /mm,
but I want to confirm that with akpm, so let's wait for his opinion.

Anyway, another question: zram would be under drivers/block.
Do I need an ACK from Jens for that?

> them to get those acks, and then I will be glad to apply the rest (after
> you resend them of course...)
>
> thanks,
>
> greg k-h
>

--
Kind regards,
Minchan Kim

2012-08-14 13:31:49

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

On Tue, Aug 14, 2012 at 03:22:47PM +0900, Minchan Kim wrote:
> Hi Greg,
>
> On Mon, Aug 13, 2012 at 07:35:30PM -0700, Greg Kroah-Hartman wrote:
> > On Wed, Aug 08, 2012 at 03:12:13PM +0900, Minchan Kim wrote:
> > > This patchset promotes zram/zsmalloc out of staging.
> > > Both are very clean, and zram has been used by many embedded
> > > products for a long time.
> > >
> > > [1-3] are patches not yet merged into linux-next but needed
> > > as a base for [4-5], which promote zsmalloc.
> > > Greg, if you have already merged [1-3], skip them.
> >
> > I've applied 1-3 and now 4, but that's it, I can't apply the rest
>
> Thanks!
>
> > without getting acks from the -mm maintainers, sorry. Please work with
>
> Nitin suggested zsmalloc could live in /lib or /zram instead of /mm,
> but I want to confirm that with akpm, so let's wait for his opinion.
>
> Anyway, another question: zram would be under drivers/block.
> Do I need an ACK from Jens for that?

Yes.
>
> > them to get those acks, and then I will be glad to apply the rest (after
> > you resend them of course...)
> >
> > thanks,
> >
> > greg k-h
> >
>
> --
> Kind regards,
> Minchan Kim
>

2012-08-14 17:39:39

by Seth Jennings

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

On 08/14/2012 12:36 AM, Nitin Gupta wrote:
> On 08/13/2012 07:35 PM, Greg Kroah-Hartman wrote:
>> On Wed, Aug 08, 2012 at 03:12:13PM +0900, Minchan Kim wrote:
>>> This patchset promotes zram/zsmalloc out of staging.
>>> Both are very clean, and zram has been used by many embedded
>>> products for a long time.
>>>
>>> [1-3] are patches not yet merged into linux-next but needed
>>> as a base for [4-5], which promote zsmalloc.
>>> Greg, if you have already merged [1-3], skip them.
>>
>> I've applied 1-3 and now 4, but that's it, I can't apply the rest
>> without getting acks from the -mm maintainers, sorry. Please work with
>> them to get those acks, and then I will be glad to apply the rest (after
>> you resend them of course...)
>>
>
> On a second thought, I think zsmalloc should stay in drivers/block/zram
> since zram is now the only user of zsmalloc since zcache and ramster are
> moving to another allocator.

The removal of zsmalloc from zcache has not been agreed upon
yet.

Dan _suggested_ removing zsmalloc as the persistent
allocator for zcache in favor of zbud to solve "flaws" in
zcache. However, zbud has large deficiencies.

A zero-filled 4k page will compress with LZO to 103 bytes.
zbud can only store two compressed pages in each memory pool
page, resulting in 95% fragmentation (i.e. 95% of the memory
pool page goes unused). While this might not be a typical
case, it is the worst case and absolutely does happen.

zbud's design also effectively limits the useful page
compression to 50%. If pages are compressed beyond that, the
added space savings is lost in memory pool fragmentation.
For example, if two pages compress to 30% of their original
size, those two pages take up 60% of the zbud memory pool
page, and 40% is lost to fragmentation because zbud can't
store anything in the remaining space.

To say it another way, for every two page cache pages that
cleancache stores in zcache, zbud _must_ allocate a memory
pool page, regardless of how well those pages compress.
This reduces the efficiency of the page cache reclaim
mechanism by half.
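
To make the arithmetic explicit (all numbers derived from the figures
above):

	2 x 103 bytes            =  206 bytes stored in a 4096-byte pool page
	4096 - 206               = 3890 bytes unused (~95% fragmentation)

	2 x (0.30 x 4096) bytes ~= 2458 bytes stored
	4096 - 2458             ~= 1638 bytes unused (~40% of the pool page)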

I have posted some work (zsmalloc shrinker interface, user
registered alloc/free functions for the zsmalloc memory
pool) that begins to make zsmalloc a suitable replacement
for zbud, but that work was put on hold until the path out
of staging was established.

I'm hoping to continue this work once the code is in
mainline. While zbud has deficiencies, it doesn't prevent
zcache from having value as I have already demonstrated.
However, replacing zsmalloc with zbud would step backward
for the reasons mentioned above.

I do not support the removal of zsmalloc from zcache. As
such, I think the zsmalloc code should remain independent.

Seth

2012-08-15 10:42:18

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

On Tue, Aug 14, 2012 at 12:39:25PM -0500, Seth Jennings wrote:
> On 08/14/2012 12:36 AM, Nitin Gupta wrote:
> > On 08/13/2012 07:35 PM, Greg Kroah-Hartman wrote:
> >> On Wed, Aug 08, 2012 at 03:12:13PM +0900, Minchan Kim wrote:
> >>> This patchset promotes zram/zsmalloc out of staging.
> >>> Both are very clean, and zram has been used by many embedded
> >>> products for a long time.
> >>>
> >>> [1-3] are patches not yet merged into linux-next but needed
> >>> as a base for [4-5], which promote zsmalloc.
> >>> Greg, if you have already merged [1-3], skip them.
> >>
> >> I've applied 1-3 and now 4, but that's it, I can't apply the rest
> >> without getting acks from the -mm maintainers, sorry. Please work with
> >> them to get those acks, and then I will be glad to apply the rest (after
> >> you resend them of course...)
> >>
> >
> > On a second thought, I think zsmalloc should stay in drivers/block/zram
> > since zram is now the only user of zsmalloc since zcache and ramster are
> > moving to another allocator.
>
> The removal of zsmalloc from zcache has not been agreed upon
> yet.

<nods>
>
> Dan _suggested_ removing zsmalloc as the persistent
> allocator for zcache in favor of zbud to solve "flaws" in
> zcache. However, zbud has large deficiencies.
>
> A zero-filled 4k page will compress with LZO to 103 bytes.
> zbud can only store two compressed pages in each memory pool
> page, resulting in 95% fragmentation (i.e. 95% of the memory
> pool page goes unused). While this might not be a typical
> case, it is the worst case and absolutely does happen.
>
> zbud's design also effectively limits the useful page
> compression to 50%. If pages are compressed beyond that, the
> added space savings is lost in memory pool fragmentation.
> For example, if two pages compress to 30% of their original
> size, those two pages take up 60% of the zbud memory pool
> page, and 40% is lost to fragmentation because zbud can't
> store anything in the remaining space.
>
> To say it another way, for every two page cache pages that
> cleancache stores in zcache, zbud _must_ allocate a memory
> pool page, regardless of how well those pages compress.
> This reduces the efficiency of the page cache reclaim
> mechanism by half.
>
> I have posted some work (zsmalloc shrinker interface, user
> registered alloc/free functions for the zsmalloc memory
> pool) that begins to make zsmalloc a suitable replacement
> for zbud, but that work was put on hold until the path out
> of staging was established.
>
> I'm hoping to continue this work once the code is in
> mainline. While zbud has deficiencies, it doesn't prevent
> zcache from having value as I have already demonstrated.
> However, replacing zsmalloc with zbud would step backward
> for the reasons mentioned above.

What would be nice is having only one engine instead
of two - and I believe that is what you and Dan are aiming at.

Dan is looking at it from the perspective of re-engineering
zcache to use an LRU for keeping track of pages and pushing
those to the compression engine. And redoing the zbud engine
a bit (I think, let me double-check the git tree he pointed
out).

>
> I do not support the removal of zsmalloc from zcache. As
> such, I think the zsmalloc code should remain independent.
>
> Seth
>

2012-08-15 15:24:25

by Dan Magenheimer

Subject: RE: [PATCH 0/7] zram/zsmalloc promotion

> > On a second thought, I think zsmalloc should stay in drivers/block/zram
> > since zram is now the only user of zsmalloc since zcache and ramster are
> > moving to another allocator.
>
> The removal of zsmalloc from zcache has not been agreed upon
> yet.
>
> Dan _suggested_ removing zsmalloc as the persistent
> allocator for zcache in favor of zbud to solve "flaws" in
> zcache.

(Correction: Dan has _coded_, _tested_ and _published_ _working_
code that removes zsmalloc ;-)

> However, zbud has large deficiencies.
>
> A zero-filled 4k page will compress with LZO to 103 bytes.
> zbud can only store two compressed pages in each memory pool
> page, resulting in 95% fragmentation (i.e. 95% of the memory
> pool page goes unused). While this might not be a typical
> case, it is the worst case and absolutely does happen.
>
> zbud's design also effectively limits the useful page
> compression to 50%. If pages are compressed beyond that, the
> added space savings is lost in memory pool fragmentation.
> For example, if two pages compress to 30% of their original
> size, those two pages take up 60% of the zbud memory pool
> page, and 40% is lost to fragmentation because zbud can't
> store anything in the remaining space.
>
> To say it another way, for every two page cache pages that
> cleancache stores in zcache, zbud _must_ allocate a memory
> pool page, regardless of how well those pages compress.
> This reduces the efficiency of the page cache reclaim
> mechanism by half.

All very true, but these are not zbud deficiencies.
They are design choices to ensure predictability so that
pageframes can be reclaimed when the cache gets full.
NOT zpages but complete pageframes, since this is what
the rest of the kernel uses. And zbud handles LRU
ordering also.

Basic computer science principles tell us that maximizing
storage density (as zsmalloc does) has major negative
consequences especially when random fragments are freed,
as is true for the random frontswap access patterns. You
don't get higher density without much higher complexity.
This is a fundamental design tradeoff for zcache.

> I have posted some work (zsmalloc shrinker interface, user
> registered alloc/free functions for the zsmalloc memory
> pool) that begins to make zsmalloc a suitable replacement
> for zbud, but that work was put on hold until the path out
> of staging was established.
>
> I'm hoping to continue this work once the code is in
> mainline. While zbud has deficiencies, it doesn't prevent
> zcache from having value as I have already demonstrated.
> However, replacing zsmalloc with zbud would step backward
> for the reasons mentioned above.

Please do continue this work. But there was no indication
that you had even begun to think through all the consequences
of concurrent access or LRU pageframe reclaim. I did think
through them and concluded that the issues were far more complex
with zsmalloc than zbud (and would be happy to explain further).
So I solved the issues with zbud and got both parts of zcache
(frontswap and cleancache) working fully with zbud.

In other words, IMHO the existing zsmalloc will need to evolve
a great deal to work with the complete needs of zcache. If
you can get both maximal density AND concurrency AND LRU
pageframe reclaim all working with zsmalloc, and fully test
it (with zcache) and publish it, I would support moving to
this "new" zsmalloc instantly.

> I do not support the removal of zsmalloc from zcache. As
> such, I think the zsmalloc code should remain independent.

Zcache must move forward to meet the needs of a broader set
of workloads and distros and users, not just the limited
toy benchmarking you have provided. Zsmalloc has not yet
proven capable of meeting those needs. It may be capable
in the future but it cannot meet them now.

Dan

P.S. I think zsmalloc IS a good match for zram so I do not
object to the promotion of zsmalloc as part of promoting zram.
Would be happy to explain technical details further if this
surprises anyone.

2012-08-17 05:48:04

by Nitin Gupta

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

On 08/13/2012 11:22 PM, Minchan Kim wrote:
> Hi Greg,
>
> On Mon, Aug 13, 2012 at 07:35:30PM -0700, Greg Kroah-Hartman wrote:
>> On Wed, Aug 08, 2012 at 03:12:13PM +0900, Minchan Kim wrote:
>>> This patchset promotes zram/zsmalloc out of staging.
>>> Both are very clean, and zram has been used by many embedded
>>> products for a long time.
>>>
>>> [1-3] are patches not yet merged into linux-next but needed
>>> as a base for [4-5], which promote zsmalloc.
>>> Greg, if you have already merged [1-3], skip them.
>>
>> I've applied 1-3 and now 4, but that's it, I can't apply the rest
>
> Thanks!
>
>> without getting acks from the -mm maintainers, sorry. Please work with
>
> Nitin suggested zsmalloc could live in /lib or /zram instead of /mm,
> but I want to confirm that with akpm, so let's wait for his opinion.
>

akpm, please?

> Anyway, another question: zram would be under drivers/block.
> Do I need an ACK from Jens for that?
>

Added Jens to CC list.

Thanks,
Nitin

2012-08-21 00:56:03

by Minchan Kim

Subject: Re: [PATCH 0/7] zram/zsmalloc promotion

On Thu, Aug 16, 2012 at 10:47:58PM -0700, Nitin Gupta wrote:
> On 08/13/2012 11:22 PM, Minchan Kim wrote:
> > Hi Greg,
> >
> > On Mon, Aug 13, 2012 at 07:35:30PM -0700, Greg Kroah-Hartman wrote:
> >> On Wed, Aug 08, 2012 at 03:12:13PM +0900, Minchan Kim wrote:
> >>> This patchset promotes zram/zsmalloc out of staging.
> >>> Both are very clean, and zram has been used by many embedded
> >>> products for a long time.
> >>>
> >>> [1-3] are patches not yet merged into linux-next but needed
> >>> as a base for [4-5], which promote zsmalloc.
> >>> Greg, if you have already merged [1-3], skip them.
> >>
> >> I've applied 1-3 and now 4, but that's it, I can't apply the rest
> >
> > Thanks!
> >
> >> without getting acks from the -mm maintainers, sorry. Please work with
> >
> > Nitin suggested zsmalloc could live in /lib or /zram instead of /mm,
> > but I want to confirm that with akpm, so let's wait for his opinion.
> >
>
> akpm, please?

To Nitin:
Now both zram and zcache use zsmalloc, so if we really want to move it
out of /mm, I think the second-best place is /lib rather than /zram,
but my preference is still /mm. If akpm doesn't oppose it, I will do that.
(Let's not consider removing zsmalloc from zcache at the moment: it's
just Dan's trial and hasn't been realized yet. It's a very twisted
problem and I don't expect it to be settled soon :( )

To akpm,

I would like to put zsmalloc under /mm because it freely uses a few fields
of struct page. IIRC, you pointed that out, too. What do you think about it?
If you don't want that, I will put zsmalloc under /lib.

To Jens,

I would like to put zram under drivers/block.
Can I get your Ack?

>
> > Anyway, another question: zram would be under drivers/block.
> > Do I need an ACK from Jens for that?
> >
>
> Added Jens to CC list.
>
> Thanks,
> Nitin
>

--
Kind regards,
Minchan Kim