2014-04-19 15:53:21

by Dan Streetman

Subject: [PATCH 0/4] mm: zpool: add common api for zswap to use zbud/zsmalloc

In order to allow zswap users to choose between zbud and zsmalloc for
the compressed storage pool, this patch set adds a new api "zpool" that
provides an interface to both zbud and zsmalloc. Only a minor change
to zbud's interface was needed, as detailed in the first patch;
zsmalloc required shrinking to be added and a minor interface change,
as detailed in the second patch.
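
For reference, here is a minimal sketch of how a user might drive the
new api (hypothetical names, with a placeholder evict callback; this is
not zswap's actual writeback path, just an illustration of the
interface added below):

    #include <linux/zpool.h>

    static int my_evict(struct zpool *pool, unsigned long handle)
    {
            /* a real user would write the object's data back out here */
            return -EINVAL;
    }

    static struct zpool_ops my_ops = {
            .evict = my_evict
    };

    static int zpool_example(void)
    {
            struct zpool *pool;
            unsigned long handle;
            char *buf;

            /* ask for zbud; fall back to any available implementation */
            pool = zpool_create_pool(ZPOOL_TYPE_ZBUD, GFP_KERNEL,
                                     &my_ops, true);
            if (!pool)
                    return -ENOMEM;

            if (zpool_malloc(pool, 100, &handle) == 0) {
                    /* write-only mapping: no copy-in on map */
                    buf = zpool_map_handle(pool, handle, ZPOOL_MM_WO);
                    memset(buf, 0, 100);
                    zpool_unmap_handle(pool, handle);
                    zpool_free(pool, handle);
            }

            zpool_destroy_pool(pool);
            return 0;
    }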

I believe Seth was originally using zsmalloc for zswap, but there were
concerns about how significant the impact of shrinking zsmalloc would
be once zswap had to start reclaiming pages. That may still be an
issue, but this at least allows users to choose for themselves whether
they want a lower-density or higher-density compressed storage medium.
At least in situations where zswap reclaim is never or rarely reached,
it probably makes sense to use the higher density of zsmalloc.

Note this patch series does not change zram to use zpool, although that
change should be possible as well.


Dan Streetman (4):
mm: zpool: zbud_alloc() minor param change
mm: zpool: implement zsmalloc shrinking
mm: zpool: implement common zpool api to zbud/zsmalloc
mm: zpool: update zswap to use zpool

drivers/block/zram/zram_drv.c | 2 +-
include/linux/zbud.h | 3 +-
include/linux/zpool.h | 166 ++++++++++++++++++
include/linux/zsmalloc.h | 7 +-
mm/Kconfig | 43 +++--
mm/Makefile | 1 +
mm/zbud.c | 28 ++--
mm/zpool.c | 380 ++++++++++++++++++++++++++++++++++++++++++
mm/zsmalloc.c | 168 +++++++++++++++++--
mm/zswap.c | 70 ++++----
10 files changed, 787 insertions(+), 81 deletions(-)
create mode 100644 include/linux/zpool.h
create mode 100644 mm/zpool.c

--
1.8.3.1


2014-04-19 15:53:50

by Dan Streetman

Subject: [PATCH 1/4] mm: zpool: zbud_alloc() minor param change

Change zbud to store the gfp_t flags passed at pool creation and use
them for each alloc. This brings the zbud api closer to the existing
zsmalloc interface, and the only current zbud user (zswap) uses the
same gfp flags for all allocs anyway. Update zswap to use the changed
interface.
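
A minimal before/after sketch of the caller-visible change (it mirrors
the zswap conversion in the hunk below):

    /* before: gfp passed on every allocation */
    pool = zbud_create_pool(GFP_KERNEL, &ops);
    ret = zbud_alloc(pool, len, __GFP_NORETRY | __GFP_NOWARN, &handle);

    /* after: gfp stored at pool creation and used for every alloc */
    pool = zbud_create_pool(__GFP_NORETRY | __GFP_NOWARN, &ops);
    ret = zbud_alloc(pool, len, &handle);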

Signed-off-by: Dan Streetman <[email protected]>

---
include/linux/zbud.h | 3 +--
mm/zbud.c | 28 +++++++++++++++-------------
mm/zswap.c | 6 +++---
3 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/linux/zbud.h b/include/linux/zbud.h
index 2571a5c..50563b6 100644
--- a/include/linux/zbud.h
+++ b/include/linux/zbud.h
@@ -11,8 +11,7 @@ struct zbud_ops {

struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops);
void zbud_destroy_pool(struct zbud_pool *pool);
-int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
- unsigned long *handle);
+int zbud_alloc(struct zbud_pool *pool, int size, unsigned long *handle);
void zbud_free(struct zbud_pool *pool, unsigned long handle);
int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
void *zbud_map(struct zbud_pool *pool, unsigned long handle);
diff --git a/mm/zbud.c b/mm/zbud.c
index 9451361..e02f53f 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -94,6 +94,7 @@ struct zbud_pool {
struct list_head lru;
u64 pages_nr;
struct zbud_ops *ops;
+ gfp_t gfp;
};

/*
@@ -193,9 +194,12 @@ static int num_free_chunks(struct zbud_header *zhdr)
*****************/
/**
* zbud_create_pool() - create a new zbud pool
- * @gfp: gfp flags when allocating the zbud pool structure
+ * @gfp: gfp flags when growing the pool
* @ops: user-defined operations for the zbud pool
*
+ * gfp should not set __GFP_HIGHMEM as highmem pages cannot be used
+ * as zbud pool pages.
+ *
* Return: pointer to the new zbud pool or NULL if the metadata allocation
* failed.
*/
@@ -204,7 +208,9 @@ struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops)
struct zbud_pool *pool;
int i;

- pool = kmalloc(sizeof(struct zbud_pool), gfp);
+ if (gfp & __GFP_HIGHMEM)
+ return NULL;
+ pool = kmalloc(sizeof(struct zbud_pool), GFP_KERNEL);
if (!pool)
return NULL;
spin_lock_init(&pool->lock);
@@ -214,6 +220,7 @@ struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops)
INIT_LIST_HEAD(&pool->lru);
pool->pages_nr = 0;
pool->ops = ops;
+ pool->gfp = gfp;
return pool;
}

@@ -232,7 +239,6 @@ void zbud_destroy_pool(struct zbud_pool *pool)
* zbud_alloc() - allocates a region of a given size
* @pool: zbud pool from which to allocate
* @size: size in bytes of the desired allocation
- * @gfp: gfp flags used if the pool needs to grow
* @handle: handle of the new allocation
*
* This function will attempt to find a free region in the pool large enough to
@@ -240,22 +246,18 @@ void zbud_destroy_pool(struct zbud_pool *pool)
* performed first. If no suitable free region is found, then a new page is
* allocated and added to the pool to satisfy the request.
*
- * gfp should not set __GFP_HIGHMEM as highmem pages cannot be used
- * as zbud pool pages.
- *
- * Return: 0 if success and handle is set, otherwise -EINVAL if the size or
- * gfp arguments are invalid or -ENOMEM if the pool was unable to allocate
- * a new page.
+ * Return: 0 if success and @handle is set, -ENOSPC if the @size is too large,
+ * -EINVAL if the @size is 0 or less, or -ENOMEM if the pool was unable to
+ * allocate a new page.
*/
-int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
- unsigned long *handle)
+int zbud_alloc(struct zbud_pool *pool, int size, unsigned long *handle)
{
int chunks, i, freechunks;
struct zbud_header *zhdr = NULL;
enum buddy bud;
struct page *page;

- if (size <= 0 || gfp & __GFP_HIGHMEM)
+ if (size <= 0)
return -EINVAL;
if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE)
return -ENOSPC;
@@ -279,7 +281,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,

/* Couldn't find unbuddied zbud page, create new one */
spin_unlock(&pool->lock);
- page = alloc_page(gfp);
+ page = alloc_page(pool->gfp);
if (!page)
return -ENOMEM;
spin_lock(&pool->lock);
diff --git a/mm/zswap.c b/mm/zswap.c
index aeaef0f..1cc6770 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -679,8 +679,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,

/* store */
len = dlen + sizeof(struct zswap_header);
- ret = zbud_alloc(zswap_pool, len, __GFP_NORETRY | __GFP_NOWARN,
- &handle);
+ ret = zbud_alloc(zswap_pool, len, &handle);
if (ret == -ENOSPC) {
zswap_reject_compress_poor++;
goto freepage;
@@ -900,7 +899,8 @@ static int __init init_zswap(void)

pr_info("loading zswap\n");

- zswap_pool = zbud_create_pool(GFP_KERNEL, &zswap_zbud_ops);
+ zswap_pool = zbud_create_pool(__GFP_NORETRY | __GFP_NOWARN,
+ &zswap_zbud_ops);
if (!zswap_pool) {
pr_err("zbud pool creation failed\n");
goto error;
--
1.8.3.1

2014-04-19 15:54:01

by Dan Streetman

Subject: [PATCH 3/4] mm: zpool: implement common zpool api to zbud/zsmalloc

Add zpool api.

zpool provides an interface for memory storage, typically of compressed
memory. Users can select which backend to use; currently the only
implementations are zbud, a low-density implementation that stores
exactly two compressed pages per storage page, and zsmalloc, a
higher-density implementation that stores multiple compressed pages per
storage page.
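
As a sketch of the intended usage (gfp and ops stand in for whatever
flags and zpool_ops the caller already has; the real conversion is in
the last patch), a caller can request a specific implementation and
still accept a fallback:

    pool = zpool_create_pool(ZPOOL_TYPE_ZSMALLOC, gfp, &ops, true);
    if (pool && strcmp(zpool_get_type(pool), ZPOOL_TYPE_ZSMALLOC))
            pr_info("using fallback zpool implementation: %s\n",
                    zpool_get_type(pool));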

Signed-off-by: Dan Streetman <[email protected]>
---
include/linux/zpool.h | 166 ++++++++++++++++++++++
mm/Kconfig | 43 +++---
mm/Makefile | 1 +
mm/zpool.c | 380 ++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 572 insertions(+), 18 deletions(-)
create mode 100644 include/linux/zpool.h
create mode 100644 mm/zpool.c

diff --git a/include/linux/zpool.h b/include/linux/zpool.h
new file mode 100644
index 0000000..82d81c6
--- /dev/null
+++ b/include/linux/zpool.h
@@ -0,0 +1,166 @@
+/*
+ * zpool memory storage api
+ *
+ * Copyright (C) 2014 Dan Streetman
+ *
+ * This is a common frontend for the zbud and zsmalloc memory
+ * storage pool implementations. Typically, this is used to
+ * store compressed memory.
+ */
+
+#ifndef _ZPOOL_H_
+#define _ZPOOL_H_
+
+struct zpool;
+
+struct zpool_ops {
+ int (*evict)(struct zpool *pool, unsigned long handle);
+};
+
+#define ZPOOL_TYPE_ZSMALLOC "zsmalloc"
+#define ZPOOL_TYPE_ZBUD "zbud"
+
+/*
+ * Control how a handle is mapped. It will be ignored if the
+ * implementation does not support it. Its use is optional.
+ * Note that this does not refer to memory protection, it
+ * refers to how the memory will be copied in/out if copying
+ * is necessary during mapping; read-write is the safest as
+ * it copies the existing memory in on map, and copies the
+ * changed memory back out on unmap. Write-only does not copy
+ * in the memory and should only be used for initialization.
+ * If in doubt, use ZPOOL_MM_DEFAULT which is read-write.
+ */
+enum zpool_mapmode {
+ ZPOOL_MM_RW, /* normal read-write mapping */
+ ZPOOL_MM_RO, /* read-only (no copy-out at unmap time) */
+ ZPOOL_MM_WO, /* write-only (no copy-in at map time) */
+
+ ZPOOL_MM_DEFAULT = ZPOOL_MM_RW
+};
+
+/**
+ * zpool_create_pool() - Create a new zpool
+ * @type The type of the zpool to create (e.g. zbud, zsmalloc)
+ * @flags What GFP flags should be used when the zpool allocates memory.
+ * @ops The optional ops callback.
+ * @fallback If other implementations should be used
+ *
+ * This creates a new zpool of the specified type. The zpool will use the
+ * given flags when allocating any memory. If the ops param is NULL, then
+ * the created zpool will not be shrinkable.
+ *
+ * If creation of the implementation @type fails, and @fallback is true,
+ * then other implementation(s) are tried. If @fallback is false or no
+ * implementations could be created, then NULL is returned.
+ *
+ * Returns: New zpool on success, NULL on failure.
+ */
+struct zpool *zpool_create_pool(char *type, gfp_t flags,
+ struct zpool_ops *ops, bool fallback);
+
+/**
+ * zpool_get_type() - Get the type of the zpool
+ * @pool The zpool to check
+ *
+ * This returns the type of the pool, which will match one of the
+ * ZPOOL_TYPE_* defined values. This can be useful after calling
+ * zpool_create_pool() with @fallback set to true.
+ *
+ * Returns: The type of zpool.
+ */
+char *zpool_get_type(struct zpool *pool);
+
+/**
+ * zpool_destroy_pool() - Destroy a zpool
+ * @pool The zpool to destroy.
+ *
+ * This destroys an existing zpool. The zpool should not be in use.
+ */
+void zpool_destroy_pool(struct zpool *pool);
+
+/**
+ * zpool_malloc() - Allocate memory
+ * @pool The zpool to allocate from.
+ * @size The amount of memory to allocate.
+ * @handle Pointer to the handle to set
+ *
+ * This allocates the requested amount of memory from the pool.
+ * The provided @handle will be set to the allocated object handle.
+ *
+ * Returns: 0 on success, negative value on error.
+ */
+int zpool_malloc(struct zpool *pool, size_t size, unsigned long *handle);
+
+/**
+ * zpool_free() - Free previously allocated memory
+ * @pool The zpool that allocated the memory.
+ * @handle The handle to the memory to free.
+ *
+ * This frees previously allocated memory. This does not guarantee
+ * that the pool will actually free memory, only that the memory
+ * in the pool will become available for use by the pool.
+ */
+void zpool_free(struct zpool *pool, unsigned long handle);
+
+/**
+ * zpool_shrink() - Shrink the pool size
+ * @pool The zpool to shrink.
+ * @size The minimum amount to shrink the pool.
+ *
+ * This attempts to shrink the actual memory size of the pool
+ * by evicting currently used handle(s). If the pool was
+ * created with no zpool_ops, or the evict call fails for any
+ * of the handles, this will fail.
+ *
+ * Returns: 0 on success, negative value on error/failure.
+ */
+int zpool_shrink(struct zpool *pool, size_t size);
+
+/**
+ * zpool_map_handle() - Map a previously allocated handle into memory
+ * @pool The zpool that the handle was allocated from
+ * @handle The handle to map
+ * @mm How the memory should be mapped
+ *
+ * This maps a previously allocated handle into memory. The @mm
+ * param indicates to the implementation how the memory will be
+ * used, i.e. read-only, write-only, read-write. If the
+ * implementation does not support it, the memory will be treated
+ * as read-write.
+ *
+ * This may hold locks, disable interrupts, and/or preemption,
+ * and the zpool_unmap_handle() must be called to undo those
+ * actions. The code that uses the mapped handle should complete
+ * its operations on the mapped handle memory quickly and unmap
+ * as soon as possible. Multiple handles should not be mapped
+ * concurrently on a cpu.
+ *
+ * Returns: A pointer to the handle's mapped memory area.
+ */
+void *zpool_map_handle(struct zpool *pool, unsigned long handle,
+ enum zpool_mapmode mm);
+
+/**
+ * zpool_unmap_handle() - Unmap a previously mapped handle
+ * @pool The zpool that the handle was allocated from
+ * @handle The handle to unmap
+ *
+ * This unmaps a previously mapped handle. Any locks or other
+ * actions that the implementation took in zpool_map_handle()
+ * will be undone here. The memory area returned from
+ * zpool_map_handle() should no longer be used after this.
+ */
+void zpool_unmap_handle(struct zpool *pool, unsigned long handle);
+
+/**
+ * zpool_get_total_size() - The total size of the pool
+ * @pool The zpool to check
+ *
+ * This returns the total size in bytes of the pool.
+ *
+ * Returns: Total size of the zpool in bytes.
+ */
+u64 zpool_get_total_size(struct zpool *pool);
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index ebe5880..ed7715c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -512,21 +512,23 @@ config CMA_DEBUG
processing calls such as dma_alloc_from_contiguous().
This option does not affect warning and error messages.

-config ZBUD
- tristate
- default n
+config MEM_SOFT_DIRTY
+ bool "Track memory changes"
+ depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS
+ select PROC_PAGE_MONITOR
help
- A special purpose allocator for storing compressed pages.
- It is designed to store up to two compressed pages per physical
- page. While this design limits storage density, it has simple and
- deterministic reclaim properties that make it preferable to a higher
- density approach when reclaim will be used.
+ This option enables memory changes tracking by introducing a
+ soft-dirty bit on pte-s. This bit it set when someone writes
+ into a page just as regular dirty bit, but unlike the latter
+ it can be cleared by hands.
+
+ See Documentation/vm/soft-dirty.txt for more details.

config ZSWAP
bool "Compressed cache for swap pages (EXPERIMENTAL)"
depends on FRONTSWAP && CRYPTO=y
select CRYPTO_LZO
- select ZBUD
+ select ZPOOL
default n
help
A lightweight compressed cache for swap pages. It takes
@@ -542,17 +544,22 @@ config ZSWAP
they have not be fully explored on the large set of potential
configurations and workloads that exist.

-config MEM_SOFT_DIRTY
- bool "Track memory changes"
- depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS
- select PROC_PAGE_MONITOR
+config ZPOOL
+ tristate "Common API for compressed memory storage"
+ default n
help
- This option enables memory changes tracking by introducing a
- soft-dirty bit on pte-s. This bit it set when someone writes
- into a page just as regular dirty bit, but unlike the latter
- it can be cleared by hands.
+ Compressed memory storage API. This allows using either zbud or
+ zsmalloc.

- See Documentation/vm/soft-dirty.txt for more details.
+config ZBUD
+ tristate "Low density storage for compressed pages"
+ default n
+ help
+ A special purpose allocator for storing compressed pages.
+ It is designed to store up to two compressed pages per physical
+ page. While this design limits storage density, it has simple and
+ deterministic reclaim properties that make it preferable to a higher
+ density approach when reclaim will be used.

config ZSMALLOC
bool "Memory allocator for compressed pages"
diff --git a/mm/Makefile b/mm/Makefile
index 60cacbb..4135f7c 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
obj-$(CONFIG_CLEANCACHE) += cleancache.o
obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
obj-$(CONFIG_PAGE_OWNER) += pageowner.o
+obj-$(CONFIG_ZPOOL) += zpool.o
obj-$(CONFIG_ZBUD) += zbud.o
obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
diff --git a/mm/zpool.c b/mm/zpool.c
new file mode 100644
index 0000000..592cc0d
--- /dev/null
+++ b/mm/zpool.c
@@ -0,0 +1,380 @@
+/*
+ * zpool memory storage api
+ *
+ * Copyright (C) 2014 Dan Streetman
+ *
+ * This is a common frontend for the zbud and zsmalloc memory
+ * storage pool implementations. Typically, this is used to
+ * store compressed memory.
+ */
+
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/zpool.h>
+#include <linux/zbud.h>
+#include <linux/zsmalloc.h>
+
+struct zpool_imp {
+ void (*destroy)(struct zpool *pool);
+
+ int (*malloc)(struct zpool *pool, size_t size, unsigned long *handle);
+ void (*free)(struct zpool *pool, unsigned long handle);
+
+ int (*shrink)(struct zpool *pool, size_t size);
+
+ void *(*map)(struct zpool *pool, unsigned long handle,
+ enum zpool_mapmode mm);
+ void (*unmap)(struct zpool *pool, unsigned long handle);
+
+ u64 (*total_size)(struct zpool *pool);
+};
+
+struct zpool {
+ char *type;
+
+ union {
+#ifdef CONFIG_ZSMALLOC
+ struct zs_pool *zsmalloc_pool;
+#endif
+#ifdef CONFIG_ZBUD
+ struct zbud_pool *zbud_pool;
+#endif
+ };
+
+ struct zpool_imp *imp;
+ struct zpool_ops *ops;
+
+ struct list_head list;
+};
+
+static LIST_HEAD(zpools);
+static DEFINE_SPINLOCK(zpools_lock);
+
+static int zpool_noop_evict(struct zpool *pool, unsigned long handle)
+{
+ return -EINVAL;
+}
+static struct zpool_ops zpool_noop_ops = {
+ .evict = zpool_noop_evict
+};
+
+
+/* zsmalloc */
+
+#ifdef CONFIG_ZSMALLOC
+
+static void zpool_zsmalloc_destroy(struct zpool *zpool)
+{
+ spin_lock(&zpools_lock);
+ list_del(&zpool->list);
+ spin_unlock(&zpools_lock);
+
+ zs_destroy_pool(zpool->zsmalloc_pool);
+ kfree(zpool);
+}
+
+static int zpool_zsmalloc_malloc(struct zpool *pool, size_t size,
+ unsigned long *handle)
+{
+ *handle = zs_malloc(pool->zsmalloc_pool, size);
+ return *handle ? 0 : -1;
+}
+
+static void zpool_zsmalloc_free(struct zpool *pool, unsigned long handle)
+{
+ zs_free(pool->zsmalloc_pool, handle);
+}
+
+static int zpool_zsmalloc_shrink(struct zpool *pool, size_t size)
+{
+ return zs_shrink(pool->zsmalloc_pool, size);
+}
+
+static void *zpool_zsmalloc_map(struct zpool *pool, unsigned long handle,
+ enum zpool_mapmode zpool_mapmode)
+{
+ enum zs_mapmode zs_mapmode;
+
+ switch (zpool_mapmode) {
+ case ZPOOL_MM_RO:
+ zs_mapmode = ZS_MM_RO; break;
+ case ZPOOL_MM_WO:
+ zs_mapmode = ZS_MM_WO; break;
+ case ZPOOL_MM_RW: /* fallthrough */
+ default:
+ zs_mapmode = ZS_MM_RW; break;
+ }
+ return zs_map_object(pool->zsmalloc_pool, handle, zs_mapmode);
+}
+
+static void zpool_zsmalloc_unmap(struct zpool *pool, unsigned long handle)
+{
+ zs_unmap_object(pool->zsmalloc_pool, handle);
+}
+
+static u64 zpool_zsmalloc_total_size(struct zpool *pool)
+{
+ return zs_get_total_size_bytes(pool->zsmalloc_pool);
+}
+
+static int zpool_zsmalloc_evict(struct zs_pool *zsmalloc_pool,
+ unsigned long handle)
+{
+ struct zpool *zpool;
+
+ spin_lock(&zpools_lock);
+ list_for_each_entry(zpool, &zpools, list) {
+ if (zpool->zsmalloc_pool == zsmalloc_pool) {
+ spin_unlock(&zpools_lock);
+ return zpool->ops->evict(zpool, handle);
+ }
+ }
+ spin_unlock(&zpools_lock);
+ return -EINVAL;
+}
+
+static struct zpool_imp zpool_zsmalloc_imp = {
+ .destroy = zpool_zsmalloc_destroy,
+ .malloc = zpool_zsmalloc_malloc,
+ .free = zpool_zsmalloc_free,
+ .shrink = zpool_zsmalloc_shrink,
+ .map = zpool_zsmalloc_map,
+ .unmap = zpool_zsmalloc_unmap,
+ .total_size = zpool_zsmalloc_total_size
+};
+
+static struct zs_ops zpool_zsmalloc_ops = {
+ .evict = zpool_zsmalloc_evict
+};
+
+static struct zpool *zpool_zsmalloc_create(gfp_t flags, struct zpool_ops *ops)
+{
+ struct zpool *zpool;
+ struct zs_ops *zs_ops = (ops ? &zpool_zsmalloc_ops : NULL);
+
+ zpool = kmalloc(sizeof(*zpool), GFP_KERNEL);
+ if (!zpool)
+ return NULL;
+
+ zpool->zsmalloc_pool = zs_create_pool(flags, zs_ops);
+ if (!zpool->zsmalloc_pool) {
+ kfree(zpool);
+ return NULL;
+ }
+
+ zpool->type = ZPOOL_TYPE_ZSMALLOC;
+ zpool->imp = &zpool_zsmalloc_imp;
+ zpool->ops = (ops ? ops : &zpool_noop_ops);
+ spin_lock(&zpools_lock);
+ list_add(&zpool->list, &zpools);
+ spin_unlock(&zpools_lock);
+
+ return zpool;
+}
+
+#else
+
+static struct zpool *zpool_zsmalloc_create(gfp_t flags, struct zpool_ops *ops)
+{
+ pr_info("zpool: no zsmalloc in this kernel\n");
+ return NULL;
+}
+
+#endif /* CONFIG_ZSMALLOC */
+
+
+/* zbud */
+
+#ifdef CONFIG_ZBUD
+
+static void zpool_zbud_destroy(struct zpool *zpool)
+{
+ spin_lock(&zpools_lock);
+ list_del(&zpool->list);
+ spin_unlock(&zpools_lock);
+
+ zbud_destroy_pool(zpool->zbud_pool);
+ kfree(zpool);
+}
+
+static int zpool_zbud_malloc(struct zpool *pool, size_t size,
+ unsigned long *handle)
+{
+ return zbud_alloc(pool->zbud_pool, size, handle);
+}
+
+static void zpool_zbud_free(struct zpool *pool, unsigned long handle)
+{
+ zbud_free(pool->zbud_pool, handle);
+}
+
+static int zpool_zbud_shrink(struct zpool *pool, size_t size)
+{
+ return zbud_reclaim_page(pool->zbud_pool, 3);
+}
+
+static void *zpool_zbud_map(struct zpool *pool, unsigned long handle,
+ enum zpool_mapmode zpool_mapmode)
+{
+ return zbud_map(pool->zbud_pool, handle);
+}
+
+static void zpool_zbud_unmap(struct zpool *pool, unsigned long handle)
+{
+ zbud_unmap(pool->zbud_pool, handle);
+}
+
+static u64 zpool_zbud_total_size(struct zpool *pool)
+{
+ return zbud_get_pool_size(pool->zbud_pool) * PAGE_SIZE;
+}
+
+static int zpool_zbud_evict(struct zbud_pool *zbud_pool, unsigned long handle)
+{
+ struct zpool *zpool;
+
+ spin_lock(&zpools_lock);
+ list_for_each_entry(zpool, &zpools, list) {
+ if (zpool->zbud_pool == zbud_pool) {
+ spin_unlock(&zpools_lock);
+ return zpool->ops->evict(zpool, handle);
+ }
+ }
+ spin_unlock(&zpools_lock);
+ return -EINVAL;
+}
+
+static struct zpool_imp zpool_zbud_imp = {
+ .destroy = zpool_zbud_destroy,
+ .malloc = zpool_zbud_malloc,
+ .free = zpool_zbud_free,
+ .shrink = zpool_zbud_shrink,
+ .map = zpool_zbud_map,
+ .unmap = zpool_zbud_unmap,
+ .total_size = zpool_zbud_total_size
+};
+
+static struct zbud_ops zpool_zbud_ops = {
+ .evict = zpool_zbud_evict
+};
+
+static struct zpool *zpool_zbud_create(gfp_t flags, struct zpool_ops *ops)
+{
+ struct zpool *zpool;
+ struct zbud_ops *zbud_ops = (ops ? &zpool_zbud_ops : NULL);
+
+ zpool = kmalloc(sizeof(*zpool), GFP_KERNEL);
+ if (!zpool)
+ return NULL;
+
+ zpool->zbud_pool = zbud_create_pool(flags, zbud_ops);
+ if (!zpool->zbud_pool) {
+ kfree(zpool);
+ return NULL;
+ }
+
+ zpool->type = ZPOOL_TYPE_ZBUD;
+ zpool->imp = &zpool_zbud_imp;
+ zpool->ops = (ops ? ops : &zpool_noop_ops);
+ spin_lock(&zpools_lock);
+ list_add(&zpool->list, &zpools);
+ spin_unlock(&zpools_lock);
+
+ return zpool;
+}
+
+#else
+
+static struct zpool *zpool_zbud_create(gfp_t flags, struct zpool_ops *ops)
+{
+ pr_info("zpool: no zbud in this kernel\n");
+ return NULL;
+}
+
+#endif /* CONFIG_ZBUD */
+
+
+struct zpool *zpool_fallback_create(gfp_t flags, struct zpool_ops *ops)
+{
+ struct zpool *pool = NULL;
+
+#ifdef CONFIG_ZSMALLOC
+ pool = zpool_zsmalloc_create(flags, ops);
+ if (pool)
+ return pool;
+ pr_info("zpool: fallback unable to create zsmalloc pool\n");
+#endif
+
+#ifdef CONFIG_ZBUD
+ pool = zpool_zbud_create(flags, ops);
+ if (!pool)
+ pr_info("zpool: fallback unable to create zbud pool\n");
+#endif
+
+ return pool;
+}
+
+struct zpool *zpool_create_pool(char *type, gfp_t flags,
+ struct zpool_ops *ops, bool fallback)
+{
+ struct zpool *pool = NULL;
+
+ if (!strcmp(type, ZPOOL_TYPE_ZSMALLOC))
+ pool = zpool_zsmalloc_create(flags, ops);
+ else if (!strcmp(type, ZPOOL_TYPE_ZBUD))
+ pool = zpool_zbud_create(flags, ops);
+ else
+ pr_err("zpool: unknown type %s\n", type);
+
+ if (!pool && fallback)
+ pool = zpool_fallback_create(flags, ops);
+
+ if (!pool)
+ pr_err("zpool: couldn't create zpool\n");
+
+ return pool;
+}
+
+char *zpool_get_type(struct zpool *pool)
+{
+ return pool->type;
+}
+
+void zpool_destroy_pool(struct zpool *pool)
+{
+ pool->imp->destroy(pool);
+}
+
+int zpool_malloc(struct zpool *pool, size_t size, unsigned long *handle)
+{
+ return pool->imp->malloc(pool, size, handle);
+}
+
+void zpool_free(struct zpool *pool, unsigned long handle)
+{
+ pool->imp->free(pool, handle);
+}
+
+int zpool_shrink(struct zpool *pool, size_t size)
+{
+ return pool->imp->shrink(pool, size);
+}
+
+void *zpool_map_handle(struct zpool *pool, unsigned long handle,
+ enum zpool_mapmode mapmode)
+{
+ return pool->imp->map(pool, handle, mapmode);
+}
+
+void zpool_unmap_handle(struct zpool *pool, unsigned long handle)
+{
+ pool->imp->unmap(pool, handle);
+}
+
+u64 zpool_get_total_size(struct zpool *pool)
+{
+ return pool->imp->total_size(pool);
+}
--
1.8.3.1

2014-04-19 15:54:09

by Dan Streetman

Subject: [PATCH 4/4] mm: zpool: update zswap to use zpool

Change zswap to use the zpool api instead of directly using zbud.
Add a boot-time param to allow selecting which zpool implementation
to use, with zbud as the default.
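
For example, since zswap is built in, the implementation can be chosen
on the kernel command line via the new module parameter (zbud remains
the default; if the requested implementation cannot be created, zpool's
fallback mode will pick another available one):

    zswap.zpool=zsmalloc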

Signed-off-by: Dan Streetman <[email protected]>
---
mm/zswap.c | 70 ++++++++++++++++++++++++++++++++++----------------------------
1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 1cc6770..4f4a8ec 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -34,7 +34,7 @@
#include <linux/swap.h>
#include <linux/crypto.h>
#include <linux/mempool.h>
-#include <linux/zbud.h>
+#include <linux/zpool.h>

#include <linux/mm_types.h>
#include <linux/page-flags.h>
@@ -45,8 +45,8 @@
/*********************************
* statistics
**********************************/
-/* Number of memory pages used by the compressed pool */
-static u64 zswap_pool_pages;
+/* Total bytes used by the compressed storage */
+static u64 zswap_pool_total_size;
/* The number of compressed pages currently stored in zswap */
static atomic_t zswap_stored_pages = ATOMIC_INIT(0);

@@ -89,8 +89,13 @@ static unsigned int zswap_max_pool_percent = 20;
module_param_named(max_pool_percent,
zswap_max_pool_percent, uint, 0644);

-/* zbud_pool is shared by all of zswap backend */
-static struct zbud_pool *zswap_pool;
+/* Compressed storage to use */
+#define ZSWAP_ZPOOL_DEFAULT ZPOOL_TYPE_ZBUD
+static char *zswap_zpool_type = ZSWAP_ZPOOL_DEFAULT;
+module_param_named(zpool, zswap_zpool_type, charp, 0444);
+
+/* zpool is shared by all of zswap backend */
+static struct zpool *zswap_pool;

/*********************************
* compression functions
@@ -168,7 +173,7 @@ static void zswap_comp_exit(void)
* be held while changing the refcount. Since the lock must
* be held, there is no reason to also make refcount atomic.
* offset - the swap offset for the entry. Index into the red-black tree.
- * handle - zbud allocation handle that stores the compressed page data
+ * handle - zpool allocation handle that stores the compressed page data
* length - the length in bytes of the compressed page data. Needed during
* decompression
*/
@@ -284,15 +289,15 @@ static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
}

/*
- * Carries out the common pattern of freeing and entry's zbud allocation,
+ * Carries out the common pattern of freeing an entry's zpool allocation,
* freeing the entry itself, and decrementing the number of stored pages.
*/
static void zswap_free_entry(struct zswap_entry *entry)
{
- zbud_free(zswap_pool, entry->handle);
+ zpool_free(zswap_pool, entry->handle);
zswap_entry_cache_free(entry);
atomic_dec(&zswap_stored_pages);
- zswap_pool_pages = zbud_get_pool_size(zswap_pool);
+ zswap_pool_total_size = zpool_get_total_size(zswap_pool);
}

/* caller must hold the tree lock */
@@ -409,7 +414,7 @@ cleanup:
static bool zswap_is_full(void)
{
return totalram_pages * zswap_max_pool_percent / 100 <
- zswap_pool_pages;
+ DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
}

/*********************************
@@ -525,7 +530,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
* the swap cache, the compressed version stored by zswap can be
* freed.
*/
-static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
+static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
{
struct zswap_header *zhdr;
swp_entry_t swpentry;
@@ -541,9 +546,9 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
};

/* extract swpentry from data */
- zhdr = zbud_map(pool, handle);
+ zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
swpentry = zhdr->swpentry; /* here */
- zbud_unmap(pool, handle);
+ zpool_unmap_handle(pool, handle);
tree = zswap_trees[swp_type(swpentry)];
offset = swp_offset(swpentry);

@@ -573,13 +578,13 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
case ZSWAP_SWAPCACHE_NEW: /* page is locked */
/* decompress */
dlen = PAGE_SIZE;
- src = (u8 *)zbud_map(zswap_pool, entry->handle) +
- sizeof(struct zswap_header);
+ src = (u8 *)zpool_map_handle(zswap_pool, entry->handle,
+ ZPOOL_MM_RO) + sizeof(struct zswap_header);
dst = kmap_atomic(page);
ret = zswap_comp_op(ZSWAP_COMPOP_DECOMPRESS, src,
entry->length, dst, &dlen);
kunmap_atomic(dst);
- zbud_unmap(zswap_pool, entry->handle);
+ zpool_unmap_handle(zswap_pool, entry->handle);
BUG_ON(ret);
BUG_ON(dlen != PAGE_SIZE);

@@ -652,7 +657,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
/* reclaim space if needed */
if (zswap_is_full()) {
zswap_pool_limit_hit++;
- if (zbud_reclaim_page(zswap_pool, 8)) {
+ if (zpool_shrink(zswap_pool, PAGE_SIZE)) {
zswap_reject_reclaim_fail++;
ret = -ENOMEM;
goto reject;
@@ -679,7 +684,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,

/* store */
len = dlen + sizeof(struct zswap_header);
- ret = zbud_alloc(zswap_pool, len, &handle);
+ ret = zpool_malloc(zswap_pool, len, &handle);
if (ret == -ENOSPC) {
zswap_reject_compress_poor++;
goto freepage;
@@ -688,11 +693,11 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
zswap_reject_alloc_fail++;
goto freepage;
}
- zhdr = zbud_map(zswap_pool, handle);
+ zhdr = zpool_map_handle(zswap_pool, handle, ZPOOL_MM_RW);
zhdr->swpentry = swp_entry(type, offset);
buf = (u8 *)(zhdr + 1);
memcpy(buf, dst, dlen);
- zbud_unmap(zswap_pool, handle);
+ zpool_unmap_handle(zswap_pool, handle);
put_cpu_var(zswap_dstmem);

/* populate entry */
@@ -715,7 +720,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,

/* update stats */
atomic_inc(&zswap_stored_pages);
- zswap_pool_pages = zbud_get_pool_size(zswap_pool);
+ zswap_pool_total_size = zpool_get_total_size(zswap_pool);

return 0;

@@ -751,13 +756,13 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,

/* decompress */
dlen = PAGE_SIZE;
- src = (u8 *)zbud_map(zswap_pool, entry->handle) +
- sizeof(struct zswap_header);
+ src = (u8 *)zpool_map_handle(zswap_pool, entry->handle,
+ ZPOOL_MM_RO) + sizeof(struct zswap_header);
dst = kmap_atomic(page);
ret = zswap_comp_op(ZSWAP_COMPOP_DECOMPRESS, src, entry->length,
dst, &dlen);
kunmap_atomic(dst);
- zbud_unmap(zswap_pool, entry->handle);
+ zpool_unmap_handle(zswap_pool, entry->handle);
BUG_ON(ret);

spin_lock(&tree->lock);
@@ -810,7 +815,7 @@ static void zswap_frontswap_invalidate_area(unsigned type)
zswap_trees[type] = NULL;
}

-static struct zbud_ops zswap_zbud_ops = {
+static struct zpool_ops zswap_zpool_ops = {
.evict = zswap_writeback_entry
};

@@ -868,8 +873,8 @@ static int __init zswap_debugfs_init(void)
zswap_debugfs_root, &zswap_written_back_pages);
debugfs_create_u64("duplicate_entry", S_IRUGO,
zswap_debugfs_root, &zswap_duplicate_entry);
- debugfs_create_u64("pool_pages", S_IRUGO,
- zswap_debugfs_root, &zswap_pool_pages);
+ debugfs_create_u64("pool_total_size", S_IRUGO,
+ zswap_debugfs_root, &zswap_pool_total_size);
debugfs_create_atomic_t("stored_pages", S_IRUGO,
zswap_debugfs_root, &zswap_stored_pages);

@@ -899,12 +904,15 @@ static int __init init_zswap(void)

pr_info("loading zswap\n");

- zswap_pool = zbud_create_pool(__GFP_NORETRY | __GFP_NOWARN,
- &zswap_zbud_ops);
+ zswap_pool = zpool_create_pool(zswap_zpool_type,
+ __GFP_NORETRY | __GFP_NOWARN, &zswap_zpool_ops, true);
if (!zswap_pool) {
- pr_err("zbud pool creation failed\n");
+ pr_err("zpool creation failed\n");
goto error;
}
+ if (strcmp(zswap_zpool_type, zpool_get_type(zswap_pool)))
+ pr_info("zpool gave us fallback implementation: %s\n",
+ zpool_get_type(zswap_pool));

if (zswap_entry_cache_create()) {
pr_err("entry cache creation failed\n");
@@ -928,7 +936,7 @@ pcpufail:
compfail:
zswap_entry_cache_destory();
cachefail:
- zbud_destroy_pool(zswap_pool);
+ zpool_destroy_pool(zswap_pool);
error:
return -ENOMEM;
}
--
1.8.3.1

2014-04-19 15:53:56

by Dan Streetman

Subject: [PATCH 2/4] mm: zpool: implement zsmalloc shrinking

Add zs_shrink() and helper functions to zsmalloc. Update the
zs_create_pool() function to take an ops param that provides an
evict() callback for use during shrinking. Update the helper function
fix_fullness_group() to always reinsert a changed zspage, even if its
fullness group did not change, so that it is updated in the fullness
group lru. Also update zram to use the new zsmalloc pool creation
function, passing NULL as the ops param, since zram does not use pool
shrinking.
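
A minimal sketch of the new interface from a pool user's point of view
(hypothetical names; zswap hooks this up through the zpool layer in the
following patches):

    static int my_zs_evict(struct zs_pool *pool, unsigned long handle)
    {
            /* write the object's data out, then report success */
            return 0;
    }

    static struct zs_ops my_zs_ops = {
            .evict = my_zs_evict
    };

    /* create a pool that can be shrunk via the evict callback */
    pool = zs_create_pool(GFP_NOIO | __GFP_HIGHMEM, &my_zs_ops);

    /* under memory pressure, try to free at least one page worth */
    if (zs_shrink(pool, PAGE_SIZE))
            pr_debug("zsmalloc shrink failed\n");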

Signed-off-by: Dan Streetman <[email protected]>

---

To find all the used objects inside a zspage, I had to do a full scan
of the zspage->freelist for each object, since there is no list of used
objects, and no way to keep one without allocating more memory for each
zspage (that I could see). Of course, by taking a byte (really only a
bit) out of each object's memory area to use as an in-use flag, we
could check that instead of scanning ->freelist for each object, but
that would (slightly) reduce the available size of each zspage object.


drivers/block/zram/zram_drv.c | 2 +-
include/linux/zsmalloc.h | 7 +-
mm/zsmalloc.c | 168 ++++++++++++++++++++++++++++++++++++++----
3 files changed, 160 insertions(+), 17 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9849b52..dacf343 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -249,7 +249,7 @@ static struct zram_meta *zram_meta_alloc(u64 disksize)
goto free_meta;
}

- meta->mem_pool = zs_create_pool(GFP_NOIO | __GFP_HIGHMEM);
+ meta->mem_pool = zs_create_pool(GFP_NOIO | __GFP_HIGHMEM, NULL);
if (!meta->mem_pool) {
pr_err("Error creating memory pool\n");
goto free_table;
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index e44d634..a75ab6e 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -36,11 +36,16 @@ enum zs_mapmode {

struct zs_pool;

-struct zs_pool *zs_create_pool(gfp_t flags);
+struct zs_ops {
+ int (*evict)(struct zs_pool *pool, unsigned long handle);
+};
+
+struct zs_pool *zs_create_pool(gfp_t flags, struct zs_ops *ops);
void zs_destroy_pool(struct zs_pool *pool);

unsigned long zs_malloc(struct zs_pool *pool, size_t size);
void zs_free(struct zs_pool *pool, unsigned long obj);
+int zs_shrink(struct zs_pool *pool, size_t size);

void *zs_map_object(struct zs_pool *pool, unsigned long handle,
enum zs_mapmode mm);
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 36b4591..b99bec0 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -219,6 +219,8 @@ struct zs_pool {
struct size_class size_class[ZS_SIZE_CLASSES];

gfp_t flags; /* allocation flags used when growing pool */
+
+ struct zs_ops *ops;
};

/*
@@ -389,16 +391,14 @@ static enum fullness_group fix_fullness_group(struct zs_pool *pool,
BUG_ON(!is_first_page(page));

get_zspage_mapping(page, &class_idx, &currfg);
- newfg = get_fullness_group(page);
- if (newfg == currfg)
- goto out;
-
class = &pool->size_class[class_idx];
+ newfg = get_fullness_group(page);
+ /* Need to do this even if currfg == newfg, to update lru */
remove_zspage(page, class, currfg);
insert_zspage(page, class, newfg);
- set_zspage_mapping(page, class_idx, newfg);
+ if (currfg != newfg)
+ set_zspage_mapping(page, class_idx, newfg);

-out:
return newfg;
}

@@ -438,6 +438,36 @@ static int get_pages_per_zspage(int class_size)
}

/*
+ * To determine which class to use when shrinking, we find the
+ * first zspage class that is greater than the requested shrink
+ * size, and has at least one zspage. This returns the class
+ * with the class lock held, or NULL.
+ */
+static struct size_class *get_class_to_shrink(struct zs_pool *pool,
+ size_t size)
+{
+ struct size_class *class;
+ int i;
+ bool in_use, large_enough;
+
+ for (i = 0; i < ZS_SIZE_CLASSES; i++) {
+ class = &pool->size_class[i];
+
+ spin_lock(&class->lock);
+
+ in_use = class->pages_allocated > 0;
+ large_enough = class->pages_per_zspage * PAGE_SIZE >= size;
+
+ if (in_use && large_enough)
+ return class;
+
+ spin_unlock(&class->lock);
+ }
+
+ return NULL;
+}
+
+/*
* A single 'zspage' is composed of many system pages which are
* linked together using fields in struct page. This function finds
* the first/head page, given any component page of a zspage.
@@ -508,6 +538,48 @@ static unsigned long obj_idx_to_offset(struct page *page,
return off + obj_idx * class_size;
}

+static bool obj_handle_is_free(struct page *first_page,
+ struct size_class *class, unsigned long handle)
+{
+ unsigned long obj, idx, offset;
+ struct page *page;
+ struct link_free *link;
+
+ BUG_ON(!is_first_page(first_page));
+
+ obj = (unsigned long)first_page->freelist;
+
+ while (obj) {
+ if (obj == handle)
+ return true;
+
+ obj_handle_to_location(obj, &page, &idx);
+ offset = obj_idx_to_offset(page, idx, class->size);
+
+ link = (struct link_free *)kmap_atomic(page) +
+ offset / sizeof(*link);
+ obj = (unsigned long)link->next;
+ kunmap_atomic(link);
+ }
+
+ return false;
+}
+
+static void obj_free(unsigned long obj, struct page *page, unsigned long offset)
+{
+ struct page *first_page = get_first_page(page);
+ struct link_free *link;
+
+ /* Insert this object in containing zspage's freelist */
+ link = (struct link_free *)((unsigned char *)kmap_atomic(page)
+ + offset);
+ link->next = first_page->freelist;
+ kunmap_atomic(link);
+ first_page->freelist = (void *)obj;
+
+ first_page->inuse--;
+}
+
static void reset_page(struct page *page)
{
clear_bit(PG_private, &page->flags);
@@ -651,6 +723,39 @@ cleanup:
return first_page;
}

+static int reclaim_zspage(struct zs_pool *pool, struct size_class *class,
+ struct page *first_page)
+{
+ struct page *page = first_page;
+ unsigned long offset = 0, handle, idx, objs;
+ int ret = 0;
+
+ BUG_ON(!is_first_page(page));
+ BUG_ON(!pool->ops);
+ BUG_ON(!pool->ops->evict);
+
+ while (page) {
+ objs = DIV_ROUND_UP(PAGE_SIZE - offset, class->size);
+
+ for (idx = 0; idx < objs; idx++) {
+ handle = (unsigned long)obj_location_to_handle(page,
+ idx);
+ if (!obj_handle_is_free(first_page, class, handle))
+ ret = pool->ops->evict(pool, handle);
+ if (ret)
+ return ret;
+ else
+ obj_free(handle, page, offset);
+ }
+
+ page = get_next_page(page);
+ if (page)
+ offset = page->index;
+ }
+
+ return 0;
+}
+
static struct page *find_get_zspage(struct size_class *class)
{
int i;
@@ -856,7 +961,7 @@ fail:
* On success, a pointer to the newly created pool is returned,
* otherwise NULL.
*/
-struct zs_pool *zs_create_pool(gfp_t flags)
+struct zs_pool *zs_create_pool(gfp_t flags, struct zs_ops *ops)
{
int i, ovhd_size;
struct zs_pool *pool;
@@ -883,6 +988,7 @@ struct zs_pool *zs_create_pool(gfp_t flags)
}

pool->flags = flags;
+ pool->ops = ops;

return pool;
}
@@ -968,7 +1074,6 @@ EXPORT_SYMBOL_GPL(zs_malloc);

void zs_free(struct zs_pool *pool, unsigned long obj)
{
- struct link_free *link;
struct page *first_page, *f_page;
unsigned long f_objidx, f_offset;

@@ -988,14 +1093,8 @@ void zs_free(struct zs_pool *pool, unsigned long obj)

spin_lock(&class->lock);

- /* Insert this object in containing zspage's freelist */
- link = (struct link_free *)((unsigned char *)kmap_atomic(f_page)
- + f_offset);
- link->next = first_page->freelist;
- kunmap_atomic(link);
- first_page->freelist = (void *)obj;
+ obj_free(obj, f_page, f_offset);

- first_page->inuse--;
fullness = fix_fullness_group(pool, first_page);

if (fullness == ZS_EMPTY)
@@ -1008,6 +1107,45 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
}
EXPORT_SYMBOL_GPL(zs_free);

+int zs_shrink(struct zs_pool *pool, size_t size)
+{
+ struct size_class *class;
+ struct page *first_page;
+ enum fullness_group fullness;
+ int ret;
+
+ if (!pool->ops || !pool->ops->evict)
+ return -EINVAL;
+
+ /* the class is returned locked */
+ class = get_class_to_shrink(pool, size);
+ if (!class)
+ return -ENOENT;
+
+ first_page = find_get_zspage(class);
+ if (!first_page) {
+ spin_unlock(&class->lock);
+ return -ENOENT;
+ }
+
+ ret = reclaim_zspage(pool, class, first_page);
+
+ if (ret) {
+ fullness = fix_fullness_group(pool, first_page);
+
+ if (fullness == ZS_EMPTY)
+ class->pages_allocated -= class->pages_per_zspage;
+ }
+
+ spin_unlock(&class->lock);
+
+ if (!ret || fullness == ZS_EMPTY)
+ free_zspage(first_page);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(zs_shrink);
+
/**
* zs_map_object - get address of allocated object from handle.
* @pool: pool from which the object was allocated
--
1.8.3.1

2014-04-21 02:47:28

by Weijie Yang

Subject: RE: [PATCH 0/4] mm: zpool: add common api for zswap to use zbud/zsmalloc



On Sat, Apr 19, 2014 at 11:52 PM, Dan Streetman <[email protected]> wrote:
> In order to allow zswap users to choose between zbud and zsmalloc for
> the compressed storage pool, this patch set adds a new api "zpool" that
> provides an interface to both zbud and zsmalloc. Only a minor change
> to zbud's interface was needed, as detailed in the first patch;
> zsmalloc required shrinking to be added and a minor interface change,
> as detailed in the second patch.
>
> I believe Seth was originally using zsmalloc for zswap, but there were
> concerns about how significant the impact of shrinking zsmalloc would
> be once zswap had to start reclaiming pages. That may still be an
> issue, but this at least allows users to choose for themselves whether
> they want a lower-density or higher-density compressed storage medium.
> At least in situations where zswap reclaim is never or rarely reached,
> it probably makes sense to use the higher density of zsmalloc.
>
> Note this patch series does not change zram to use zpool, although that
> change should be possible as well.

I think this idea is acceptable, because for embedded devices reclaim
is risky due to the limited write lifetime of the storage. By using
zsmalloc, a zswap user not only gets the benefit of higher-density
compressed storage but can also use GFP_HIGHMEM on 32-bit systems.

I will pay attention to this patch set and give my opinion after my
review.

Thanks for your work.


2014-04-22 10:05:18

by Sergey Senozhatsky

Subject: Re: [PATCH 3/4] mm: zpool: implement common zpool api to zbud/zsmalloc

Hello,

On (04/19/14 11:52), Dan Streetman wrote:
>
> Add zpool api.
>
> [...]
>
> diff --git a/mm/Makefile b/mm/Makefile
> index 60cacbb..4135f7c 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -60,6 +60,7 @@ obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
> obj-$(CONFIG_CLEANCACHE) += cleancache.o
> obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
> obj-$(CONFIG_PAGE_OWNER) += pageowner.o
> +obj-$(CONFIG_ZPOOL) += zpool.o

Side note: this fails to apply on linux-next; mm/Makefile there does
not contain CONFIG_PAGE_OWNER:

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/mm/Makefile?id=refs/tags/next-20140422

Which tree is this patch set against?

-ss

> obj-$(CONFIG_ZBUD) += zbud.o
> obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
> obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
> diff --git a/mm/zpool.c b/mm/zpool.c
> new file mode 100644
> index 0000000..592cc0d
> --- /dev/null
> +++ b/mm/zpool.c
> @@ -0,0 +1,380 @@
> +/*
> + * zpool memory storage api
> + *
> + * Copyright (C) 2014 Dan Streetman
> + *
> + * This is a common frontend for the zbud and zsmalloc memory
> + * storage pool implementations. Typically, this is used to
> + * store compressed memory.
> + */
> +
> +#include <linux/list.h>
> +#include <linux/types.h>
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/zpool.h>
> +#include <linux/zbud.h>
> +#include <linux/zsmalloc.h>
> +
> +struct zpool_imp {
> + void (*destroy)(struct zpool *pool);
> +
> + int (*malloc)(struct zpool *pool, size_t size, unsigned long *handle);
> + void (*free)(struct zpool *pool, unsigned long handle);
> +
> + int (*shrink)(struct zpool *pool, size_t size);
> +
> + void *(*map)(struct zpool *pool, unsigned long handle,
> + enum zpool_mapmode mm);
> + void (*unmap)(struct zpool *pool, unsigned long handle);
> +
> + u64 (*total_size)(struct zpool *pool);
> +};
> +
> +struct zpool {
> + char *type;
> +
> + union {
> +#ifdef CONFIG_ZSMALLOC
> + struct zs_pool *zsmalloc_pool;
> +#endif
> +#ifdef CONFIG_ZBUD
> + struct zbud_pool *zbud_pool;
> +#endif
> + };
> +
> + struct zpool_imp *imp;
> + struct zpool_ops *ops;
> +
> + struct list_head list;
> +};
> +
> +static LIST_HEAD(zpools);
> +static DEFINE_SPINLOCK(zpools_lock);
> +
> +static int zpool_noop_evict(struct zpool *pool, unsigned long handle)
> +{
> + return -EINVAL;
> +}
> +static struct zpool_ops zpool_noop_ops = {
> + .evict = zpool_noop_evict
> +};
> +
> +
> +/* zsmalloc */
> +
> +#ifdef CONFIG_ZSMALLOC
> +
> +static void zpool_zsmalloc_destroy(struct zpool *zpool)
> +{
> + spin_lock(&zpools_lock);
> + list_del(&zpool->list);
> + spin_unlock(&zpools_lock);
> +
> + zs_destroy_pool(zpool->zsmalloc_pool);
> + kfree(zpool);
> +}
> +
> +static int zpool_zsmalloc_malloc(struct zpool *pool, size_t size,
> + unsigned long *handle)
> +{
> + *handle = zs_malloc(pool->zsmalloc_pool, size);
> + return *handle ? 0 : -1;
> +}
> +
> +static void zpool_zsmalloc_free(struct zpool *pool, unsigned long handle)
> +{
> + zs_free(pool->zsmalloc_pool, handle);
> +}
> +
> +static int zpool_zsmalloc_shrink(struct zpool *pool, size_t size)
> +{
> + return zs_shrink(pool->zsmalloc_pool, size);
> +}
> +
> +static void *zpool_zsmalloc_map(struct zpool *pool, unsigned long handle,
> + enum zpool_mapmode zpool_mapmode)
> +{
> + enum zs_mapmode zs_mapmode;
> +
> + switch (zpool_mapmode) {
> + case ZPOOL_MM_RO:
> + zs_mapmode = ZS_MM_RO; break;
> + case ZPOOL_MM_WO:
> + zs_mapmode = ZS_MM_WO; break;
> + case ZPOOL_MM_RW: /* fallthrough */
> + default:
> + zs_mapmode = ZS_MM_RW; break;
> + }
> + return zs_map_object(pool->zsmalloc_pool, handle, zs_mapmode);
> +}
> +
> +static void zpool_zsmalloc_unmap(struct zpool *pool, unsigned long handle)
> +{
> + zs_unmap_object(pool->zsmalloc_pool, handle);
> +}
> +
> +static u64 zpool_zsmalloc_total_size(struct zpool *pool)
> +{
> + return zs_get_total_size_bytes(pool->zsmalloc_pool);
> +}
> +
> +static int zpool_zsmalloc_evict(struct zs_pool *zsmalloc_pool,
> + unsigned long handle)
> +{
> + struct zpool *zpool;
> +
> + spin_lock(&zpools_lock);
> + list_for_each_entry(zpool, &zpools, list) {
> + if (zpool->zsmalloc_pool == zsmalloc_pool) {
> + spin_unlock(&zpools_lock);
> + return zpool->ops->evict(zpool, handle);
> + }
> + }
> + spin_unlock(&zpools_lock);
> + return -EINVAL;
> +}
> +
> +static struct zpool_imp zpool_zsmalloc_imp = {
> + .destroy = zpool_zsmalloc_destroy,
> + .malloc = zpool_zsmalloc_malloc,
> + .free = zpool_zsmalloc_free,
> + .shrink = zpool_zsmalloc_shrink,
> + .map = zpool_zsmalloc_map,
> + .unmap = zpool_zsmalloc_unmap,
> + .total_size = zpool_zsmalloc_total_size
> +};
> +
> +static struct zs_ops zpool_zsmalloc_ops = {
> + .evict = zpool_zsmalloc_evict
> +};
> +
> +static struct zpool *zpool_zsmalloc_create(gfp_t flags, struct zpool_ops *ops)
> +{
> + struct zpool *zpool;
> + struct zs_ops *zs_ops = (ops ? &zpool_zsmalloc_ops : NULL);
> +
> + zpool = kmalloc(sizeof(*zpool), GFP_KERNEL);
> + if (!zpool)
> + return NULL;
> +
> + zpool->zsmalloc_pool = zs_create_pool(flags, zs_ops);
> + if (!zpool->zsmalloc_pool) {
> + kfree(zpool);
> + return NULL;
> + }
> +
> + zpool->type = ZPOOL_TYPE_ZSMALLOC;
> + zpool->imp = &zpool_zsmalloc_imp;
> + zpool->ops = (ops ? ops : &zpool_noop_ops);
> + spin_lock(&zpools_lock);
> + list_add(&zpool->list, &zpools);
> + spin_unlock(&zpools_lock);
> +
> + return zpool;
> +}
> +
> +#else
> +
> +static struct zpool *zpool_zsmalloc_create(gfp_t flags, struct zpool_ops *ops)
> +{
> + pr_info("zpool: no zsmalloc in this kernel\n");
> + return NULL;
> +}
> +
> +#endif /* CONFIG_ZSMALLOC */
> +
> +
> +/* zbud */
> +
> +#ifdef CONFIG_ZBUD
> +
> +static void zpool_zbud_destroy(struct zpool *zpool)
> +{
> + spin_lock(&zpools_lock);
> + list_del(&zpool->list);
> + spin_unlock(&zpools_lock);
> +
> + zbud_destroy_pool(zpool->zbud_pool);
> + kfree(zpool);
> +}
> +
> +static int zpool_zbud_malloc(struct zpool *pool, size_t size,
> + unsigned long *handle)
> +{
> + return zbud_alloc(pool->zbud_pool, size, handle);
> +}
> +
> +static void zpool_zbud_free(struct zpool *pool, unsigned long handle)
> +{
> + zbud_free(pool->zbud_pool, handle);
> +}
> +
> +static int zpool_zbud_shrink(struct zpool *pool, size_t size)
> +{
> + return zbud_reclaim_page(pool->zbud_pool, 3);
> +}
> +
> +static void *zpool_zbud_map(struct zpool *pool, unsigned long handle,
> + enum zpool_mapmode zpool_mapmode)
> +{
> + return zbud_map(pool->zbud_pool, handle);
> +}
> +
> +static void zpool_zbud_unmap(struct zpool *pool, unsigned long handle)
> +{
> + zbud_unmap(pool->zbud_pool, handle);
> +}
> +
> +static u64 zpool_zbud_total_size(struct zpool *pool)
> +{
> + return zbud_get_pool_size(pool->zbud_pool) * PAGE_SIZE;
> +}
> +
> +static int zpool_zbud_evict(struct zbud_pool *zbud_pool, unsigned long handle)
> +{
> + struct zpool *zpool;
> +
> + spin_lock(&zpools_lock);
> + list_for_each_entry(zpool, &zpools, list) {
> + if (zpool->zbud_pool == zbud_pool) {
> + spin_unlock(&zpools_lock);
> + return zpool->ops->evict(zpool, handle);
> + }
> + }
> + spin_unlock(&zpools_lock);
> + return -EINVAL;
> +}
> +
> +static struct zpool_imp zpool_zbud_imp = {
> + .destroy = zpool_zbud_destroy,
> + .malloc = zpool_zbud_malloc,
> + .free = zpool_zbud_free,
> + .shrink = zpool_zbud_shrink,
> + .map = zpool_zbud_map,
> + .unmap = zpool_zbud_unmap,
> + .total_size = zpool_zbud_total_size
> +};
> +
> +static struct zbud_ops zpool_zbud_ops = {
> + .evict = zpool_zbud_evict
> +};
> +
> +static struct zpool *zpool_zbud_create(gfp_t flags, struct zpool_ops *ops)
> +{
> + struct zpool *zpool;
> + struct zbud_ops *zbud_ops = (ops ? &zpool_zbud_ops : NULL);
> +
> + zpool = kmalloc(sizeof(*zpool), GFP_KERNEL);
> + if (!zpool)
> + return NULL;
> +
> + zpool->zbud_pool = zbud_create_pool(flags, zbud_ops);
> + if (!zpool->zbud_pool) {
> + kfree(zpool);
> + return NULL;
> + }
> +
> + zpool->type = ZPOOL_TYPE_ZBUD;
> + zpool->imp = &zpool_zbud_imp;
> + zpool->ops = (ops ? ops : &zpool_noop_ops);
> + spin_lock(&zpools_lock);
> + list_add(&zpool->list, &zpools);
> + spin_unlock(&zpools_lock);
> +
> + return zpool;
> +}
> +
> +#else
> +
> +static struct zpool *zpool_zbud_create(gfp_t flags, struct zpool_ops *ops)
> +{
> + pr_info("zpool: no zbud in this kernel\n");
> + return NULL;
> +}
> +
> +#endif /* CONFIG_ZBUD */
> +
> +
> +struct zpool *zpool_fallback_create(gfp_t flags, struct zpool_ops *ops)
> +{
> + struct zpool *pool = NULL;
> +
> +#ifdef CONFIG_ZSMALLOC
> + pool = zpool_zsmalloc_create(flags, ops);
> + if (pool)
> + return pool;
> + pr_info("zpool: fallback unable to create zsmalloc pool\n");
> +#endif
> +
> +#ifdef CONFIG_ZBUD
> + pool = zpool_zbud_create(flags, ops);
> + if (!pool)
> + pr_info("zpool: fallback unable to create zbud pool\n");
> +#endif
> +
> + return pool;
> +}
> +
> +struct zpool *zpool_create_pool(char *type, gfp_t flags,
> + struct zpool_ops *ops, bool fallback)
> +{
> + struct zpool *pool = NULL;
> +
> + if (!strcmp(type, ZPOOL_TYPE_ZSMALLOC))
> + pool = zpool_zsmalloc_create(flags, ops);
> + else if (!strcmp(type, ZPOOL_TYPE_ZBUD))
> + pool = zpool_zbud_create(flags, ops);
> + else
> + pr_err("zpool: unknown type %s\n", type);
> +
> + if (!pool && fallback)
> + pool = zpool_fallback_create(flags, ops);
> +
> + if (!pool)
> + pr_err("zpool: couldn't create zpool\n");
> +
> + return pool;
> +}
> +
> +char *zpool_get_type(struct zpool *pool)
> +{
> + return pool->type;
> +}
> +
> +void zpool_destroy_pool(struct zpool *pool)
> +{
> + pool->imp->destroy(pool);
> +}
> +
> +int zpool_malloc(struct zpool *pool, size_t size, unsigned long *handle)
> +{
> + return pool->imp->malloc(pool, size, handle);
> +}
> +
> +void zpool_free(struct zpool *pool, unsigned long handle)
> +{
> + pool->imp->free(pool, handle);
> +}
> +
> +int zpool_shrink(struct zpool *pool, size_t size)
> +{
> + return pool->imp->shrink(pool, size);
> +}
> +
> +void *zpool_map_handle(struct zpool *pool, unsigned long handle,
> + enum zpool_mapmode mapmode)
> +{
> + return pool->imp->map(pool, handle, mapmode);
> +}
> +
> +void zpool_unmap_handle(struct zpool *pool, unsigned long handle)
> +{
> + pool->imp->unmap(pool, handle);
> +}
> +
> +u64 zpool_get_total_size(struct zpool *pool)
> +{
> + return pool->imp->total_size(pool);
> +}
> --
> 1.8.3.1
>

2014-04-22 13:44:25

by Dan Streetman

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm: zpool: implement common zpool api to zbud/zsmalloc

On Tue, Apr 22, 2014 at 6:05 AM, Sergey Senozhatsky
<[email protected]> wrote:
> Hello,
>> diff --git a/mm/Makefile b/mm/Makefile
>> index 60cacbb..4135f7c 100644
>> --- a/mm/Makefile
>> +++ b/mm/Makefile
>> @@ -60,6 +60,7 @@ obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
>> obj-$(CONFIG_CLEANCACHE) += cleancache.o
>> obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
>> obj-$(CONFIG_PAGE_OWNER) += pageowner.o
>> +obj-$(CONFIG_ZPOOL) += zpool.o
>
> Side note: this fails to apply on linux-next; mm/Makefile there does not
> contain CONFIG_PAGE_OWNER.
>
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/mm/Makefile?id=refs/tags/next-20140422
>
> Which tree is this patchset against?

It's against this mmotm tree:
git://git.cmpxchg.org/linux-mmotm.git

2014-04-26 08:37:38

by Weijie Yang

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm: zpool: implement zsmalloc shrinking

On Sat, Apr 19, 2014 at 11:52 PM, Dan Streetman <[email protected]> wrote:
> Add zs_shrink() and helper functions to zsmalloc. Update zsmalloc
> zs_create_pool() creation function to include ops param that provides
> an evict() function for use during shrinking. Update helper function
> fix_fullness_group() to always reinsert changed zspages even if the
> fullness group did not change, so they are updated in the fullness
> group lru. Also update zram to use the new zsmalloc pool creation
> function but pass NULL as the ops param, since zram does not use
> pool shrinking.
>

I have only reviewed the code, without testing it, but I think this patch
is not acceptable.

The biggest problem is that it will call zswap_writeback_entry() while
holding a lock; zswap_writeback_entry() may sleep, so that is a bug. See
below.
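To spell out the problematic path (simplified from the patches in this
series, not the literal code):

zs_shrink()                            /* get_class_to_shrink() returns with
                                          spin_lock(&class->lock) held */
    reclaim_zspage()
        pool->ops->evict()             /* zpool_zsmalloc_evict() */
            zpool->ops->evict()        /* zswap's evict callback, i.e.
                                          zswap_writeback_entry() */

zswap_writeback_entry() may allocate memory and perform swap I/O, so it can
sleep, but class->lock is a spinlock held across that whole chain.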

Patch 3/4 has a lot of #ifdefs; I don't think that is a good way to
structure the abstraction.

What about just disabling zswap reclaim when using zsmalloc?
There is a long way to go in optimizing writeback reclaim (for both zswap
and zram); maybe a small and simple step forward is better.

Regards,

> Signed-off-by: Dan Streetman <[email protected]>
>
> ---
>
> To find all the used objects inside a zspage, I had to do a full scan
> of the zspage->freelist for each object, since there's no list of used
> objects, and no way to keep a list of used objects without allocating
> more memory for each zspage (that I could see). Of course, by taking
> a byte (or really only a bit) out of each object's memory area to use
> as a flag, we could just check that instead of scanning ->freelist
> for each zspage object, but that would (slightly) reduce the available
> size of each zspage object.
>
>
> drivers/block/zram/zram_drv.c | 2 +-
> include/linux/zsmalloc.h | 7 +-
> mm/zsmalloc.c | 168 ++++++++++++++++++++++++++++++++++++++----
> 3 files changed, 160 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 9849b52..dacf343 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -249,7 +249,7 @@ static struct zram_meta *zram_meta_alloc(u64 disksize)
> goto free_meta;
> }
>
> - meta->mem_pool = zs_create_pool(GFP_NOIO | __GFP_HIGHMEM);
> + meta->mem_pool = zs_create_pool(GFP_NOIO | __GFP_HIGHMEM, NULL);
> if (!meta->mem_pool) {
> pr_err("Error creating memory pool\n");
> goto free_table;
> diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
> index e44d634..a75ab6e 100644
> --- a/include/linux/zsmalloc.h
> +++ b/include/linux/zsmalloc.h
> @@ -36,11 +36,16 @@ enum zs_mapmode {
>
> struct zs_pool;
>
> -struct zs_pool *zs_create_pool(gfp_t flags);
> +struct zs_ops {
> + int (*evict)(struct zs_pool *pool, unsigned long handle);
> +};
> +
> +struct zs_pool *zs_create_pool(gfp_t flags, struct zs_ops *ops);
> void zs_destroy_pool(struct zs_pool *pool);
>
> unsigned long zs_malloc(struct zs_pool *pool, size_t size);
> void zs_free(struct zs_pool *pool, unsigned long obj);
> +int zs_shrink(struct zs_pool *pool, size_t size);
>
> void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> enum zs_mapmode mm);
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 36b4591..b99bec0 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -219,6 +219,8 @@ struct zs_pool {
> struct size_class size_class[ZS_SIZE_CLASSES];
>
> gfp_t flags; /* allocation flags used when growing pool */
> +
> + struct zs_ops *ops;
> };
>
> /*
> @@ -389,16 +391,14 @@ static enum fullness_group fix_fullness_group(struct zs_pool *pool,
> BUG_ON(!is_first_page(page));
>
> get_zspage_mapping(page, &class_idx, &currfg);
> - newfg = get_fullness_group(page);
> - if (newfg == currfg)
> - goto out;
> -
> class = &pool->size_class[class_idx];
> + newfg = get_fullness_group(page);
> + /* Need to do this even if currfg == newfg, to update lru */
> remove_zspage(page, class, currfg);
> insert_zspage(page, class, newfg);
> - set_zspage_mapping(page, class_idx, newfg);
> + if (currfg != newfg)
> + set_zspage_mapping(page, class_idx, newfg);
>
> -out:
> return newfg;
> }
>
> @@ -438,6 +438,36 @@ static int get_pages_per_zspage(int class_size)
> }
>
> /*
> + * To determine which class to use when shrinking, we find the
> + * first zspage class that is greater than the requested shrink
> + * size, and has at least one zspage. This returns the class
> + * with the class lock held, or NULL.
> + */
> +static struct size_class *get_class_to_shrink(struct zs_pool *pool,
> + size_t size)
> +{
> + struct size_class *class;
> + int i;
> + bool in_use, large_enough;
> +
> + for (i = 0; i <= ZS_SIZE_CLASSES; i++) {
> + class = &pool->size_class[i];
> +
> + spin_lock(&class->lock);
> +
> + in_use = class->pages_allocated > 0;
> + large_enough = class->pages_per_zspage * PAGE_SIZE >= size;
> +
> + if (in_use && large_enough)
> + return class;
> +
> + spin_unlock(&class->lock);
> + }
> +
> + return NULL;
> +}
> +
> +/*
> * A single 'zspage' is composed of many system pages which are
> * linked together using fields in struct page. This function finds
> * the first/head page, given any component page of a zspage.
> @@ -508,6 +538,48 @@ static unsigned long obj_idx_to_offset(struct page *page,
> return off + obj_idx * class_size;
> }
>
> +static bool obj_handle_is_free(struct page *first_page,
> + struct size_class *class, unsigned long handle)
> +{
> + unsigned long obj, idx, offset;
> + struct page *page;
> + struct link_free *link;
> +
> + BUG_ON(!is_first_page(first_page));
> +
> + obj = (unsigned long)first_page->freelist;
> +
> + while (obj) {
> + if (obj == handle)
> + return true;
> +
> + obj_handle_to_location(obj, &page, &idx);
> + offset = obj_idx_to_offset(page, idx, class->size);
> +
> + link = (struct link_free *)kmap_atomic(page) +
> + offset / sizeof(*link);
> + obj = (unsigned long)link->next;
> + kunmap_atomic(link);
> + }
> +
> + return false;
> +}
> +
> +static void obj_free(unsigned long obj, struct page *page, unsigned long offset)
> +{
> + struct page *first_page = get_first_page(page);
> + struct link_free *link;
> +
> + /* Insert this object in containing zspage's freelist */
> + link = (struct link_free *)((unsigned char *)kmap_atomic(page)
> + + offset);
> + link->next = first_page->freelist;
> + kunmap_atomic(link);
> + first_page->freelist = (void *)obj;
> +
> + first_page->inuse--;
> +}
> +
> static void reset_page(struct page *page)
> {
> clear_bit(PG_private, &page->flags);
> @@ -651,6 +723,39 @@ cleanup:
> return first_page;
> }
>
> +static int reclaim_zspage(struct zs_pool *pool, struct size_class *class,
> + struct page *first_page)
> +{
> + struct page *page = first_page;
> + unsigned long offset = 0, handle, idx, objs;
> + int ret = 0;
> +
> + BUG_ON(!is_first_page(page));
> + BUG_ON(!pool->ops);
> + BUG_ON(!pool->ops->evict);
> +
> + while (page) {
> + objs = DIV_ROUND_UP(PAGE_SIZE - offset, class->size);
> +
> + for (idx = 0; idx < objs; idx++) {
> + handle = (unsigned long)obj_location_to_handle(page,
> + idx);
> + if (!obj_handle_is_free(first_page, class, handle))
> + ret = pool->ops->evict(pool, handle);

Here ops->evict() ends up calling zswap_writeback_entry() while class->lock
is still held.

> + if (ret)
> + return ret;
> + else
> + obj_free(handle, page, offset);
> + }
> +
> + page = get_next_page(page);
> + if (page)
> + offset = page->index;
> + }
> +
> + return 0;
> +}
> +
> static struct page *find_get_zspage(struct size_class *class)
> {
> int i;
> @@ -856,7 +961,7 @@ fail:
> * On success, a pointer to the newly created pool is returned,
> * otherwise NULL.
> */
> -struct zs_pool *zs_create_pool(gfp_t flags)
> +struct zs_pool *zs_create_pool(gfp_t flags, struct zs_ops *ops)
> {
> int i, ovhd_size;
> struct zs_pool *pool;
> @@ -883,6 +988,7 @@ struct zs_pool *zs_create_pool(gfp_t flags)
> }
>
> pool->flags = flags;
> + pool->ops = ops;
>
> return pool;
> }
> @@ -968,7 +1074,6 @@ EXPORT_SYMBOL_GPL(zs_malloc);
>
> void zs_free(struct zs_pool *pool, unsigned long obj)
> {
> - struct link_free *link;
> struct page *first_page, *f_page;
> unsigned long f_objidx, f_offset;
>
> @@ -988,14 +1093,8 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
>
> spin_lock(&class->lock);
>
> - /* Insert this object in containing zspage's freelist */
> - link = (struct link_free *)((unsigned char *)kmap_atomic(f_page)
> - + f_offset);
> - link->next = first_page->freelist;
> - kunmap_atomic(link);
> - first_page->freelist = (void *)obj;
> + obj_free(obj, f_page, f_offset);
>
> - first_page->inuse--;
> fullness = fix_fullness_group(pool, first_page);
>
> if (fullness == ZS_EMPTY)
> @@ -1008,6 +1107,45 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
> }
> EXPORT_SYMBOL_GPL(zs_free);
>
> +int zs_shrink(struct zs_pool *pool, size_t size)
> +{
> + struct size_class *class;
> + struct page *first_page;
> + enum fullness_group fullness;
> + int ret;
> +
> + if (!pool->ops || !pool->ops->evict)
> + return -EINVAL;
> +
> + /* the class is returned locked */
> + class = get_class_to_shrink(pool, size);
> + if (!class)
> + return -ENOENT;
> +
> + first_page = find_get_zspage(class);
> + if (!first_page) {
> + spin_unlock(&class->lock);
> + return -ENOENT;
> + }
> +
> + ret = reclaim_zspage(pool, class, first_page);
> +
> + if (ret) {
> + fullness = fix_fullness_group(pool, first_page);
> +
> + if (fullness == ZS_EMPTY)
> + class->pages_allocated -= class->pages_per_zspage;
> + }
> +
> + spin_unlock(&class->lock);
> +
> + if (!ret || fullness == ZS_EMPTY)
> + free_zspage(first_page);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(zs_shrink);
> +
> /**
> * zs_map_object - get address of allocated object from handle.
> * @pool: pool from which the object was allocated
> --
> 1.8.3.1
>

2014-04-27 04:13:54

by Dan Streetman

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm: zpool: implement zsmalloc shrinking

On Sat, Apr 26, 2014 at 4:37 AM, Weijie Yang <[email protected]> wrote:
> On Sat, Apr 19, 2014 at 11:52 PM, Dan Streetman <[email protected]> wrote:
>> Add zs_shrink() and helper functions to zsmalloc. Update zsmalloc
>> zs_create_pool() creation function to include ops param that provides
>> an evict() function for use during shrinking. Update helper function
>> fix_fullness_group() to always reinsert changed zspages even if the
>> fullness group did not change, so they are updated in the fullness
>> group lru. Also update zram to use the new zsmalloc pool creation
>> function but pass NULL as the ops param, since zram does not use
>> pool shrinking.
>>
>
> I have only reviewed the code, without testing it, but I think this patch
> is not acceptable.
>
> The biggest problem is that it will call zswap_writeback_entry() while
> holding a lock; zswap_writeback_entry() may sleep, so that is a bug. See
> below.

thanks for catching that!

>
> Patch 3/4 has a lot of #ifdefs; I don't think that is a good way to
> structure the abstraction.

It has the #ifdefs because there's no point in compiling in the code to
use zbud/zsmalloc if zbud/zsmalloc itself isn't compiled in. What
alternative to the #ifdefs would you suggest? Or are there specific
#ifdefs you would suggest removing?
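For reference, what the #ifdefs buy is just the stub fallback, condensed
from the patch (zbud shown; zsmalloc follows the same pattern):

/* when CONFIG_ZBUD is not set, only a stub is built */
static struct zpool *zpool_zbud_create(gfp_t flags, struct zpool_ops *ops)
{
	pr_info("zpool: no zbud in this kernel\n");
	return NULL;	/* zpool_create_pool() can then fall back or fail */
}

Dropping the #ifdefs would mean the glue always references the zbud and
zsmalloc symbols, whether or not those allocators are built.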

>
> What about just disabling zswap reclaim when using zsmalloc?
> There is a long way to go in optimizing writeback reclaim (for both zswap
> and zram); maybe a small and simple step forward is better.

I think it's possible to just remove the zspage from the class while
under lock, then unlock and reclaim it. As long as there's a
guarantee that zswap (or any zpool/zsmalloc reclaim user) doesn't
map/access the handle after evict() completes successfully, that
should work. There does need to be some synchronization between
zs_free() and each handle's eviction though, similar to zbud's
under_reclaim flag. I'll work on a v2 patch.