2012-11-14 18:58:57

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v2] enable all tmem backends to be built and loaded as modules.

There are also some patch I wrote up that are based on this patchset
that I will post soonish.

I hope these patches come out - I am travelling and not sure if my
git-send-email foo will make this work.

I copying here what Dan mentioned with some modifications by me:

Since various parts of transcendent memory ("tmem") [1] were first posted in
2009, reviewers have suggested that various tmem features should be built
as a module and enabled by loading the module, rather than the current clunky
method of compiling as a built-in and enabling via boot parameter. Due
to certain tmem initialization steps, that was not feasible at the time.

[1] http://lwn.net/Articles/454795/

This patchset allows each of the three merged transcendent memory
backends (zcache, ramster, Xen tmem) to be used as modules by first
enabling transcendent memory frontends (cleancache, frontswap) to deal
with "lazy initialization" and, second, by adding the necessary code for
the backends to be built and loaded as modules.

The original mechanism to enable tmem backends -- namely to hardwire
them into the kernel and select/enable one with a kernel boot
parameter -- is retained but should be considered deprecated. When
backends are loaded as modules, certain knobs will now be
properly selected via module_params rather than via undocumented
kernel boot parameters. Note that module UNloading is not yet
supported as it is lower priority and will require significant
additional work.

The lazy initialization support is necessary because filesystems
and swap devices are normally mounted early in boot and these
activites normally trigger tmem calls to setup certain data structures;
if the respective cleancache/frontswap ops are not yet registered
by a back end, the tmem setup would fail for these devices and
cleancache/frontswap would never be enabled for them which limits
much of the value of tmem in many system configurations. Lazy
initialization records the necessary information in cleancache/frontswap
data structures and "replays" it after the ops are registered
to ensure that all filesystems and swap devices can benefit from
the loaded tmem backend.

[PATCH 1/8] mm: cleancache: lazy initialization to allow tmem
[PATCH 2/8] mm: frontswap: lazy initialization to allow tmem

Patches 1 and 2 are the original [2] patches to cleancache and frontswap
proposed by Erlangen University, but rebased to 3.7-rcN plus a couple
of bug fixes I found necessary to run properly + extra review comments.

[2] http://www.spinics.net/lists/linux-mm/msg31490.html

The other two:
[PATCH 3/8] frontswap: Make frontswap_init use a pointer for the
[PATCH 4/8] cleancache: Make cleancache_init use a pointer for the
do a bit of code cleanup that can be done to make it easier to
read and also remove some of the bools.

[PATCH 5/8] staging: zcache2+ramster: enable ramster to be
Enables module support for zcache2. Zsmalloc support
has not yet been merged into zcache2 but, once merged, could now
easily be selected via a module_param.

[PATCH 6/8] staging: zcache2+ramster: enable zcache2 to be
Enables module support for ramster. Ramster will now be
enabled with a module_param to zcache2.

[PATCH 7/8] xen: tmem: enable Xen tmem shim to be built/loaded as a
[PATCH 8/8] xen/tmem: Remove the subsys call.
Enables module support for the Xen tmem shim. Xen self-ballooning and
frontswap-selfshrinking are also "lazily"
initialized when the Xen tmem shim is loaded as a module, unless
explicitly disabled by module_params.


Dan Magenheimer (5):
mm: cleancache: lazy initialization to allow tmem backends to build/run as modules
mm: frontswap: lazy initialization to allow tmem backends to build/run as modules
staging: zcache2+ramster: enable ramster to be built/loaded as a module
staging: zcache2+ramster: enable zcache2 to be built/loaded as a module
xen: tmem: enable Xen tmem shim to be built/loaded as a module

Konrad Rzeszutek Wilk (3):
frontswap: Make frontswap_init use a pointer for the ops.
cleancache: Make cleancache_init use a pointer for the ops
xen/tmem: Remove the subsys call.


drivers/staging/ramster/Kconfig | 6 +-
drivers/staging/ramster/Makefile | 11 +-
drivers/staging/ramster/ramster.h | 6 +-
drivers/staging/ramster/ramster/nodemanager.c | 9 +-
drivers/staging/ramster/ramster/ramster.c | 29 +++-
drivers/staging/ramster/ramster/ramster.h | 2 +-
.../staging/ramster/ramster/ramster_nodemanager.h | 2 +
drivers/staging/ramster/tmem.c | 6 +-
drivers/staging/ramster/tmem.h | 8 +-
drivers/staging/ramster/zcache-main.c | 63 ++++++--
drivers/staging/ramster/zcache.h | 2 +-
drivers/staging/zcache/zcache-main.c | 16 +-
drivers/xen/Kconfig | 4 +-
drivers/xen/tmem.c | 50 ++++--
drivers/xen/xen-selfballoon.c | 13 +-
include/linux/cleancache.h | 2 +-
include/linux/frontswap.h | 2 +-
include/xen/tmem.h | 8 +
mm/cleancache.c | 167 +++++++++++++++++---
mm/frontswap.c | 78 +++++++--
20 files changed, 371 insertions(+), 113 deletions(-)


2012-11-14 18:57:39

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 2/8] mm: frontswap: lazy initialization to allow tmem backends to build/run as modules

From: Dan Magenheimer <[email protected]>

With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
built/loaded as modules rather than built-in and enabled by a boot parameter,
this patch provides "lazy initialization", allowing backends to register to
frontswap even after swapon was run. Before a backend registers all calls
to init are recorded and the creation of tmem_pools delayed until a backend
registers or until a frontswap put is attempted.

Signed-off-by: Stefan Hengelein <[email protected]>
Signed-off-by: Florian Schmaus <[email protected]>
Signed-off-by: Andor Daam <[email protected]>
Signed-off-by: Dan Magenheimer <[email protected]>
[v1: Fixes per Seth Jennings suggestions]
[v2: Removed FRONTSWAP_HAS_.. ]
[v3: Fix up per Bob Liu <[email protected]> recommendations]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
mm/frontswap.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++--------
1 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/mm/frontswap.c b/mm/frontswap.c
index 2890e67..ba58157 100644
--- a/mm/frontswap.c
+++ b/mm/frontswap.c
@@ -80,6 +80,18 @@ static inline void inc_frontswap_succ_stores(void) { }
static inline void inc_frontswap_failed_stores(void) { }
static inline void inc_frontswap_invalidates(void) { }
#endif
+
+/*
+ * When no backend is registered all calls to init are registered and
+ * remembered but fail to create tmem_pools. When a backend registers with
+ * frontswap the previous calls to init are executed to create tmem_pools
+ * and set the respective poolids.
+ * While no backend is registered all "puts", "gets" and "flushes" are
+ * ignored or fail.
+ */
+static DECLARE_BITMAP(need_init, MAX_SWAPFILES);
+static bool backend_registered __read_mostly;
+
/*
* Register operations for frontswap, returning previous thus allowing
* detection of multiple backends and possible nesting.
@@ -87,9 +99,19 @@ static inline void inc_frontswap_invalidates(void) { }
struct frontswap_ops frontswap_register_ops(struct frontswap_ops *ops)
{
struct frontswap_ops old = frontswap_ops;
+ int i;

frontswap_ops = *ops;
frontswap_enabled = true;
+
+ for (i = 0; i < MAX_SWAPFILES; i++) {
+ if (test_and_clear_bit(i, need_init))
+ (*frontswap_ops.init)(i);
+ }
+ /* We MUST have backend_registered called _after_ the frontswap_init's
+ * have been called. Otherwise __frontswap_store might fail. */
+ barrier();
+ backend_registered = true;
return old;
}
EXPORT_SYMBOL(frontswap_register_ops);
@@ -119,10 +141,17 @@ void __frontswap_init(unsigned type)
{
struct swap_info_struct *sis = swap_info[type];

- BUG_ON(sis == NULL);
- if (sis->frontswap_map == NULL)
- return;
- frontswap_ops.init(type);
+ if (backend_registered) {
+ BUG_ON(sis == NULL);
+ if (sis->frontswap_map == NULL)
+ return;
+ (*frontswap_ops.init)(type);
+ }
+ else {
+ BUG_ON(type > MAX_SWAPFILES);
+ set_bit(type, need_init);
+ }
+
}
EXPORT_SYMBOL(__frontswap_init);

@@ -147,6 +176,11 @@ int __frontswap_store(struct page *page)
struct swap_info_struct *sis = swap_info[type];
pgoff_t offset = swp_offset(entry);

+ if (!backend_registered) {
+ inc_frontswap_failed_stores();
+ return ret;
+ }
+
BUG_ON(!PageLocked(page));
BUG_ON(sis == NULL);
if (frontswap_test(sis, offset))
@@ -186,6 +220,9 @@ int __frontswap_load(struct page *page)
struct swap_info_struct *sis = swap_info[type];
pgoff_t offset = swp_offset(entry);

+ if (!backend_registered)
+ return ret;
+
BUG_ON(!PageLocked(page));
BUG_ON(sis == NULL);
if (frontswap_test(sis, offset))
@@ -209,6 +246,9 @@ void __frontswap_invalidate_page(unsigned type, pgoff_t offset)
{
struct swap_info_struct *sis = swap_info[type];

+ if (!backend_registered)
+ return;
+
BUG_ON(sis == NULL);
if (frontswap_test(sis, offset)) {
frontswap_ops.invalidate_page(type, offset);
@@ -226,12 +266,15 @@ void __frontswap_invalidate_area(unsigned type)
{
struct swap_info_struct *sis = swap_info[type];

- BUG_ON(sis == NULL);
- if (sis->frontswap_map == NULL)
- return;
- frontswap_ops.invalidate_area(type);
- atomic_set(&sis->frontswap_pages, 0);
- memset(sis->frontswap_map, 0, sis->max / sizeof(long));
+ if (backend_registered) {
+ BUG_ON(sis == NULL);
+ if (sis->frontswap_map == NULL)
+ return;
+ (*frontswap_ops.invalidate_area)(type);
+ atomic_set(&sis->frontswap_pages, 0);
+ memset(sis->frontswap_map, 0, sis->max / sizeof(long));
+ }
+ clear_bit(type, need_init);
}
EXPORT_SYMBOL(__frontswap_invalidate_area);

@@ -364,6 +407,9 @@ static int __init init_frontswap(void)
debugfs_create_u64("invalidates", S_IRUGO,
root, &frontswap_invalidates);
#endif
+ bitmap_zero(need_init, MAX_SWAPFILES);
+
+ frontswap_enabled = 1;
return 0;
}

--
1.7.7.6

2012-11-14 18:57:46

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 4/8] cleancache: Make cleancache_init use a pointer for the ops

Instead of using a backend_registered to determine whether
a backend is enabled. This allows us to remove the
backend_register check and just do 'if (cleancache_ops)'

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/staging/ramster/zcache-main.c | 8 +++---
drivers/staging/zcache/zcache-main.c | 8 +++---
drivers/xen/tmem.c | 6 ++--
include/linux/cleancache.h | 2 +-
mm/cleancache.c | 43 +++++++++++++++-----------------
5 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/drivers/staging/ramster/zcache-main.c b/drivers/staging/ramster/zcache-main.c
index 6c8959d..ed99170 100644
--- a/drivers/staging/ramster/zcache-main.c
+++ b/drivers/staging/ramster/zcache-main.c
@@ -1495,9 +1495,9 @@ static struct cleancache_ops zcache_cleancache_ops = {
.init_fs = zcache_cleancache_init_fs
};

-struct cleancache_ops zcache_cleancache_register_ops(void)
+struct cleancache_ops *zcache_cleancache_register_ops(void)
{
- struct cleancache_ops old_ops =
+ struct cleancache_ops *old_ops =
cleancache_register_ops(&zcache_cleancache_ops);

return old_ops;
@@ -1781,7 +1781,7 @@ static int __init zcache_init(void)
}
zbud_init();
if (zcache_enabled && !disable_cleancache) {
- struct cleancache_ops old_ops;
+ struct cleancache_ops *old_ops;

register_shrinker(&zcache_shrinker);
old_ops = zcache_cleancache_register_ops();
@@ -1791,7 +1791,7 @@ static int __init zcache_init(void)
pr_info("%s: cleancache: ignorenonactive = %d\n",
namestr, !disable_cleancache_ignore_nonactive);
#endif
- if (old_ops.init_fs != NULL)
+ if (old_ops)
pr_warn("%s: cleancache_ops overridden\n", namestr);
}
if (zcache_enabled && !disable_frontswap) {
diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index 3db38cb..f9ab874 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -1811,9 +1811,9 @@ static struct cleancache_ops zcache_cleancache_ops = {
.init_fs = zcache_cleancache_init_fs
};

-struct cleancache_ops zcache_cleancache_register_ops(void)
+struct cleancache_ops *zcache_cleancache_register_ops(void)
{
- struct cleancache_ops old_ops =
+ struct cleancache_ops *old_ops =
cleancache_register_ops(&zcache_cleancache_ops);

return old_ops;
@@ -2048,14 +2048,14 @@ static int __init zcache_init(void)

#ifdef CONFIG_CLEANCACHE
if (zcache_enabled && use_cleancache) {
- struct cleancache_ops old_ops;
+ struct cleancache_ops *old_ops;

zbud_init();
register_shrinker(&zcache_shrinker);
old_ops = zcache_cleancache_register_ops();
pr_info("zcache: cleancache enabled using kernel "
"transcendent memory and compression buddies\n");
- if (old_ops.init_fs != NULL)
+ if (old_ops)
pr_warning("zcache: cleancache_ops overridden");
}
#endif
diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c
index 4b02c07..15e776c 100644
--- a/drivers/xen/tmem.c
+++ b/drivers/xen/tmem.c
@@ -236,7 +236,7 @@ static int __init no_cleancache(char *s)
}
__setup("nocleancache", no_cleancache);

-static struct cleancache_ops __initdata tmem_cleancache_ops = {
+static struct cleancache_ops tmem_cleancache_ops = {
.put_page = tmem_cleancache_put_page,
.get_page = tmem_cleancache_get_page,
.invalidate_page = tmem_cleancache_flush_page,
@@ -392,9 +392,9 @@ static int __init xen_tmem_init(void)
BUG_ON(sizeof(struct cleancache_filekey) != sizeof(struct tmem_oid));
if (tmem_enabled && use_cleancache) {
char *s = "";
- struct cleancache_ops old_ops =
+ struct cleancache_ops *old_ops =
cleancache_register_ops(&tmem_cleancache_ops);
- if (old_ops.init_fs != NULL)
+ if (old_ops)
s = " (WARNING: cleancache_ops overridden)";
printk(KERN_INFO "cleancache enabled, RAM provided by "
"Xen Transcendent Memory%s\n", s);
diff --git a/include/linux/cleancache.h b/include/linux/cleancache.h
index 42e55de..3af5ea8 100644
--- a/include/linux/cleancache.h
+++ b/include/linux/cleancache.h
@@ -33,7 +33,7 @@ struct cleancache_ops {
void (*invalidate_fs)(int);
};

-extern struct cleancache_ops
+extern struct cleancache_ops *
cleancache_register_ops(struct cleancache_ops *ops);
extern void __cleancache_init_fs(struct super_block *);
extern void __cleancache_init_shared_fs(char *, struct super_block *);
diff --git a/mm/cleancache.c b/mm/cleancache.c
index 318a0ad..95f6618 100644
--- a/mm/cleancache.c
+++ b/mm/cleancache.c
@@ -32,7 +32,7 @@ EXPORT_SYMBOL(cleancache_enabled);
* cleancache_ops is set by cleancache_ops_register to contain the pointers
* to the cleancache "backend" implementation functions.
*/
-static struct cleancache_ops cleancache_ops __read_mostly;
+static struct cleancache_ops *cleancache_ops __read_mostly;

/*
* Counters available via /sys/kernel/debug/frontswap (if debugfs is
@@ -63,27 +63,24 @@ static int fs_poolid_map[MAX_INITIALIZABLE_FS];
static int shared_fs_poolid_map[MAX_INITIALIZABLE_FS];

static char *uuids[MAX_INITIALIZABLE_FS];
-static bool __read_mostly backend_registered;

/*
* register operations for cleancache, returning previous thus allowing
* detection of multiple backends and possible nesting
*/
-struct cleancache_ops cleancache_register_ops(struct cleancache_ops *ops)
+struct cleancache_ops *cleancache_register_ops(struct cleancache_ops *ops)
{
- struct cleancache_ops old = cleancache_ops;
+ struct cleancache_ops *old = cleancache_ops;
int i;

- cleancache_ops = *ops;
-
- backend_registered = true;
for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
if (fs_poolid_map[i] == FS_NO_BACKEND)
- fs_poolid_map[i] = (*cleancache_ops.init_fs)(PAGE_SIZE);
+ fs_poolid_map[i] = ops->init_fs(PAGE_SIZE);
if (shared_fs_poolid_map[i] == FS_NO_BACKEND)
- shared_fs_poolid_map[i] = (*cleancache_ops.init_shared_fs)
+ shared_fs_poolid_map[i] = ops->init_shared_fs
(uuids[i], PAGE_SIZE);
}
+ cleancache_ops = ops;
return old;
}
EXPORT_SYMBOL(cleancache_register_ops);
@@ -96,8 +93,8 @@ void __cleancache_init_fs(struct super_block *sb)
for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
if (fs_poolid_map[i] == FS_UNKNOWN) {
sb->cleancache_poolid = i + FAKE_FS_POOLID_OFFSET;
- if (backend_registered)
- fs_poolid_map[i] = (*cleancache_ops.init_fs)(PAGE_SIZE);
+ if (cleancache_ops)
+ fs_poolid_map[i] = cleancache_ops->init_fs(PAGE_SIZE);
else
fs_poolid_map[i] = FS_NO_BACKEND;
break;
@@ -115,8 +112,8 @@ void __cleancache_init_shared_fs(char *uuid, struct super_block *sb)
if (shared_fs_poolid_map[i] == FS_UNKNOWN) {
sb->cleancache_poolid = i + FAKE_SHARED_FS_POOLID_OFFSET;
uuids[i] = uuid;
- if (backend_registered)
- shared_fs_poolid_map[i] = (*cleancache_ops.init_shared_fs)
+ if (cleancache_ops)
+ shared_fs_poolid_map[i] = cleancache_ops->init_shared_fs
(uuid, PAGE_SIZE);
else
shared_fs_poolid_map[i] = FS_NO_BACKEND;
@@ -178,7 +175,7 @@ int __cleancache_get_page(struct page *page)
int fake_pool_id;
struct cleancache_filekey key = { .u.key = { 0 } };

- if (!backend_registered) {
+ if (!cleancache_ops) {
cleancache_failed_gets++;
goto out;
}
@@ -193,7 +190,7 @@ int __cleancache_get_page(struct page *page)
goto out;

if (pool_id >= 0)
- ret = (*cleancache_ops.get_page)(pool_id,
+ ret = cleancache_ops->get_page(pool_id,
key, page->index, page);
if (ret == 0)
cleancache_succ_gets++;
@@ -216,7 +213,7 @@ void __cleancache_put_page(struct page *page)
int fake_pool_id;
struct cleancache_filekey key = { .u.key = { 0 } };

- if (!backend_registered) {
+ if (!cleancache_ops) {
cleancache_puts++;
return;
}
@@ -230,7 +227,7 @@ void __cleancache_put_page(struct page *page)

if (pool_id >= 0 &&
cleancache_get_key(page->mapping->host, &key) >= 0) {
- (*cleancache_ops.put_page)(pool_id, key, page->index, page);
+ cleancache_ops->put_page(pool_id, key, page->index, page);
cleancache_puts++;
}
}
@@ -248,7 +245,7 @@ void __cleancache_invalidate_page(struct address_space *mapping,
int fake_pool_id = mapping->host->i_sb->cleancache_poolid;
struct cleancache_filekey key = { .u.key = { 0 } };

- if (!backend_registered)
+ if (!cleancache_ops)
return;

if (fake_pool_id >= 0) {
@@ -258,7 +255,7 @@ void __cleancache_invalidate_page(struct address_space *mapping,

VM_BUG_ON(!PageLocked(page));
if (cleancache_get_key(mapping->host, &key) >= 0) {
- (*cleancache_ops.invalidate_page)(pool_id,
+ cleancache_ops->invalidate_page(pool_id,
key, page->index);
cleancache_invalidates++;
}
@@ -277,7 +274,7 @@ void __cleancache_invalidate_inode(struct address_space *mapping)
int fake_pool_id = mapping->host->i_sb->cleancache_poolid;
struct cleancache_filekey key = { .u.key = { 0 } };

- if (!backend_registered)
+ if (!cleancache_ops)
return;

if (fake_pool_id < 0)
@@ -286,7 +283,7 @@ void __cleancache_invalidate_inode(struct address_space *mapping)
pool_id = get_poolid_from_fake(fake_pool_id);

if (pool_id >= 0 && cleancache_get_key(mapping->host, &key) >= 0)
- (*cleancache_ops.invalidate_inode)(pool_id, key);
+ cleancache_ops->invalidate_inode(pool_id, key);
}
EXPORT_SYMBOL(__cleancache_invalidate_inode);

@@ -312,8 +309,8 @@ void __cleancache_invalidate_fs(struct super_block *sb)
fs_poolid_map[index] = FS_UNKNOWN;
}
sb->cleancache_poolid = -1;
- if (backend_registered)
- (*cleancache_ops.invalidate_fs)(old_poolid);
+ if (cleancache_ops)
+ cleancache_ops->invalidate_fs(old_poolid);
}
EXPORT_SYMBOL(__cleancache_invalidate_fs);

--
1.7.7.6

2012-11-14 18:57:50

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 5/8] staging: zcache2+ramster: enable ramster to be built/loaded as a module

From: Dan Magenheimer <[email protected]>

Enable module support for ramster. Note runtime dependency disallows
loading if cleancache/frontswap lazy initialization patches are not
present.

If built-in (not built as a module), the original mechanism of enabling via
a kernel boot parameter is retained, but this should be considered deprecated.

Note that module unload is explicitly not yet supported.

Signed-off-by: Dan Magenheimer <[email protected]>
[v1: Fixed compile issues since ramster_init now has four arguments]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/staging/ramster/ramster.h | 6 +++-
drivers/staging/ramster/ramster/nodemanager.c | 9 +++---
drivers/staging/ramster/ramster/ramster.c | 29 ++++++++++++++++---
drivers/staging/ramster/ramster/ramster.h | 2 +-
.../staging/ramster/ramster/ramster_nodemanager.h | 2 +
drivers/staging/ramster/zcache-main.c | 2 +-
6 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/ramster/ramster.h b/drivers/staging/ramster/ramster.h
index 1b71aea..e1f91d5 100644
--- a/drivers/staging/ramster/ramster.h
+++ b/drivers/staging/ramster/ramster.h
@@ -11,10 +11,14 @@
#ifndef _ZCACHE_RAMSTER_H_
#define _ZCACHE_RAMSTER_H_

+#ifdef CONFIG_RAMSTER_MODULE
+#define CONFIG_RAMSTER
+#endif
+
#ifdef CONFIG_RAMSTER
#include "ramster/ramster.h"
#else
-static inline void ramster_init(bool x, bool y, bool z)
+static inline void ramster_init(bool x, bool y, bool z, bool w)
{
}

diff --git a/drivers/staging/ramster/ramster/nodemanager.c b/drivers/staging/ramster/ramster/nodemanager.c
index c0f4815..2cfe933 100644
--- a/drivers/staging/ramster/ramster/nodemanager.c
+++ b/drivers/staging/ramster/ramster/nodemanager.c
@@ -949,7 +949,7 @@ static void __exit exit_r2nm(void)
r2hb_exit();
}

-static int __init init_r2nm(void)
+int r2nm_init(void)
{
int ret = -1;

@@ -986,10 +986,11 @@ out_r2hb:
out:
return ret;
}
+EXPORT_SYMBOL_GPL(r2nm_init);

MODULE_AUTHOR("Oracle");
MODULE_LICENSE("GPL");

-/* module_init(init_r2nm) */
-late_initcall(init_r2nm);
-/* module_exit(exit_r2nm) */
+#ifndef CONFIG_RAMSTER_MODULE
+late_initcall(r2nm_init);
+#endif
diff --git a/drivers/staging/ramster/ramster/ramster.c b/drivers/staging/ramster/ramster/ramster.c
index c06709f..491ec70 100644
--- a/drivers/staging/ramster/ramster/ramster.c
+++ b/drivers/staging/ramster/ramster/ramster.c
@@ -92,7 +92,7 @@ static unsigned long ramster_remote_page_flushes_failed;
#include <linux/debugfs.h>
#define zdfs debugfs_create_size_t
#define zdfs64 debugfs_create_u64
-static int __init ramster_debugfs_init(void)
+static int ramster_debugfs_init(void)
{
struct dentry *root = debugfs_create_dir("ramster", NULL);
if (root == NULL)
@@ -191,6 +191,7 @@ int ramster_do_preload_flnode(struct tmem_pool *pool)
kmem_cache_free(ramster_flnode_cache, flnode);
return ret;
}
+EXPORT_SYMBOL_GPL(ramster_do_preload_flnode);

/*
* Called by the message handler after a (still compressed) page has been
@@ -458,6 +459,7 @@ void *ramster_pampd_free(void *pampd, struct tmem_pool *pool,
}
return local_pampd;
}
+EXPORT_SYMBOL_GPL(ramster_pampd_free);

void ramster_count_foreign_pages(bool eph, int count)
{
@@ -489,6 +491,7 @@ void ramster_count_foreign_pages(bool eph, int count)
ramster_foreign_pers_pages = c;
}
}
+EXPORT_SYMBOL_GPL(ramster_count_foreign_pages);

/*
* For now, just push over a few pages every few seconds to
@@ -674,7 +677,7 @@ requeue:
ramster_remotify_queue_delayed_work(HZ);
}

-void __init ramster_remotify_init(void)
+void ramster_remotify_init(void)
{
unsigned long n = 60UL;
ramster_remotify_workqueue =
@@ -849,8 +852,10 @@ static bool frontswap_selfshrinking __read_mostly;
static void selfshrink_process(struct work_struct *work);
static DECLARE_DELAYED_WORK(selfshrink_worker, selfshrink_process);

+#ifndef CONFIG_RAMSTER_MODULE
/* Enable/disable with kernel boot option. */
static bool use_frontswap_selfshrink __initdata = true;
+#endif

/*
* The default values for the following parameters were deemed reasonable
@@ -905,6 +910,7 @@ static void frontswap_selfshrink(void)
frontswap_shrink(tgt_frontswap_pages);
}

+#ifndef CONFIG_RAMSTER_MODULE
static int __init ramster_nofrontswap_selfshrink_setup(char *s)
{
use_frontswap_selfshrink = false;
@@ -912,6 +918,7 @@ static int __init ramster_nofrontswap_selfshrink_setup(char *s)
}

__setup("noselfshrink", ramster_nofrontswap_selfshrink_setup);
+#endif

static void selfshrink_process(struct work_struct *work)
{
@@ -930,6 +937,7 @@ void ramster_cpu_up(int cpu)
per_cpu(ramster_remoteputmem1, cpu) = p1;
per_cpu(ramster_remoteputmem2, cpu) = p2;
}
+EXPORT_SYMBOL_GPL(ramster_cpu_up);

void ramster_cpu_down(int cpu)
{
@@ -945,6 +953,7 @@ void ramster_cpu_down(int cpu)
kp->flnode = NULL;
}
}
+EXPORT_SYMBOL_GPL(ramster_cpu_down);

void ramster_register_pamops(struct tmem_pamops *pamops)
{
@@ -955,9 +964,11 @@ void ramster_register_pamops(struct tmem_pamops *pamops)
pamops->repatriate = ramster_pampd_repatriate;
pamops->repatriate_preload = ramster_pampd_repatriate_preload;
}
+EXPORT_SYMBOL_GPL(ramster_register_pamops);

-void __init ramster_init(bool cleancache, bool frontswap,
- bool frontswap_exclusive_gets)
+void ramster_init(bool cleancache, bool frontswap,
+ bool frontswap_exclusive_gets,
+ bool frontswap_selfshrink)
{
int ret = 0;

@@ -972,10 +983,17 @@ void __init ramster_init(bool cleancache, bool frontswap,
if (ret)
pr_err("ramster: can't create sysfs for ramster\n");
(void)r2net_register_handlers();
+#ifdef CONFIG_RAMSTER_MODULE
+ ret = r2nm_init();
+ if (ret)
+ pr_err("ramster: can't init r2net\n");
+ frontswap_selfshrinking = frontswap_selfshrink;
+#else
+ frontswap_selfshrinking = use_frontswap_selfshrink;
+#endif
INIT_LIST_HEAD(&ramster_rem_op_list);
ramster_flnode_cache = kmem_cache_create("ramster_flnode",
sizeof(struct flushlist_node), 0, 0, NULL);
- frontswap_selfshrinking = use_frontswap_selfshrink;
if (frontswap_selfshrinking) {
pr_info("ramster: Initializing frontswap selfshrink driver.\n");
schedule_delayed_work(&selfshrink_worker,
@@ -983,3 +1001,4 @@ void __init ramster_init(bool cleancache, bool frontswap,
}
ramster_remotify_init();
}
+EXPORT_SYMBOL_GPL(ramster_init);
diff --git a/drivers/staging/ramster/ramster/ramster.h b/drivers/staging/ramster/ramster/ramster.h
index 12ae56f..6d41a7a 100644
--- a/drivers/staging/ramster/ramster/ramster.h
+++ b/drivers/staging/ramster/ramster/ramster.h
@@ -147,7 +147,7 @@ extern int r2net_register_handlers(void);
extern int r2net_remote_target_node_set(int);

extern int ramster_remotify_pageframe(bool);
-extern void ramster_init(bool, bool, bool);
+extern void ramster_init(bool, bool, bool, bool);
extern void ramster_register_pamops(struct tmem_pamops *);
extern int ramster_localify(int, struct tmem_oid *oidp, uint32_t, char *,
unsigned int, void *);
diff --git a/drivers/staging/ramster/ramster/ramster_nodemanager.h b/drivers/staging/ramster/ramster/ramster_nodemanager.h
index 49f879d..dbaae34 100644
--- a/drivers/staging/ramster/ramster/ramster_nodemanager.h
+++ b/drivers/staging/ramster/ramster/ramster_nodemanager.h
@@ -36,4 +36,6 @@
/* host name, group name, cluster name all 64 bytes */
#define R2NM_MAX_NAME_LEN 64 /* __NEW_UTS_LEN */

+extern int r2nm_init(void);
+
#endif /* _RAMSTER_NODEMANAGER_H */
diff --git a/drivers/staging/ramster/zcache-main.c b/drivers/staging/ramster/zcache-main.c
index ed99170..38c78b3 100644
--- a/drivers/staging/ramster/zcache-main.c
+++ b/drivers/staging/ramster/zcache-main.c
@@ -1812,7 +1812,7 @@ static int __init zcache_init(void)
}
if (ramster_enabled)
ramster_init(!disable_cleancache, !disable_frontswap,
- frontswap_has_exclusive_gets);
+ frontswap_has_exclusive_gets, false);
out:
return ret;
}
--
1.7.7.6

2012-11-14 18:58:03

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 7/8] xen: tmem: enable Xen tmem shim to be built/loaded as a module

From: Dan Magenheimer <[email protected]>

Allow Xen tmem shim to be built/loaded as a module. Xen self-ballooning
and frontswap-selfshrinking are now also "lazily" initialized when the
Xen tmem shim is loaded as a module, unless explicitly disabled
by module parameters.

Note runtime dependency disallows loading if cleancache/frontswap lazy
initialization patches are not present.

If built-in (not built as a module), the original mechanism of enabling via
a kernel boot parameter is retained, but this should be considered deprecated.

Note that module unload is explicitly not yet supported.

Signed-off-by: Dan Magenheimer <[email protected]>
[v1: Removed the [CLEANCACHE|FRONTSWAP]_HAS_LAZY_INIT ifdef]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/xen/Kconfig | 4 ++--
drivers/xen/tmem.c | 38 +++++++++++++++++++++++++++++---------
drivers/xen/xen-selfballoon.c | 13 +++++++------
include/xen/tmem.h | 8 ++++++++
4 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 126d8ce..1e70c60 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -145,9 +145,9 @@ config SWIOTLB_XEN
select SWIOTLB

config XEN_TMEM
- bool
+ tristate
depends on !ARM
- default y if (CLEANCACHE || FRONTSWAP)
+ default m if (CLEANCACHE || FRONTSWAP)
help
Shim to interface in-kernel Transcendent Memory hooks
(e.g. cleancache and frontswap) to Xen tmem hypercalls.
diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c
index 15e776c..9a4a9ec 100644
--- a/drivers/xen/tmem.c
+++ b/drivers/xen/tmem.c
@@ -5,6 +5,7 @@
* Author: Dan Magenheimer
*/

+#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/init.h>
@@ -128,6 +129,7 @@ static int xen_tmem_flush_object(u32 pool_id, struct tmem_oid oid)
return xen_tmem_op(TMEM_FLUSH_OBJECT, pool_id, oid, 0, 0, 0, 0, 0);
}

+#ifndef CONFIG_XEN_TMEM_MODULE
bool __read_mostly tmem_enabled = false;

static int __init enable_tmem(char *s)
@@ -136,6 +138,7 @@ static int __init enable_tmem(char *s)
return 1;
}
__setup("tmem", enable_tmem);
+#endif

#ifdef CONFIG_CLEANCACHE
static int xen_tmem_destroy_pool(u32 pool_id)
@@ -227,14 +230,19 @@ static int tmem_cleancache_init_shared_fs(char *uuid, size_t pagesize)
return xen_tmem_new_pool(shared_uuid, TMEM_POOL_SHARED, pagesize);
}

-static bool __initdata use_cleancache = true;
-
+static bool disable_cleancache __read_mostly;
+static bool disable_selfballooning __read_mostly;
+#ifdef CONFIG_XEN_TMEM_MODULE
+module_param(disable_cleancache, bool, S_IRUGO);
+module_param(disable_selfballooning, bool, S_IRUGO);
+#else
static int __init no_cleancache(char *s)
{
- use_cleancache = false;
+ disable_cleancache = true;
return 1;
}
__setup("nocleancache", no_cleancache);
+#endif

static struct cleancache_ops tmem_cleancache_ops = {
.put_page = tmem_cleancache_put_page,
@@ -353,14 +361,19 @@ static void tmem_frontswap_init(unsigned ignored)
xen_tmem_new_pool(private, TMEM_POOL_PERSIST, PAGE_SIZE);
}

-static bool __initdata use_frontswap = true;
-
+static bool disable_frontswap __read_mostly;
+static bool disable_frontswap_selfshrinking __read_mostly;
+#ifdef CONFIG_XEN_TMEM_MODULE
+module_param(disable_frontswap, bool, S_IRUGO);
+module_param(disable_frontswap_selfshrinking, bool, S_IRUGO);
+#else
static int __init no_frontswap(char *s)
{
- use_frontswap = false;
+ disable_frontswap = true;
return 1;
}
__setup("nofrontswap", no_frontswap);
+#endif

static struct frontswap_ops tmem_frontswap_ops = {
.store = tmem_frontswap_store,
@@ -371,12 +384,12 @@ static struct frontswap_ops tmem_frontswap_ops = {
};
#endif

-static int __init xen_tmem_init(void)
+static int xen_tmem_init(void)
{
if (!xen_domain())
return 0;
#ifdef CONFIG_FRONTSWAP
- if (tmem_enabled && use_frontswap) {
+ if (tmem_enabled && !disable_frontswap) {
char *s = "";
struct frontswap_ops *old_ops =
frontswap_register_ops(&tmem_frontswap_ops);
@@ -390,7 +403,7 @@ static int __init xen_tmem_init(void)
#endif
#ifdef CONFIG_CLEANCACHE
BUG_ON(sizeof(struct cleancache_filekey) != sizeof(struct tmem_oid));
- if (tmem_enabled && use_cleancache) {
+ if (tmem_enabled && !disable_cleancache) {
char *s = "";
struct cleancache_ops *old_ops =
cleancache_register_ops(&tmem_cleancache_ops);
@@ -400,7 +413,14 @@ static int __init xen_tmem_init(void)
"Xen Transcendent Memory%s\n", s);
}
#endif
+#ifdef CONFIG_XEN_SELFBALLOONING
+ xen_selfballoon_init(!disable_selfballooning,
+ !disable_frontswap_selfshrinking);
+#endif
return 0;
}

module_init(xen_tmem_init)
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dan Magenheimer <[email protected]>");
+MODULE_DESCRIPTION("Shim to Xen transcendent memory");
diff --git a/drivers/xen/xen-selfballoon.c b/drivers/xen/xen-selfballoon.c
index 7d041cb..f4808aa 100644
--- a/drivers/xen/xen-selfballoon.c
+++ b/drivers/xen/xen-selfballoon.c
@@ -121,7 +121,7 @@ static DECLARE_DELAYED_WORK(selfballoon_worker, selfballoon_process);
static bool frontswap_selfshrinking __read_mostly;

/* Enable/disable with kernel boot option. */
-static bool use_frontswap_selfshrink __initdata = true;
+static bool use_frontswap_selfshrink = true;

/*
* The default values for the following parameters were deemed reasonable
@@ -185,7 +185,7 @@ static int __init xen_nofrontswap_selfshrink_setup(char *s)
__setup("noselfshrink", xen_nofrontswap_selfshrink_setup);

/* Disable with kernel boot option. */
-static bool use_selfballooning __initdata = true;
+static bool use_selfballooning = true;

static int __init xen_noselfballooning_setup(char *s)
{
@@ -196,7 +196,7 @@ static int __init xen_noselfballooning_setup(char *s)
__setup("noselfballooning", xen_noselfballooning_setup);
#else /* !CONFIG_FRONTSWAP */
/* Enable with kernel boot option. */
-static bool use_selfballooning __initdata = false;
+static bool use_selfballooning;

static int __init xen_selfballooning_setup(char *s)
{
@@ -537,7 +537,7 @@ int register_xen_selfballooning(struct device *dev)
}
EXPORT_SYMBOL(register_xen_selfballooning);

-static int __init xen_selfballoon_init(void)
+int xen_selfballoon_init(bool use_selfballooning, bool use_frontswap_selfshrink)
{
bool enable = false;

@@ -571,7 +571,8 @@ static int __init xen_selfballoon_init(void)

return 0;
}
+EXPORT_SYMBOL(xen_selfballoon_init);

+#ifndef CONFIG_XEN_TMEM_MODULE
subsys_initcall(xen_selfballoon_init);
-
-MODULE_LICENSE("GPL");
+#endif
diff --git a/include/xen/tmem.h b/include/xen/tmem.h
index 591550a..3930a90 100644
--- a/include/xen/tmem.h
+++ b/include/xen/tmem.h
@@ -3,7 +3,15 @@

#include <linux/types.h>

+#ifdef CONFIG_XEN_TMEM_MODULE
+#define tmem_enabled true
+#else
/* defined in drivers/xen/tmem.c */
extern bool tmem_enabled;
+#endif
+
+#ifdef CONFIG_XEN_SELFBALLOONING
+extern int xen_selfballoon_init(bool, bool);
+#endif

#endif /* _XEN_TMEM_H */
--
1.7.7.6

2012-11-14 18:58:23

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 8/8] xen/tmem: Remove the subsys call.

We get:
drivers/xen/xen-selfballoon.c:577:134: warning: initialization from incompatible pointer type [enabled by default]

We actually do not need this function to be called
before tmem is loaded. So lets remove the subsys_init.

If tmem is built in as a module this is still OK as
xen_selfballoon_init ends up being exported and can
be called by the tmem module.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/xen/xen-selfballoon.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/xen-selfballoon.c b/drivers/xen/xen-selfballoon.c
index f4808aa..0bd551e 100644
--- a/drivers/xen/xen-selfballoon.c
+++ b/drivers/xen/xen-selfballoon.c
@@ -572,7 +572,3 @@ int xen_selfballoon_init(bool use_selfballooning, bool use_frontswap_selfshrink)
return 0;
}
EXPORT_SYMBOL(xen_selfballoon_init);
-
-#ifndef CONFIG_XEN_TMEM_MODULE
-subsys_initcall(xen_selfballoon_init);
-#endif
--
1.7.7.6

2012-11-14 18:57:45

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 6/8] staging: zcache2+ramster: enable zcache2 to be built/loaded as a module

From: Dan Magenheimer <[email protected]>

Allow zcache2 to be built/loaded as a module. Note runtime dependency
disallows loading if cleancache/frontswap lazy initialization patches
are not present. Zsmalloc support has not yet been merged into zcache2
but, once merged, could now easily be selected via a module_param.

If built-in (not built as a module), the original mechanism of enabling via
a kernel boot parameter is retained, but this should be considered deprecated.

Note that module unload is explicitly not yet supported.

Signed-off-by: Dan Magenheimer <[email protected]>
[v1: Rebased with different order of patches]
[v2: Removed [CLEANCACHE|FRONTSWAP]_HAS_LAZY_INIT ifdef]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/staging/ramster/Kconfig | 6 ++--
drivers/staging/ramster/Makefile | 11 ++++---
drivers/staging/ramster/tmem.c | 6 +++-
drivers/staging/ramster/tmem.h | 8 +++---
drivers/staging/ramster/zcache-main.c | 47 ++++++++++++++++++++++++++++++---
drivers/staging/ramster/zcache.h | 2 +-
6 files changed, 62 insertions(+), 18 deletions(-)

diff --git a/drivers/staging/ramster/Kconfig b/drivers/staging/ramster/Kconfig
index 3abf661..9ce2590 100644
--- a/drivers/staging/ramster/Kconfig
+++ b/drivers/staging/ramster/Kconfig
@@ -1,5 +1,5 @@
config ZCACHE2
- bool "Dynamic compression of swap pages and clean pagecache pages"
+ tristate "Dynamic compression of swap pages and clean pagecache pages"
depends on CRYPTO=y && SWAP=y && CLEANCACHE && FRONTSWAP && !ZCACHE
select CRYPTO_LZO
default n
@@ -16,8 +16,8 @@ config ZCACHE2
version of ramster.

config RAMSTER
- bool "Cross-machine RAM capacity sharing, aka peer-to-peer tmem"
- depends on CONFIGFS_FS=y && SYSFS=y && !HIGHMEM && ZCACHE2=y
+ tristate "Cross-machine RAM capacity sharing, aka peer-to-peer tmem"
+ depends on CONFIGFS_FS && SYSFS && !HIGHMEM && ZCACHE2
depends on NET
# must ensure struct page is 8-byte aligned
select HAVE_ALIGNED_STRUCT_PAGE if !64_BIT
diff --git a/drivers/staging/ramster/Makefile b/drivers/staging/ramster/Makefile
index 2d8b9d0..fcb25cb 100644
--- a/drivers/staging/ramster/Makefile
+++ b/drivers/staging/ramster/Makefile
@@ -1,6 +1,7 @@
zcache-y := zcache-main.o tmem.o zbud.o
-zcache-$(CONFIG_RAMSTER) += ramster/ramster.o ramster/r2net.o
-zcache-$(CONFIG_RAMSTER) += ramster/nodemanager.o ramster/tcp.o
-zcache-$(CONFIG_RAMSTER) += ramster/heartbeat.o ramster/masklog.o
-
-obj-$(CONFIG_ZCACHE2) += zcache.o
+ifneq (,$(filter $(CONFIG_RAMSTER),y m))
+zcache-y += ramster/ramster.o ramster/r2net.o
+zcache-y += ramster/nodemanager.o ramster/tcp.o
+zcache-y += ramster/heartbeat.o ramster/masklog.o
+endif
+obj-$(CONFIG_MODULES) += zcache.o
diff --git a/drivers/staging/ramster/tmem.c b/drivers/staging/ramster/tmem.c
index a2b7e03..d7e51e4 100644
--- a/drivers/staging/ramster/tmem.c
+++ b/drivers/staging/ramster/tmem.c
@@ -35,7 +35,8 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/atomic.h>
-#ifdef CONFIG_RAMSTER
+#include <linux/export.h>
+#if defined(CONFIG_RAMSTER) || defined(CONFIG_RAMSTER_MODULE)
#include <linux/delay.h>
#endif

@@ -641,6 +642,7 @@ void *tmem_localify_get_pampd(struct tmem_pool *pool, struct tmem_oid *oidp,
/* note, hashbucket remains locked */
return pampd;
}
+EXPORT_SYMBOL_GPL(tmem_localify_get_pampd);

void tmem_localify_finish(struct tmem_obj *obj, uint32_t index,
void *pampd, void *saved_hb, bool delete)
@@ -658,6 +660,7 @@ void tmem_localify_finish(struct tmem_obj *obj, uint32_t index,
}
spin_unlock(&hb->lock);
}
+EXPORT_SYMBOL_GPL(tmem_localify_finish);

/*
* For ramster only. Helper function to support asynchronous tmem_get.
@@ -719,6 +722,7 @@ out:
spin_unlock(&hb->lock);
return ret;
}
+EXPORT_SYMBOL_GPL(tmem_replace);
#endif

/*
diff --git a/drivers/staging/ramster/tmem.h b/drivers/staging/ramster/tmem.h
index adbe5a8..d128ce2 100644
--- a/drivers/staging/ramster/tmem.h
+++ b/drivers/staging/ramster/tmem.h
@@ -126,7 +126,7 @@ static inline unsigned tmem_oid_hash(struct tmem_oid *oidp)
TMEM_HASH_BUCKET_BITS);
}

-#ifdef CONFIG_RAMSTER
+#if defined(CONFIG_RAMSTER) || defined(CONFIG_RAMSTER_MODULE)
struct tmem_xhandle {
uint8_t client_id;
uint8_t xh_data_cksum;
@@ -171,7 +171,7 @@ struct tmem_obj {
unsigned int objnode_tree_height;
unsigned long objnode_count;
long pampd_count;
-#ifdef CONFIG_RAMSTER
+#if defined(CONFIG_RAMSTER) || defined(CONFIG_RAMSTER_MODULE)
/*
* for current design of ramster, all pages belonging to
* an object reside on the same remotenode and extra is
@@ -215,7 +215,7 @@ struct tmem_pamops {
uint32_t);
void (*free)(void *, struct tmem_pool *,
struct tmem_oid *, uint32_t, bool);
-#ifdef CONFIG_RAMSTER
+#if defined(CONFIG_RAMSTER) || defined(CONFIG_RAMSTER_MODULE)
void (*new_obj)(struct tmem_obj *);
void (*free_obj)(struct tmem_pool *, struct tmem_obj *, bool);
void *(*repatriate_preload)(void *, struct tmem_pool *,
@@ -247,7 +247,7 @@ extern int tmem_flush_page(struct tmem_pool *, struct tmem_oid *,
extern int tmem_flush_object(struct tmem_pool *, struct tmem_oid *);
extern int tmem_destroy_pool(struct tmem_pool *);
extern void tmem_new_pool(struct tmem_pool *, uint32_t);
-#ifdef CONFIG_RAMSTER
+#if defined(CONFIG_RAMSTER) || defined(CONFIG_RAMSTER_MODULE)
extern int tmem_replace(struct tmem_pool *, struct tmem_oid *, uint32_t index,
void *);
extern void *tmem_localify_get_pampd(struct tmem_pool *, struct tmem_oid *,
diff --git a/drivers/staging/ramster/zcache-main.c b/drivers/staging/ramster/zcache-main.c
index 38c78b3..29e2523 100644
--- a/drivers/staging/ramster/zcache-main.c
+++ b/drivers/staging/ramster/zcache-main.c
@@ -31,8 +31,10 @@
#include "ramster.h"
#ifdef CONFIG_RAMSTER
static int ramster_enabled;
+static int disable_frontswap_selfshrink;
#else
#define ramster_enabled 0
+#define disable_frontswap_selfshrink 0
#endif

#ifndef __PG_WAS_ACTIVE
@@ -68,8 +70,12 @@ static char *namestr __read_mostly = "zcache";
MODULE_LICENSE("GPL");

/* crypto API for zcache */
+#ifdef CONFIG_ZCACHE2_MODULE
+static char *zcache_comp_name = "lzo";
+#else
#define ZCACHE_COMP_NAME_SZ CRYPTO_MAX_ALG_NAME
static char zcache_comp_name[ZCACHE_COMP_NAME_SZ] __read_mostly;
+#endif
static struct crypto_comp * __percpu *zcache_comp_pcpu_tfms __read_mostly;

enum comp_op {
@@ -1346,7 +1352,7 @@ static int zcache_local_new_pool(uint32_t flags)
int zcache_autocreate_pool(unsigned int cli_id, unsigned int pool_id, bool eph)
{
struct tmem_pool *pool;
- struct zcache_client *cli;
+ struct zcache_client *cli = NULL;
uint32_t flags = eph ? 0 : TMEM_POOL_PERSIST;
int ret = -1;

@@ -1640,6 +1646,7 @@ struct frontswap_ops *zcache_frontswap_register_ops(void)
* OR NOTHING HAPPENS!
*/

+#ifndef CONFIG_ZCACHE2_MODULE
static int __init enable_zcache(char *s)
{
zcache_enabled = 1;
@@ -1706,18 +1713,27 @@ static int __init enable_zcache_compressor(char *s)
return 1;
}
__setup("zcache=", enable_zcache_compressor);
+#endif


-static int __init zcache_comp_init(void)
+static int zcache_comp_init(void)
{
int ret = 0;

/* check crypto algorithm */
+#ifdef CONFIG_ZCACHE2_MODULE
+ ret = crypto_has_comp(zcache_comp_name, 0, 0);
+ if (!ret) {
+ ret = -1;
+ goto out;
+ }
+#else
if (*zcache_comp_name != '\0') {
ret = crypto_has_comp(zcache_comp_name, 0, 0);
if (!ret)
pr_info("zcache: %s not supported\n",
zcache_comp_name);
+ goto out;
}
if (!ret)
strcpy(zcache_comp_name, "lzo");
@@ -1726,6 +1742,7 @@ static int __init zcache_comp_init(void)
ret = 1;
goto out;
}
+#endif
pr_info("zcache: using %s compressor\n", zcache_comp_name);

/* alloc percpu transforms */
@@ -1737,10 +1754,13 @@ out:
return ret;
}

-static int __init zcache_init(void)
+static int zcache_init(void)
{
int ret = 0;

+#ifdef CONFIG_ZCACHE2_MODULE
+ zcache_enabled = 1;
+#endif
if (ramster_enabled) {
namestr = "ramster";
ramster_register_pamops(&zcache_pamops);
@@ -1812,9 +1832,28 @@ static int __init zcache_init(void)
}
if (ramster_enabled)
ramster_init(!disable_cleancache, !disable_frontswap,
- frontswap_has_exclusive_gets, false);
+ frontswap_has_exclusive_gets,
+ !disable_frontswap_selfshrink);
out:
return ret;
}

+#ifdef CONFIG_ZCACHE2_MODULE
+#ifdef CONFIG_RAMSTER
+module_param(ramster_enabled, int, S_IRUGO);
+module_param(disable_frontswap_selfshrink, int, S_IRUGO);
+#endif
+module_param(disable_cleancache, int, S_IRUGO);
+module_param(disable_frontswap, int, S_IRUGO);
+#ifdef FRONTSWAP_HAS_EXCLUSIVE_GETS
+module_param(frontswap_has_exclusive_gets, bool, S_IRUGO);
+#endif
+module_param(disable_frontswap_ignore_nonactive, int, S_IRUGO);
+module_param(zcache_comp_name, charp, S_IRUGO);
+module_init(zcache_init);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dan Magenheimer <[email protected]>");
+MODULE_DESCRIPTION("In-kernel compression of cleancache/frontswap pages");
+#else
late_initcall(zcache_init);
+#endif
diff --git a/drivers/staging/ramster/zcache.h b/drivers/staging/ramster/zcache.h
index 81722b3..8491200 100644
--- a/drivers/staging/ramster/zcache.h
+++ b/drivers/staging/ramster/zcache.h
@@ -39,7 +39,7 @@ extern int zcache_flush_page(int, int, struct tmem_oid *, uint32_t);
extern int zcache_flush_object(int, int, struct tmem_oid *);
extern void zcache_decompress_to_page(char *, unsigned int, struct page *);

-#ifdef CONFIG_RAMSTER
+#if defined(CONFIG_RAMSTER) || defined(CONFIG_RAMSTER_MODULE)
extern void *zcache_pampd_create(char *, unsigned int, bool, int,
struct tmem_handle *);
int zcache_autocreate_pool(unsigned int cli_id, unsigned int pool_id, bool eph);
--
1.7.7.6

2012-11-14 18:59:00

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 3/8] frontswap: Make frontswap_init use a pointer for the ops.

This simplifies the code in the frontswap - we can get rid
of the 'backend_registered' test and instead check against
frontswap_ops.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/staging/ramster/zcache-main.c | 8 +++---
drivers/staging/zcache/zcache-main.c | 8 +++---
drivers/xen/tmem.c | 6 ++--
include/linux/frontswap.h | 2 +-
mm/frontswap.c | 34 +++++++++++++++-----------------
5 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/drivers/staging/ramster/zcache-main.c b/drivers/staging/ramster/zcache-main.c
index a09dd5c..6c8959d 100644
--- a/drivers/staging/ramster/zcache-main.c
+++ b/drivers/staging/ramster/zcache-main.c
@@ -1626,9 +1626,9 @@ static struct frontswap_ops zcache_frontswap_ops = {
.init = zcache_frontswap_init
};

-struct frontswap_ops zcache_frontswap_register_ops(void)
+struct frontswap_ops *zcache_frontswap_register_ops(void)
{
- struct frontswap_ops old_ops =
+ struct frontswap_ops *old_ops =
frontswap_register_ops(&zcache_frontswap_ops);

return old_ops;
@@ -1795,7 +1795,7 @@ static int __init zcache_init(void)
pr_warn("%s: cleancache_ops overridden\n", namestr);
}
if (zcache_enabled && !disable_frontswap) {
- struct frontswap_ops old_ops;
+ struct frontswap_ops *old_ops;

old_ops = zcache_frontswap_register_ops();
if (frontswap_has_exclusive_gets)
@@ -1807,7 +1807,7 @@ static int __init zcache_init(void)
namestr, frontswap_has_exclusive_gets,
!disable_frontswap_ignore_nonactive);
#endif
- if (old_ops.init != NULL)
+ if (old_ops)
pr_warn("%s: frontswap_ops overridden\n", namestr);
}
if (ramster_enabled)
diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index 52b43b7..3db38cb 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -1919,9 +1919,9 @@ static struct frontswap_ops zcache_frontswap_ops = {
.init = zcache_frontswap_init
};

-struct frontswap_ops zcache_frontswap_register_ops(void)
+struct frontswap_ops *zcache_frontswap_register_ops(void)
{
- struct frontswap_ops old_ops =
+ struct frontswap_ops *old_ops =
frontswap_register_ops(&zcache_frontswap_ops);

return old_ops;
@@ -2061,12 +2061,12 @@ static int __init zcache_init(void)
#endif
#ifdef CONFIG_FRONTSWAP
if (zcache_enabled && use_frontswap) {
- struct frontswap_ops old_ops;
+ struct frontswap_ops *old_ops;

old_ops = zcache_frontswap_register_ops();
pr_info("zcache: frontswap enabled using kernel "
"transcendent memory and zsmalloc\n");
- if (old_ops.init != NULL)
+ if (old_ops)
pr_warning("zcache: frontswap_ops overridden");
}
#endif
diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c
index 144564e..4b02c07 100644
--- a/drivers/xen/tmem.c
+++ b/drivers/xen/tmem.c
@@ -362,7 +362,7 @@ static int __init no_frontswap(char *s)
}
__setup("nofrontswap", no_frontswap);

-static struct frontswap_ops __initdata tmem_frontswap_ops = {
+static struct frontswap_ops tmem_frontswap_ops = {
.store = tmem_frontswap_store,
.load = tmem_frontswap_load,
.invalidate_page = tmem_frontswap_flush_page,
@@ -378,11 +378,11 @@ static int __init xen_tmem_init(void)
#ifdef CONFIG_FRONTSWAP
if (tmem_enabled && use_frontswap) {
char *s = "";
- struct frontswap_ops old_ops =
+ struct frontswap_ops *old_ops =
frontswap_register_ops(&tmem_frontswap_ops);

tmem_frontswap_poolid = -1;
- if (old_ops.init != NULL)
+ if (old_ops)
s = " (WARNING: frontswap_ops overridden)";
printk(KERN_INFO "frontswap enabled, RAM provided by "
"Xen Transcendent Memory\n");
diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h
index 3044254..d4f2987 100644
--- a/include/linux/frontswap.h
+++ b/include/linux/frontswap.h
@@ -14,7 +14,7 @@ struct frontswap_ops {
};

extern bool frontswap_enabled;
-extern struct frontswap_ops
+extern struct frontswap_ops *
frontswap_register_ops(struct frontswap_ops *ops);
extern void frontswap_shrink(unsigned long);
extern unsigned long frontswap_curr_pages(void);
diff --git a/mm/frontswap.c b/mm/frontswap.c
index ba58157..e73dd23 100644
--- a/mm/frontswap.c
+++ b/mm/frontswap.c
@@ -24,7 +24,7 @@
* frontswap_ops is set by frontswap_register_ops to contain the pointers
* to the frontswap "backend" implementation functions.
*/
-static struct frontswap_ops frontswap_ops __read_mostly;
+static struct frontswap_ops *frontswap_ops __read_mostly;

/*
* This global enablement flag reduces overhead on systems where frontswap_ops
@@ -90,28 +90,26 @@ static inline void inc_frontswap_invalidates(void) { }
* ignored or fail.
*/
static DECLARE_BITMAP(need_init, MAX_SWAPFILES);
-static bool backend_registered __read_mostly;

/*
* Register operations for frontswap, returning previous thus allowing
* detection of multiple backends and possible nesting.
*/
-struct frontswap_ops frontswap_register_ops(struct frontswap_ops *ops)
+struct frontswap_ops *frontswap_register_ops(struct frontswap_ops *ops)
{
- struct frontswap_ops old = frontswap_ops;
+ struct frontswap_ops *old = frontswap_ops;
int i;

- frontswap_ops = *ops;
frontswap_enabled = true;

for (i = 0; i < MAX_SWAPFILES; i++) {
if (test_and_clear_bit(i, need_init))
- (*frontswap_ops.init)(i);
+ ops->init(i);
}
- /* We MUST have backend_registered called _after_ the frontswap_init's
+ /* We MUST have frontswap_ops set _after_ the frontswap_init's
* have been called. Otherwise __frontswap_store might fail. */
barrier();
- backend_registered = true;
+ frontswap_ops = ops;
return old;
}
EXPORT_SYMBOL(frontswap_register_ops);
@@ -141,11 +139,11 @@ void __frontswap_init(unsigned type)
{
struct swap_info_struct *sis = swap_info[type];

- if (backend_registered) {
+ if (frontswap_ops) {
BUG_ON(sis == NULL);
if (sis->frontswap_map == NULL)
return;
- (*frontswap_ops.init)(type);
+ frontswap_ops->init(type);
}
else {
BUG_ON(type > MAX_SWAPFILES);
@@ -176,7 +174,7 @@ int __frontswap_store(struct page *page)
struct swap_info_struct *sis = swap_info[type];
pgoff_t offset = swp_offset(entry);

- if (!backend_registered) {
+ if (!frontswap_ops) {
inc_frontswap_failed_stores();
return ret;
}
@@ -185,7 +183,7 @@ int __frontswap_store(struct page *page)
BUG_ON(sis == NULL);
if (frontswap_test(sis, offset))
dup = 1;
- ret = frontswap_ops.store(type, offset, page);
+ ret = frontswap_ops->store(type, offset, page);
if (ret == 0) {
frontswap_set(sis, offset);
inc_frontswap_succ_stores();
@@ -220,13 +218,13 @@ int __frontswap_load(struct page *page)
struct swap_info_struct *sis = swap_info[type];
pgoff_t offset = swp_offset(entry);

- if (!backend_registered)
+ if (!frontswap_ops)
return ret;

BUG_ON(!PageLocked(page));
BUG_ON(sis == NULL);
if (frontswap_test(sis, offset))
- ret = frontswap_ops.load(type, offset, page);
+ ret = frontswap_ops->load(type, offset, page);
if (ret == 0) {
inc_frontswap_loads();
if (frontswap_tmem_exclusive_gets_enabled) {
@@ -246,12 +244,12 @@ void __frontswap_invalidate_page(unsigned type, pgoff_t offset)
{
struct swap_info_struct *sis = swap_info[type];

- if (!backend_registered)
+ if (!frontswap_ops)
return;

BUG_ON(sis == NULL);
if (frontswap_test(sis, offset)) {
- frontswap_ops.invalidate_page(type, offset);
+ frontswap_ops->invalidate_page(type, offset);
__frontswap_clear(sis, offset);
inc_frontswap_invalidates();
}
@@ -266,11 +264,11 @@ void __frontswap_invalidate_area(unsigned type)
{
struct swap_info_struct *sis = swap_info[type];

- if (backend_registered) {
+ if (frontswap_ops) {
BUG_ON(sis == NULL);
if (sis->frontswap_map == NULL)
return;
- (*frontswap_ops.invalidate_area)(type);
+ frontswap_ops->invalidate_area(type);
atomic_set(&sis->frontswap_pages, 0);
memset(sis->frontswap_map, 0, sis->max / sizeof(long));
}
--
1.7.7.6

2012-11-14 18:57:38

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 1/8] mm: cleancache: lazy initialization to allow tmem backends to build/run as modules

From: Dan Magenheimer <[email protected]>

With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
built/loaded as modules rather than built-in and enabled by a boot parameter,
this patch provides "lazy initialization", allowing backends to register to
cleancache even after filesystems were mounted. Calls to init_fs and
init_shared_fs are remembered as fake poolids but no real tmem_pools created.
On backend registration the fake poolids are mapped to real poolids and
respective tmem_pools.

Signed-off-by: Stefan Hengelein <[email protected]>
Signed-off-by: Florian Schmaus <[email protected]>
Signed-off-by: Andor Daam <[email protected]>
Signed-off-by: Dan Magenheimer <[email protected]>
[v1: Minor fixes: used #define for some values and bools]
[v2: Removed CLEANCACHE_HAS_LAZY_INIT]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
mm/cleancache.c | 156 +++++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 139 insertions(+), 17 deletions(-)

diff --git a/mm/cleancache.c b/mm/cleancache.c
index 32e6f41..318a0ad 100644
--- a/mm/cleancache.c
+++ b/mm/cleancache.c
@@ -45,15 +45,45 @@ static u64 cleancache_puts;
static u64 cleancache_invalidates;

/*
+ * When no backend is registered all calls to init_fs and init_shard_fs
+ * are registered and fake poolids are given to the respective
+ * super block but no tmem_pools are created. When a backend
+ * registers with cleancache the previous calls to init_fs and
+ * init_shared_fs are executed to create tmem_pools and set the
+ * respective poolids. While no backend is registered all "puts",
+ * "gets" and "flushes" are ignored or fail.
+ */
+#define MAX_INITIALIZABLE_FS 32
+#define FAKE_FS_POOLID_OFFSET 1000
+#define FAKE_SHARED_FS_POOLID_OFFSET 2000
+
+#define FS_NO_BACKEND (-1)
+#define FS_UNKNOWN (-2)
+static int fs_poolid_map[MAX_INITIALIZABLE_FS];
+static int shared_fs_poolid_map[MAX_INITIALIZABLE_FS];
+
+static char *uuids[MAX_INITIALIZABLE_FS];
+static bool __read_mostly backend_registered;
+
+/*
* register operations for cleancache, returning previous thus allowing
* detection of multiple backends and possible nesting
*/
struct cleancache_ops cleancache_register_ops(struct cleancache_ops *ops)
{
struct cleancache_ops old = cleancache_ops;
+ int i;

cleancache_ops = *ops;
- cleancache_enabled = 1;
+
+ backend_registered = true;
+ for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
+ if (fs_poolid_map[i] == FS_NO_BACKEND)
+ fs_poolid_map[i] = (*cleancache_ops.init_fs)(PAGE_SIZE);
+ if (shared_fs_poolid_map[i] == FS_NO_BACKEND)
+ shared_fs_poolid_map[i] = (*cleancache_ops.init_shared_fs)
+ (uuids[i], PAGE_SIZE);
+ }
return old;
}
EXPORT_SYMBOL(cleancache_register_ops);
@@ -61,15 +91,38 @@ EXPORT_SYMBOL(cleancache_register_ops);
/* Called by a cleancache-enabled filesystem at time of mount */
void __cleancache_init_fs(struct super_block *sb)
{
- sb->cleancache_poolid = (*cleancache_ops.init_fs)(PAGE_SIZE);
+ int i;
+
+ for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
+ if (fs_poolid_map[i] == FS_UNKNOWN) {
+ sb->cleancache_poolid = i + FAKE_FS_POOLID_OFFSET;
+ if (backend_registered)
+ fs_poolid_map[i] = (*cleancache_ops.init_fs)(PAGE_SIZE);
+ else
+ fs_poolid_map[i] = FS_NO_BACKEND;
+ break;
+ }
+ }
}
EXPORT_SYMBOL(__cleancache_init_fs);

/* Called by a cleancache-enabled clustered filesystem at time of mount */
void __cleancache_init_shared_fs(char *uuid, struct super_block *sb)
{
- sb->cleancache_poolid =
- (*cleancache_ops.init_shared_fs)(uuid, PAGE_SIZE);
+ int i;
+
+ for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
+ if (shared_fs_poolid_map[i] == FS_UNKNOWN) {
+ sb->cleancache_poolid = i + FAKE_SHARED_FS_POOLID_OFFSET;
+ uuids[i] = uuid;
+ if (backend_registered)
+ shared_fs_poolid_map[i] = (*cleancache_ops.init_shared_fs)
+ (uuid, PAGE_SIZE);
+ else
+ shared_fs_poolid_map[i] = FS_NO_BACKEND;
+ break;
+ }
+ }
}
EXPORT_SYMBOL(__cleancache_init_shared_fs);

@@ -99,6 +152,19 @@ static int cleancache_get_key(struct inode *inode,
}

/*
+ * Returns a pool_id that is associated with a given fake poolid.
+ */
+static int get_poolid_from_fake(int fake_pool_id)
+{
+ if (fake_pool_id >= FAKE_SHARED_FS_POOLID_OFFSET)
+ return shared_fs_poolid_map[fake_pool_id -
+ FAKE_SHARED_FS_POOLID_OFFSET];
+ else if (fake_pool_id >= FAKE_FS_POOLID_OFFSET)
+ return fs_poolid_map[fake_pool_id - FAKE_FS_POOLID_OFFSET];
+ return FS_NO_BACKEND;
+}
+
+/*
* "Get" data from cleancache associated with the poolid/inode/index
* that were specified when the data was put to cleanache and, if
* successful, use it to fill the specified page with data and return 0.
@@ -109,17 +175,26 @@ int __cleancache_get_page(struct page *page)
{
int ret = -1;
int pool_id;
+ int fake_pool_id;
struct cleancache_filekey key = { .u.key = { 0 } };

+ if (!backend_registered) {
+ cleancache_failed_gets++;
+ goto out;
+ }
+
VM_BUG_ON(!PageLocked(page));
- pool_id = page->mapping->host->i_sb->cleancache_poolid;
- if (pool_id < 0)
+ fake_pool_id = page->mapping->host->i_sb->cleancache_poolid;
+ if (fake_pool_id < 0)
goto out;
+ pool_id = get_poolid_from_fake(fake_pool_id);

if (cleancache_get_key(page->mapping->host, &key) < 0)
goto out;

- ret = (*cleancache_ops.get_page)(pool_id, key, page->index, page);
+ if (pool_id >= 0)
+ ret = (*cleancache_ops.get_page)(pool_id,
+ key, page->index, page);
if (ret == 0)
cleancache_succ_gets++;
else
@@ -138,12 +213,23 @@ EXPORT_SYMBOL(__cleancache_get_page);
void __cleancache_put_page(struct page *page)
{
int pool_id;
+ int fake_pool_id;
struct cleancache_filekey key = { .u.key = { 0 } };

+ if (!backend_registered) {
+ cleancache_puts++;
+ return;
+ }
+
VM_BUG_ON(!PageLocked(page));
- pool_id = page->mapping->host->i_sb->cleancache_poolid;
+ fake_pool_id = page->mapping->host->i_sb->cleancache_poolid;
+ if (fake_pool_id < 0)
+ return;
+
+ pool_id = get_poolid_from_fake(fake_pool_id);
+
if (pool_id >= 0 &&
- cleancache_get_key(page->mapping->host, &key) >= 0) {
+ cleancache_get_key(page->mapping->host, &key) >= 0) {
(*cleancache_ops.put_page)(pool_id, key, page->index, page);
cleancache_puts++;
}
@@ -158,14 +244,22 @@ void __cleancache_invalidate_page(struct address_space *mapping,
struct page *page)
{
/* careful... page->mapping is NULL sometimes when this is called */
- int pool_id = mapping->host->i_sb->cleancache_poolid;
+ int pool_id;
+ int fake_pool_id = mapping->host->i_sb->cleancache_poolid;
struct cleancache_filekey key = { .u.key = { 0 } };

- if (pool_id >= 0) {
+ if (!backend_registered)
+ return;
+
+ if (fake_pool_id >= 0) {
+ pool_id = get_poolid_from_fake(fake_pool_id);
+ if (pool_id < 0)
+ return;
+
VM_BUG_ON(!PageLocked(page));
if (cleancache_get_key(mapping->host, &key) >= 0) {
(*cleancache_ops.invalidate_page)(pool_id,
- key, page->index);
+ key, page->index);
cleancache_invalidates++;
}
}
@@ -179,9 +273,18 @@ EXPORT_SYMBOL(__cleancache_invalidate_page);
*/
void __cleancache_invalidate_inode(struct address_space *mapping)
{
- int pool_id = mapping->host->i_sb->cleancache_poolid;
+ int pool_id;
+ int fake_pool_id = mapping->host->i_sb->cleancache_poolid;
struct cleancache_filekey key = { .u.key = { 0 } };

+ if (!backend_registered)
+ return;
+
+ if (fake_pool_id < 0)
+ return;
+
+ pool_id = get_poolid_from_fake(fake_pool_id);
+
if (pool_id >= 0 && cleancache_get_key(mapping->host, &key) >= 0)
(*cleancache_ops.invalidate_inode)(pool_id, key);
}
@@ -194,16 +297,30 @@ EXPORT_SYMBOL(__cleancache_invalidate_inode);
*/
void __cleancache_invalidate_fs(struct super_block *sb)
{
- if (sb->cleancache_poolid >= 0) {
- int old_poolid = sb->cleancache_poolid;
- sb->cleancache_poolid = -1;
- (*cleancache_ops.invalidate_fs)(old_poolid);
+ int index;
+ int fake_pool_id = sb->cleancache_poolid;
+ int old_poolid = fake_pool_id;
+
+ if (fake_pool_id >= FAKE_SHARED_FS_POOLID_OFFSET) {
+ index = fake_pool_id - FAKE_SHARED_FS_POOLID_OFFSET;
+ old_poolid = shared_fs_poolid_map[index];
+ shared_fs_poolid_map[index] = FS_UNKNOWN;
+ uuids[index] = NULL;
+ } else if (fake_pool_id >= FAKE_FS_POOLID_OFFSET) {
+ index = fake_pool_id - FAKE_FS_POOLID_OFFSET;
+ old_poolid = fs_poolid_map[index];
+ fs_poolid_map[index] = FS_UNKNOWN;
}
+ sb->cleancache_poolid = -1;
+ if (backend_registered)
+ (*cleancache_ops.invalidate_fs)(old_poolid);
}
EXPORT_SYMBOL(__cleancache_invalidate_fs);

static int __init init_cleancache(void)
{
+ int i;
+
#ifdef CONFIG_DEBUG_FS
struct dentry *root = debugfs_create_dir("cleancache", NULL);
if (root == NULL)
@@ -215,6 +332,11 @@ static int __init init_cleancache(void)
debugfs_create_u64("invalidates", S_IRUGO,
root, &cleancache_invalidates);
#endif
+ for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
+ fs_poolid_map[i] = FS_UNKNOWN;
+ shared_fs_poolid_map[i] = FS_UNKNOWN;
+ }
+ cleancache_enabled = 1;
return 0;
}
module_init(init_cleancache)
--
1.7.7.6

2012-11-16 23:10:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/8] mm: cleancache: lazy initialization to allow tmem backends to build/run as modules

On Wed, 14 Nov 2012 13:57:05 -0500
Konrad Rzeszutek Wilk <[email protected]> wrote:

> From: Dan Magenheimer <[email protected]>
>
> With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
> built/loaded as modules rather than built-in and enabled by a boot parameter,
> this patch provides "lazy initialization", allowing backends to register to
> cleancache even after filesystems were mounted. Calls to init_fs and
> init_shared_fs are remembered as fake poolids but no real tmem_pools created.
> On backend registration the fake poolids are mapped to real poolids and
> respective tmem_pools.

What is your merge plan/path for this work?

>
> ...
>
> + * When no backend is registered all calls to init_fs and init_shard_fs

"init_shared_fs"

> + * are registered and fake poolids are given to the respective
> + * super block but no tmem_pools are created. When a backend
> + * registers with cleancache the previous calls to init_fs and
> + * init_shared_fs are executed to create tmem_pools and set the
> + * respective poolids. While no backend is registered all "puts",
> + * "gets" and "flushes" are ignored or fail.

The comment could use all 80 cols..

>
> ...
>
> struct cleancache_ops cleancache_register_ops(struct cleancache_ops *ops)
> {
> struct cleancache_ops old = cleancache_ops;
> + int i;
>
> cleancache_ops = *ops;
> - cleancache_enabled = 1;
> +
> + backend_registered = true;
> + for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
> + if (fs_poolid_map[i] == FS_NO_BACKEND)
> + fs_poolid_map[i] = (*cleancache_ops.init_fs)(PAGE_SIZE);
> + if (shared_fs_poolid_map[i] == FS_NO_BACKEND)
> + shared_fs_poolid_map[i] = (*cleancache_ops.init_shared_fs)
> + (uuids[i], PAGE_SIZE);
> + }
> return old;
> }

I never noticed before - this function returns a large structure by
value. That's really really unusual in the kernel. I see no problem
with it per-se, but it might generate awful code.

Also, this function has no locking and is blatantly racy.

> EXPORT_SYMBOL(cleancache_register_ops);
> @@ -61,15 +91,38 @@ EXPORT_SYMBOL(cleancache_register_ops);
> /* Called by a cleancache-enabled filesystem at time of mount */
> void __cleancache_init_fs(struct super_block *sb)
> {
> - sb->cleancache_poolid = (*cleancache_ops.init_fs)(PAGE_SIZE);
> + int i;
> +
> + for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
> + if (fs_poolid_map[i] == FS_UNKNOWN) {
> + sb->cleancache_poolid = i + FAKE_FS_POOLID_OFFSET;
> + if (backend_registered)
> + fs_poolid_map[i] = (*cleancache_ops.init_fs)(PAGE_SIZE);
> + else
> + fs_poolid_map[i] = FS_NO_BACKEND;
> + break;
> + }
> + }
> }
> EXPORT_SYMBOL(__cleancache_init_fs);

This also looks wildly racy.

> /* Called by a cleancache-enabled clustered filesystem at time of mount */
> void __cleancache_init_shared_fs(char *uuid, struct super_block *sb)
> {
> - sb->cleancache_poolid =
> - (*cleancache_ops.init_shared_fs)(uuid, PAGE_SIZE);
> + int i;
> +
> + for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
> + if (shared_fs_poolid_map[i] == FS_UNKNOWN) {
> + sb->cleancache_poolid = i + FAKE_SHARED_FS_POOLID_OFFSET;
> + uuids[i] = uuid;
> + if (backend_registered)
> + shared_fs_poolid_map[i] = (*cleancache_ops.init_shared_fs)
> + (uuid, PAGE_SIZE);
> + else
> + shared_fs_poolid_map[i] = FS_NO_BACKEND;
> + break;
> + }
> + }
> }
> EXPORT_SYMBOL(__cleancache_init_shared_fs);

Again, a huge mess if two threads execute this concurrently.

> @@ -99,6 +152,19 @@ static int cleancache_get_key(struct inode *inode,
> }
>
> /*
> + * Returns a pool_id that is associated with a given fake poolid.

Was there a comment anywhere which tells the reader what a "fake poolid" is?

> + */
> +static int get_poolid_from_fake(int fake_pool_id)
> +{
> + if (fake_pool_id >= FAKE_SHARED_FS_POOLID_OFFSET)
> + return shared_fs_poolid_map[fake_pool_id -
> + FAKE_SHARED_FS_POOLID_OFFSET];
> + else if (fake_pool_id >= FAKE_FS_POOLID_OFFSET)
> + return fs_poolid_map[fake_pool_id - FAKE_FS_POOLID_OFFSET];
> + return FS_NO_BACKEND;
> +}
> +
> +/*
> * "Get" data from cleancache associated with the poolid/inode/index
> * that were specified when the data was put to cleanache and, if
> * successful, use it to fill the specified page with data and return 0.
> @@ -109,17 +175,26 @@ int __cleancache_get_page(struct page *page)
> {
> int ret = -1;
> int pool_id;
> + int fake_pool_id;
> struct cleancache_filekey key = { .u.key = { 0 } };
>
> + if (!backend_registered) {
> + cleancache_failed_gets++;
> + goto out;
> + }

Races everywhere...

> VM_BUG_ON(!PageLocked(page));
> - pool_id = page->mapping->host->i_sb->cleancache_poolid;
> - if (pool_id < 0)
> + fake_pool_id = page->mapping->host->i_sb->cleancache_poolid;
> + if (fake_pool_id < 0)
> goto out;
> + pool_id = get_poolid_from_fake(fake_pool_id);
>
> if (cleancache_get_key(page->mapping->host, &key) < 0)
> goto out;
>
> - ret = (*cleancache_ops.get_page)(pool_id, key, page->index, page);
> + if (pool_id >= 0)
> + ret = (*cleancache_ops.get_page)(pool_id,
> + key, page->index, page);
> if (ret == 0)
> cleancache_succ_gets++;
> else
> @@ -138,12 +213,23 @@ EXPORT_SYMBOL(__cleancache_get_page);
> void __cleancache_put_page(struct page *page)
> {
> int pool_id;
> + int fake_pool_id;
> struct cleancache_filekey key = { .u.key = { 0 } };
>
> + if (!backend_registered) {
> + cleancache_puts++;
> + return;
> + }

More..

> VM_BUG_ON(!PageLocked(page));
> - pool_id = page->mapping->host->i_sb->cleancache_poolid;
> + fake_pool_id = page->mapping->host->i_sb->cleancache_poolid;
> + if (fake_pool_id < 0)
> + return;
> +
> + pool_id = get_poolid_from_fake(fake_pool_id);
> +
> if (pool_id >= 0 &&
> - cleancache_get_key(page->mapping->host, &key) >= 0) {
> + cleancache_get_key(page->mapping->host, &key) >= 0) {
> (*cleancache_ops.put_page)(pool_id, key, page->index, page);
> cleancache_puts++;
> }
>
> ...
>

I don't understand the timing flow here, nor the existing constraints
on what can be done and when, but....

The whole thing looks really hacky? Why do we need to remember all
this stuff for later on? What prevents us from simply synchonously
doing whatever we need to do when someone wants to register a backend?

Maybe a little ascii time chart/flow diagram would help.

2012-11-16 23:16:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/8] mm: frontswap: lazy initialization to allow tmem backends to build/run as modules

On Wed, 14 Nov 2012 13:57:06 -0500
Konrad Rzeszutek Wilk <[email protected]> wrote:

> From: Dan Magenheimer <[email protected]>
>
> With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
> built/loaded as modules rather than built-in and enabled by a boot parameter,
> this patch provides "lazy initialization", allowing backends to register to
> frontswap even after swapon was run. Before a backend registers all calls
> to init are recorded and the creation of tmem_pools delayed until a backend
> registers or until a frontswap put is attempted.
>
>
> ...
>
> --- a/mm/frontswap.c
> +++ b/mm/frontswap.c
> @@ -80,6 +80,18 @@ static inline void inc_frontswap_succ_stores(void) { }
> static inline void inc_frontswap_failed_stores(void) { }
> static inline void inc_frontswap_invalidates(void) { }
> #endif
> +
> +/*
> + * When no backend is registered all calls to init are registered and

What is "init"? Spell it out fully, please.

> + * remembered but fail to create tmem_pools. When a backend registers with
> + * frontswap the previous calls to init are executed to create tmem_pools
> + * and set the respective poolids.

Again, seems really hacky. Why can't we just change callers so they
call things in the correct order?

> + * While no backend is registered all "puts", "gets" and "flushes" are
> + * ignored or fail.
> + */
> +static DECLARE_BITMAP(need_init, MAX_SWAPFILES);
> +static bool backend_registered __read_mostly;
> +
> /*
> * Register operations for frontswap, returning previous thus allowing
> * detection of multiple backends and possible nesting.
> @@ -87,9 +99,19 @@ static inline void inc_frontswap_invalidates(void) { }
> struct frontswap_ops frontswap_register_ops(struct frontswap_ops *ops)
> {
> struct frontswap_ops old = frontswap_ops;
> + int i;
>
> frontswap_ops = *ops;
> frontswap_enabled = true;
> +
> + for (i = 0; i < MAX_SWAPFILES; i++) {
> + if (test_and_clear_bit(i, need_init))

ooh, that wasn't racy ;)

> + (*frontswap_ops.init)(i);
> + }
> + /* We MUST have backend_registered called _after_ the frontswap_init's
> + * have been called. Otherwise __frontswap_store might fail. */

Comment makes no sense - backend_registered is not a function.

Also, let's lay the comments out conventionally please:

/*
* We MUST have backend_registered called _after_ the frontswap_init's
* have been called. Otherwise __frontswap_store might fail.
*/


> + barrier();
> + backend_registered = true;
> return old;
> }
> EXPORT_SYMBOL(frontswap_register_ops);
>
> ...
>
> @@ -226,12 +266,15 @@ void __frontswap_invalidate_area(unsigned type)
> {
> struct swap_info_struct *sis = swap_info[type];
>
> - BUG_ON(sis == NULL);
> - if (sis->frontswap_map == NULL)
> - return;
> - frontswap_ops.invalidate_area(type);
> - atomic_set(&sis->frontswap_pages, 0);
> - memset(sis->frontswap_map, 0, sis->max / sizeof(long));
> + if (backend_registered) {
> + BUG_ON(sis == NULL);
> + if (sis->frontswap_map == NULL)
> + return;
> + (*frontswap_ops.invalidate_area)(type);
> + atomic_set(&sis->frontswap_pages, 0);
> + memset(sis->frontswap_map, 0, sis->max / sizeof(long));
> + }
> + clear_bit(type, need_init);
> }
> EXPORT_SYMBOL(__frontswap_invalidate_area);
>
> @@ -364,6 +407,9 @@ static int __init init_frontswap(void)
> debugfs_create_u64("invalidates", S_IRUGO,
> root, &frontswap_invalidates);
> #endif
> + bitmap_zero(need_init, MAX_SWAPFILES);

unneeded?

> + frontswap_enabled = 1;
> return 0;
> }
>
> ...
>

2012-11-19 00:53:49

by Bob Liu

[permalink] [raw]
Subject: Re: [PATCH 2/8] mm: frontswap: lazy initialization to allow tmem backends to build/run as modules

On Sat, Nov 17, 2012 at 7:16 AM, Andrew Morton
<[email protected]> wrote:
> On Wed, 14 Nov 2012 13:57:06 -0500
> Konrad Rzeszutek Wilk <[email protected]> wrote:
>
>> From: Dan Magenheimer <[email protected]>
>>
>> With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
>> built/loaded as modules rather than built-in and enabled by a boot parameter,
>> this patch provides "lazy initialization", allowing backends to register to
>> frontswap even after swapon was run. Before a backend registers all calls
>> to init are recorded and the creation of tmem_pools delayed until a backend
>> registers or until a frontswap put is attempted.
>>
>>
>> ...
>>
>> --- a/mm/frontswap.c
>> +++ b/mm/frontswap.c
>> @@ -80,6 +80,18 @@ static inline void inc_frontswap_succ_stores(void) { }
>> static inline void inc_frontswap_failed_stores(void) { }
>> static inline void inc_frontswap_invalidates(void) { }
>> #endif
>> +
>> +/*
>> + * When no backend is registered all calls to init are registered and
>
> What is "init"? Spell it out fully, please.
>

I think it's frontswap_init().
swapon will call frontswap_init() and in it we need to call init
function of backends with some parameters
like swap_type.

>> + * remembered but fail to create tmem_pools. When a backend registers with
>> + * frontswap the previous calls to init are executed to create tmem_pools
>> + * and set the respective poolids.
>
> Again, seems really hacky. Why can't we just change callers so they
> call things in the correct order?
>

I don't think so, because it asynchronous.

The original idea was to make backends like zcache/tmem modularization.
So that it's more convenient and flexible to use and testing.

But currently callers like swapon only invoke frontswap_init() once,
it fail if backend not registered.
We have no way to notify swap to call frontswap_init() again when
backend registered in some random time
in future.

>> + * While no backend is registered all "puts", "gets" and "flushes" are
>> + * ignored or fail.
>> + */
>> +static DECLARE_BITMAP(need_init, MAX_SWAPFILES);
>> +static bool backend_registered __read_mostly;
>> +
>> /*
>> * Register operations for frontswap, returning previous thus allowing
>> * detection of multiple backends and possible nesting.
>> @@ -87,9 +99,19 @@ static inline void inc_frontswap_invalidates(void) { }
>> struct frontswap_ops frontswap_register_ops(struct frontswap_ops *ops)
>> {
>> struct frontswap_ops old = frontswap_ops;
>> + int i;
>>
>> frontswap_ops = *ops;
>> frontswap_enabled = true;
>> +
>> + for (i = 0; i < MAX_SWAPFILES; i++) {
>> + if (test_and_clear_bit(i, need_init))
>
> ooh, that wasn't racy ;)
>

Hmm, i agree.
Seems some lock is needed, actually i think this code only support one
backend at the same.
So it's less risky.

>> + (*frontswap_ops.init)(i);
>> + }
>> + /* We MUST have backend_registered called _after_ the frontswap_init's
>> + * have been called. Otherwise __frontswap_store might fail. */
>
> Comment makes no sense - backend_registered is not a function.
>
> Also, let's lay the comments out conventionally please:
>
> /*
> * We MUST have backend_registered called _after_ the frontswap_init's
> * have been called. Otherwise __frontswap_store might fail.
> */
>
>
>> + barrier();
>> + backend_registered = true;
>> return old;
>> }
>> EXPORT_SYMBOL(frontswap_register_ops);
>>
>> ...
>>
>> @@ -226,12 +266,15 @@ void __frontswap_invalidate_area(unsigned type)
>> {
>> struct swap_info_struct *sis = swap_info[type];
>>
>> - BUG_ON(sis == NULL);
>> - if (sis->frontswap_map == NULL)
>> - return;
>> - frontswap_ops.invalidate_area(type);
>> - atomic_set(&sis->frontswap_pages, 0);
>> - memset(sis->frontswap_map, 0, sis->max / sizeof(long));
>> + if (backend_registered) {
>> + BUG_ON(sis == NULL);
>> + if (sis->frontswap_map == NULL)
>> + return;
>> + (*frontswap_ops.invalidate_area)(type);
>> + atomic_set(&sis->frontswap_pages, 0);
>> + memset(sis->frontswap_map, 0, sis->max / sizeof(long));
>> + }
>> + clear_bit(type, need_init);
>> }
>> EXPORT_SYMBOL(__frontswap_invalidate_area);
>>
>> @@ -364,6 +407,9 @@ static int __init init_frontswap(void)
>> debugfs_create_u64("invalidates", S_IRUGO,
>> root, &frontswap_invalidates);
>> #endif
>> + bitmap_zero(need_init, MAX_SWAPFILES);
>
> unneeded?
>
>> + frontswap_enabled = 1;
>> return 0;
>> }
>>
>> ...
>>

--
Thanks,
--Bob

2012-11-19 22:25:19

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/8] mm: frontswap: lazy initialization to allow tmem backends to build/run as modules

On Mon, 19 Nov 2012 08:53:46 +0800
Bob Liu <[email protected]> wrote:

> On Sat, Nov 17, 2012 at 7:16 AM, Andrew Morton
> <[email protected]> wrote:
> > On Wed, 14 Nov 2012 13:57:06 -0500
> > Konrad Rzeszutek Wilk <[email protected]> wrote:
> >
> >> From: Dan Magenheimer <[email protected]>
> >>
> >> With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
> >> built/loaded as modules rather than built-in and enabled by a boot parameter,
> >> this patch provides "lazy initialization", allowing backends to register to
> >> frontswap even after swapon was run. Before a backend registers all calls
> >> to init are recorded and the creation of tmem_pools delayed until a backend
> >> registers or until a frontswap put is attempted.
> >>
> >>
> >> ...
> >>
> >> --- a/mm/frontswap.c
> >> +++ b/mm/frontswap.c
> >> @@ -80,6 +80,18 @@ static inline void inc_frontswap_succ_stores(void) { }
> >> static inline void inc_frontswap_failed_stores(void) { }
> >> static inline void inc_frontswap_invalidates(void) { }
> >> #endif
> >> +
> >> +/*
> >> + * When no backend is registered all calls to init are registered and
> >
> > What is "init"? Spell it out fully, please.
> >
>
> I think it's frontswap_init().
> swapon will call frontswap_init() and in it we need to call init
> function of backends with some parameters
> like swap_type.

Well, let's improve that comment please.

> >> + * remembered but fail to create tmem_pools. When a backend registers with
> >> + * frontswap the previous calls to init are executed to create tmem_pools
> >> + * and set the respective poolids.
> >
> > Again, seems really hacky. Why can't we just change callers so they
> > call things in the correct order?
> >
>
> I don't think so, because it asynchronous.
>
> The original idea was to make backends like zcache/tmem modularization.
> So that it's more convenient and flexible to use and testing.
>
> But currently callers like swapon only invoke frontswap_init() once,
> it fail if backend not registered.
> We have no way to notify swap to call frontswap_init() again when
> backend registered in some random time
> in future.

We could add such a way?

2012-11-27 21:27:23

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 2/8] mm: frontswap: lazy initialization to allow tmem backends to build/run as modules

On Mon, Nov 19, 2012 at 02:25:16PM -0800, Andrew Morton wrote:
> On Mon, 19 Nov 2012 08:53:46 +0800
> Bob Liu <[email protected]> wrote:
>
> > On Sat, Nov 17, 2012 at 7:16 AM, Andrew Morton
> > <[email protected]> wrote:
> > > On Wed, 14 Nov 2012 13:57:06 -0500
> > > Konrad Rzeszutek Wilk <[email protected]> wrote:
> > >
> > >> From: Dan Magenheimer <[email protected]>
> > >>
> > >> With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
> > >> built/loaded as modules rather than built-in and enabled by a boot parameter,
> > >> this patch provides "lazy initialization", allowing backends to register to
> > >> frontswap even after swapon was run. Before a backend registers all calls
> > >> to init are recorded and the creation of tmem_pools delayed until a backend
> > >> registers or until a frontswap put is attempted.
> > >>
> > >>
> > >> ...
> > >>
> > >> --- a/mm/frontswap.c
> > >> +++ b/mm/frontswap.c
> > >> @@ -80,6 +80,18 @@ static inline void inc_frontswap_succ_stores(void) { }
> > >> static inline void inc_frontswap_failed_stores(void) { }
> > >> static inline void inc_frontswap_invalidates(void) { }
> > >> #endif
> > >> +
> > >> +/*
> > >> + * When no backend is registered all calls to init are registered and
> > >
> > > What is "init"? Spell it out fully, please.
> > >
> >
> > I think it's frontswap_init().
> > swapon will call frontswap_init() and in it we need to call init
> > function of backends with some parameters
> > like swap_type.
>
> Well, let's improve that comment please.
>
> > >> + * remembered but fail to create tmem_pools. When a backend registers with
> > >> + * frontswap the previous calls to init are executed to create tmem_pools
> > >> + * and set the respective poolids.
> > >
> > > Again, seems really hacky. Why can't we just change callers so they
> > > call things in the correct order?
> > >
> >
> > I don't think so, because it asynchronous.
> >
> > The original idea was to make backends like zcache/tmem modularization.
> > So that it's more convenient and flexible to use and testing.
> >
> > But currently callers like swapon only invoke frontswap_init() once,
> > it fail if backend not registered.
> > We have no way to notify swap to call frontswap_init() again when
> > backend registered in some random time
> > in future.
>
> We could add such a way?

Hey Andrew,

Sorry for the late email. Right at as you posted your questions I went on vacation :-)
Let me respond to your email and rebase the patch per your comments/ideas this week.