2013-08-30 08:43:30

by Krzysztof Kozlowski

Subject: [RFC PATCH 0/4] mm: migrate zbud pages

Hi,

Currently zbud pages are not movable and they cannot be allocated from CMA
region. These patches add migration of zbud pages.

The zbud migration code makes use of page->mapping, so a number of exceptions
had to be added to the migrate code. This could be replaced, for example, with
a pin page control subsystem:
http://article.gmane.org/gmane.linux.kernel.mm/105308
In that case the zbud migration code (zbud_migrate_page()) can be safely
re-used.


Patch "[PATCH 3/4] mm: use indirect zbud handle and radix tree" changes zbud
handle to support migration. Now the handle is an index in radix tree and
zbud_map() maps it to proper virtual address. This exposes race conditions,
some of them are discussed already here:
http://article.gmane.org/gmane.linux.kernel.mm/105988

These races are fixed by adding an internal map count for each zbud handle.
The map count is increased on each zbud_map() call.
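
For illustration, a minimal caller-side sketch (not part of the patches; the
helper name example_read_entry() is hypothetical, the zbud_map()/zbud_unmap()
semantics are the ones introduced in patch 3):

static int example_read_entry(struct zbud_pool *pool, unsigned long handle,
			      void *buf, size_t len)
{
	void *addr = zbud_map(pool, handle);

	if (!addr)
		return -ENOENT;	/* handle already freed, e.g. by invalidate */

	memcpy(buf, addr, len);

	/*
	 * A non-zero return means zbud_free() ran while the handle was
	 * mapped; the handle must not be used again.
	 */
	return zbud_unmap(pool, handle);
}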

Some races between writeback and invalidate still exist. In such cases a message
can be seen in the logs:
zbud: error: could not lookup handle 13810 in tree
The patches from the discussion above may resolve this.

I have considered using "pgoff_t offset" as the handle, but it prevented storing
duplicate pages in zswap.


Patch "[PATCH 2/4] mm: use mapcount for identifying zbud pages" introduces
PageZbud() function which identifies zbud pages by page->_mapcount.
Dave Hansen proposed aliasing PG_zbud=PG_slab but in such case patch
would be more intrusive.

Any ideas for a better solution are welcome.


This patch set is based on v3.11-rc7-30-g41615e8.


This is a continuation of my previous work: reclaiming zbud pages on migration
and compaction. However, the current solution is completely different, so I am
not attaching the previous changelog.
Previous patches can be found here:
* [RFC PATCH v2 0/4] mm: reclaim zbud pages on migration and compaction
http://article.gmane.org/gmane.linux.kernel.mm/105153
* [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction
http://article.gmane.org/gmane.linux.kernel.mm/104801

One patch from previous work is re-used along with minor changes:
"[PATCH 1/4] zbud: use page ref counter for zbud pages"
* Add missing spin_unlock in zbud_reclaim_page().
* Decrease pool->pages_nr in zbud_free(), not when putting the page. This also
removes the need to hold the lock while calling put_zbud_page().


Best regards,
Krzysztof Kozlowski


Krzysztof Kozlowski (4):
zbud: use page ref counter for zbud pages
mm: use mapcount for identifying zbud pages
mm: use indirect zbud handle and radix tree
mm: migrate zbud pages

include/linux/mm.h | 23 +++
include/linux/zbud.h | 3 +-
mm/compaction.c | 7 +
mm/migrate.c | 17 +-
mm/zbud.c | 552 +++++++++++++++++++++++++++++++++++++++++---------
mm/zswap.c | 28 ++-
6 files changed, 525 insertions(+), 105 deletions(-)

--
1.7.9.5


2013-08-30 08:43:35

by Krzysztof Kozlowski

Subject: [RFC PATCH 2/4] mm: use mapcount for identifying zbud pages

Currently zbud pages do not have any flags set so it is not possible to
identify them during migration or compaction.

Implement PageZbud() by comparing page->_mapcount to -127 to distinguish
pages allocated by zbud, just like PageBuddy() is implemented.
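
For illustration (not part of this patch), this is roughly how a later caller
such as compaction can use the new check; patch 4 adds a hunk along these
lines to isolate_migratepages_range():

	if (PageZbud(page)) {
		/* zbud pages are never on the LRU */
		BUG_ON(PageLRU(page));
		/* pin the page and hand it over to migration */
		get_page(page);
		goto isolated;
	}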

Signed-off-by: Krzysztof Kozlowski <[email protected]>
---
include/linux/mm.h | 23 +++++++++++++++++++++++
mm/zbud.c | 4 ++++
2 files changed, 27 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f022460..b9ae6f2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -440,6 +440,7 @@ static inline void init_page_count(struct page *page)
* efficiently by most CPU architectures.
*/
#define PAGE_BUDDY_MAPCOUNT_VALUE (-128)
+#define PAGE_ZBUD_MAPCOUNT_VALUE (-127)

static inline int PageBuddy(struct page *page)
{
@@ -458,6 +459,28 @@ static inline void __ClearPageBuddy(struct page *page)
atomic_set(&page->_mapcount, -1);
}

+#ifdef CONFIG_ZBUD
+static inline int PageZbud(struct page *page)
+{
+ return atomic_read(&page->_mapcount) == PAGE_ZBUD_MAPCOUNT_VALUE;
+}
+
+static inline void SetPageZbud(struct page *page)
+{
+ VM_BUG_ON(atomic_read(&page->_mapcount) != -1);
+ atomic_set(&page->_mapcount, PAGE_ZBUD_MAPCOUNT_VALUE);
+}
+
+static inline void ClearPageZbud(struct page *page)
+{
+ VM_BUG_ON(!PageZbud(page));
+ atomic_set(&page->_mapcount, -1);
+}
+#else
+PAGEFLAG_FALSE(Zbud)
+#endif
+
+
void put_page(struct page *page);
void put_pages_list(struct list_head *pages);

diff --git a/mm/zbud.c b/mm/zbud.c
index aa9a15c..9267cd9 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -197,7 +197,10 @@ static void get_zbud_page(struct zbud_header *zhdr)
static int put_zbud_page(struct zbud_header *zhdr)
{
struct page *page = virt_to_page(zhdr);
+ VM_BUG_ON(!PageZbud(page));
+
if (put_page_testzero(page)) {
+ ClearPageZbud(page);
free_hot_cold_page(page, 0);
return 1;
}
@@ -307,6 +310,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
* don't increase the page count.
*/
zhdr = init_zbud_page(page);
+ SetPageZbud(page);
bud = FIRST;

found:
--
1.7.9.5

2013-08-30 08:43:49

by Krzysztof Kozlowski

Subject: [RFC PATCH 4/4] mm: migrate zbud pages

Add migration support for zbud. This allows adding the __GFP_MOVABLE flag
when allocating zbud pages, and in effect the CMA pool can be used for
zswap.

zbud pages are not movable on their own and are not stored on any LRU
(except zbud's own LRU). The PageZbud flag is used in
isolate_migratepages_range() to grab zbud pages and pass them on later for
migration.

The page->private field is used for storing a pointer to the zbud_pool.
The zbud_pool is needed during migration for locking the pool and
accessing its radix tree.
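
As a minimal sketch (the helper below is hypothetical; only the
set_page_private()/page_private() pairing comes from this patch), the pool
stored at allocation time can be recovered in the migration path like this:

static inline struct zbud_pool *zbud_page_pool(struct page *page)
{
	/* zbud_alloc() stored the pool pointer in page->private */
	return (struct zbud_pool *)page_private(page);
}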

The zbud migration code makes use of page->mapping, so a number of exceptions
had to be added to the migrate code. This could be replaced, for example, with
a pin page control subsystem:
http://article.gmane.org/gmane.linux.kernel.mm/105308
In that case the zbud migration code (zbud_migrate_page()) can be safely
re-used.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
---
include/linux/zbud.h | 1 +
mm/compaction.c | 7 +++
mm/migrate.c | 17 +++++-
mm/zbud.c | 164 +++++++++++++++++++++++++++++++++++++++++++++++---
mm/zswap.c | 4 +-
5 files changed, 179 insertions(+), 14 deletions(-)

diff --git a/include/linux/zbud.h b/include/linux/zbud.h
index 12d72df..3bc2e38 100644
--- a/include/linux/zbud.h
+++ b/include/linux/zbud.h
@@ -11,6 +11,7 @@ struct zbud_ops {

struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops);
void zbud_destroy_pool(struct zbud_pool *pool);
+int zbud_put_page(struct page *page);
int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
unsigned long *handle);
void zbud_free(struct zbud_pool *pool, unsigned long handle);
diff --git a/mm/compaction.c b/mm/compaction.c
index 05ccb4c..8acd198 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -534,6 +534,12 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
goto next_pageblock;
}

+ if (PageZbud(page)) {
+ BUG_ON(PageLRU(page));
+ get_page(page);
+ goto isolated;
+ }
+
/*
* Check may be lockless but that's ok as we recheck later.
* It's possible to migrate LRU pages and balloon pages
@@ -601,6 +607,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
/* Successfully isolated */
cc->finished_update_migrate = true;
del_page_from_lru_list(page, lruvec, page_lru(page));
+isolated:
list_add(&page->lru, migratelist);
cc->nr_migratepages++;
nr_isolated++;
diff --git a/mm/migrate.c b/mm/migrate.c
index 6f0c244..5254eb2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -36,6 +36,7 @@
#include <linux/hugetlb_cgroup.h>
#include <linux/gfp.h>
#include <linux/balloon_compaction.h>
+#include <linux/zbud.h>

#include <asm/tlbflush.h>

@@ -105,6 +106,8 @@ void putback_movable_pages(struct list_head *l)
page_is_file_cache(page));
if (unlikely(balloon_page_movable(page)))
balloon_page_putback(page);
+ else if (unlikely(PageZbud(page)))
+ zbud_put_page(page);
else
putback_lru_page(page);
}
@@ -832,6 +835,10 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
goto skip_unmap;
}

+ if (unlikely(PageZbud(page))) {
+ remap_swapcache = 0;
+ goto skip_unmap;
+ }
/* Establish migration ptes or remove ptes */
try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);

@@ -902,13 +909,19 @@ out:
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- putback_lru_page(page);
+ if (unlikely(PageZbud(page)))
+ zbud_put_page(page);
+ else
+ putback_lru_page(page);
}
/*
* Move the new page to the LRU. If migration was not successful
* then this will free the page.
*/
- putback_lru_page(newpage);
+ if (unlikely(PageZbud(newpage)))
+ zbud_put_page(newpage);
+ else
+ putback_lru_page(newpage);
if (result) {
if (rc)
*result = rc;
diff --git a/mm/zbud.c b/mm/zbud.c
index 5ff4ffa..63f0a91 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -51,6 +51,8 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/radix-tree.h>
+#include <linux/fs.h>
+#include <linux/pagemap.h>
#include <linux/zbud.h>

/*****************
@@ -211,17 +213,9 @@ static void get_zbud_page(struct zbud_header *zhdr)
*
* Returns 1 if page was freed and 0 otherwise.
*/
-static int put_zbud_page(struct zbud_header *zhdr)
+static inline int put_zbud_page(struct zbud_header *zhdr)
{
- struct page *page = virt_to_page(zhdr);
- VM_BUG_ON(!PageZbud(page));
-
- if (put_page_testzero(page)) {
- ClearPageZbud(page);
- free_hot_cold_page(page, 0);
- return 1;
- }
- return 0;
+ return zbud_put_page(virt_to_page(zhdr));
}

/*
@@ -261,6 +255,27 @@ static int put_map_count(struct zbud_header *zhdr, unsigned long handle)
}

/*
+ * Replaces item expected in radix tree by a new item, while holding pool lock.
+ */
+static int tree_replace(struct zbud_pool *pool,
+ unsigned long handle, void *expected, void *replacement)
+{
+ void **pslot;
+ void *item = NULL;
+
+ VM_BUG_ON(!expected);
+ VM_BUG_ON(!replacement);
+ pslot = radix_tree_lookup_slot(&pool->page_tree, handle);
+ if (pslot)
+ item = radix_tree_deref_slot_protected(pslot,
+ &pool->lock);
+ if (item != expected)
+ return -ENOENT;
+ radix_tree_replace_slot(pslot, replacement);
+ return 0;
+}
+
+/*
* Searches for zbud header in radix tree.
* Returns NULL if handle could not be found.
*
@@ -328,6 +343,110 @@ static int tree_insert_zbud_header(struct zbud_pool *pool,
return radix_tree_insert(&pool->page_tree, *handle, zhdr);
}

+/*
+ * Copy page during migration.
+ *
+ * Must be called under pool->lock.
+ */
+static void copy_zbud_page(struct page *newpage, struct page *page)
+{
+ void *to, *from;
+ SetPageZbud(newpage);
+ newpage->mapping = page->mapping;
+ set_page_private(newpage, page_private(page));
+ /* copy data */
+ to = kmap_atomic(newpage);
+ from = kmap_atomic(page);
+ memcpy(to, from, PAGE_SIZE);
+ kunmap_atomic(from);
+ kunmap_atomic(to);
+}
+
+/*
+ * Replaces old zbud header in radix tree with new, updates page
+ * count (puts old, gets new) and puts map_count for old page.
+ *
+ * The map_count for new page is not increased because it was already
+ * copied by copy_zbud_page().
+ *
+ * Must be called under pool->lock.
+ */
+static void replace_page_handles(struct zbud_pool *pool,
+ struct zbud_header *zhdr, struct zbud_header *newzhdr)
+{
+ if (zhdr->first_handle) {
+ int ret = tree_replace(pool, zhdr->first_handle, zhdr,
+ newzhdr);
+ VM_BUG_ON(ret);
+ get_zbud_page(newzhdr);
+ /* get_map_count() skipped, already copied */
+ put_map_count(zhdr, zhdr->first_handle);
+ put_zbud_page(zhdr);
+ }
+ if (zhdr->last_handle) {
+ int ret = tree_replace(pool, zhdr->last_handle, zhdr,
+ newzhdr);
+ VM_BUG_ON(ret);
+ get_zbud_page(newzhdr);
+ /* get_map_count() skipped, already copied */
+ put_map_count(zhdr, zhdr->last_handle);
+ put_zbud_page(zhdr);
+ }
+}
+
+
+/*
+ * Migrates zbud page.
+ * Returns 0 on success or -EAGAIN if page was dirtied by zbud_map during
+ * migration.
+ */
+static int zbud_migrate_page(struct address_space *mapping,
+ struct page *newpage, struct page *page,
+ enum migrate_mode mode)
+{
+ int rc = 0;
+ struct zbud_header *zhdr, *newzhdr;
+ struct zbud_pool *pool;
+ int expected_cnt = 1; /* one page reference from isolate */
+
+ VM_BUG_ON(!PageLocked(page));
+ zhdr = page_address(page);
+ newzhdr = page_address(newpage);
+ pool = (struct zbud_pool *)page_private(page);
+ VM_BUG_ON(!pool);
+
+ spin_lock(&pool->lock);
+ if (zhdr->first_handle)
+ expected_cnt++;
+ if (zhdr->last_handle)
+ expected_cnt++;
+
+ if (page_count(page) != expected_cnt) {
+ /* Still used by someone (parallel compaction) or dirtied
+ * by zbud_map() before we acquired spin_lock. */
+ rc = -EAGAIN;
+ goto out;
+ }
+ copy_zbud_page(newpage, page);
+ replace_page_handles(pool, zhdr, newzhdr);
+ /* Update lists */
+ list_replace(&zhdr->lru, &newzhdr->lru);
+ list_replace(&zhdr->buddy, &newzhdr->buddy);
+ memset(zhdr, 0, sizeof(struct zbud_header));
+
+out:
+ spin_unlock(&pool->lock);
+ return rc;
+}
+
+static const struct address_space_operations zbud_aops = {
+ .migratepage = zbud_migrate_page,
+};
+const struct address_space zbud_space = {
+ .a_ops = &zbud_aops,
+};
+EXPORT_SYMBOL_GPL(zbud_space);
+
/*****************
* API Functions
*****************/
@@ -370,6 +489,29 @@ void zbud_destroy_pool(struct zbud_pool *pool)
kfree(pool);
}

+/*
+ * zbud_put_page() - decreases ref count for zbud page
+ * @page: zbud page to put
+ *
+ * Page is freed if reference count reaches 0.
+ *
+ * Returns 1 if page was freed and 0 otherwise.
+ */
+int zbud_put_page(struct page *page)
+{
+ VM_BUG_ON(!PageZbud(page));
+
+ if (put_page_testzero(page)) {
+ VM_BUG_ON(PageLocked(page));
+ page->mapping = NULL;
+ set_page_private(page, 0);
+ ClearPageZbud(page);
+ free_hot_cold_page(page, 0);
+ return 1;
+ }
+ return 0;
+}
+
/**
* zbud_alloc() - allocates a region of a given size
* @pool: zbud pool from which to allocate
@@ -439,6 +581,8 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
*/
zhdr = init_zbud_page(page);
SetPageZbud(page);
+ page->mapping = (struct address_space *)&zbud_space;
+ set_page_private(page, (unsigned long)pool);
bud = FIRST;

err = tree_insert_zbud_header(pool, zhdr, &next_handle);
diff --git a/mm/zswap.c b/mm/zswap.c
index 706046a..ac8b768 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -665,8 +665,8 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,

/* store */
len = dlen + sizeof(struct zswap_header);
- ret = zbud_alloc(tree->pool, len, __GFP_NORETRY | __GFP_NOWARN,
- &handle);
+ ret = zbud_alloc(tree->pool, len,
+ __GFP_NORETRY | __GFP_NOWARN | __GFP_MOVABLE, &handle);
if (ret == -ENOSPC) {
zswap_reject_compress_poor++;
goto freepage;
--
1.7.9.5

2013-08-30 08:44:21

by Krzysztof Kozlowski

Subject: [RFC PATCH 3/4] mm: use indirect zbud handle and radix tree

Add a radix tree to the zbud pool and use an indirect zbud handle as the
radix tree index.

This allows migration of zbud pages while the handle used by zswap
remains untouched. Previously zbud handles were virtual addresses, which
posed a problem when a page was migrated.

This change also exposes and fixes a race condition between:
- zbud_reclaim_page() (called from zswap_frontswap_store())
and
- zbud_free() (called from zswap_frontswap_invalidate_page()).
This race was present already, but the additional locking and indirect
handle use make it frequent under high memory pressure.

Race typically looks like:
- thread 1: zbud_reclaim_page()
- thread 1: zswap_writeback_entry()
- zbud_map()
- thread 0: zswap_frontswap_invalidate_page()
- zbud_free()
- thread 1: reads zswap_entry from memory or calls zbud_unmap(), now on
an invalid memory address

zbud_reclaim_page() calls the evict function (zswap_writeback_entry())
without holding the pool lock. zswap_writeback_entry() reads the
zswap_header from memory obtained from zbud_map() without holding the
tree's lock. If an invalidate happens during this time, zbud_free()
will remove the handle from the tree.

The new map_count fields in zbud_header try to address this problem by
protecting handles from being freed.
Also, the call to zbud_unmap() in zswap_writeback_entry() was moved
later, to a point where the tree's lock is held.
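
A condensed sketch of the resulting order of operations in
zswap_writeback_entry() (illustration only, details elided; the real change
is in the zswap.c hunks below):

	zhdr = zbud_map(pool, handle);
	if (!zhdr)
		return 0;		/* already invalidated */
	swpentry = zhdr->swpentry;
	spin_lock(&tree->lock);
	entry = zswap_rb_search(&tree->rbroot, swp_offset(swpentry));
	if (!entry) {
		zbud_unmap(pool, handle);
		spin_unlock(&tree->lock);
		return 0;
	}
	zswap_entry_get(entry);
	if (zbud_unmap(pool, handle)) {
		/* zbud_free() won the race; the handle is gone */
		zswap_entry_put(entry);
		spin_unlock(&tree->lock);
		return 0;
	}
	spin_unlock(&tree->lock);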

Signed-off-by: Krzysztof Kozlowski <[email protected]>
---
include/linux/zbud.h | 2 +-
mm/zbud.c | 313 +++++++++++++++++++++++++++++++++++++++++---------
mm/zswap.c | 24 +++-
3 files changed, 280 insertions(+), 59 deletions(-)

diff --git a/include/linux/zbud.h b/include/linux/zbud.h
index 2571a5c..12d72df 100644
--- a/include/linux/zbud.h
+++ b/include/linux/zbud.h
@@ -16,7 +16,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
void zbud_free(struct zbud_pool *pool, unsigned long handle);
int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
void *zbud_map(struct zbud_pool *pool, unsigned long handle);
-void zbud_unmap(struct zbud_pool *pool, unsigned long handle);
+int zbud_unmap(struct zbud_pool *pool, unsigned long handle);
u64 zbud_get_pool_size(struct zbud_pool *pool);

#endif /* _ZBUD_H_ */
diff --git a/mm/zbud.c b/mm/zbud.c
index 9267cd9..5ff4ffa 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -50,6 +50,7 @@
#include <linux/preempt.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
+#include <linux/radix-tree.h>
#include <linux/zbud.h>

/*****************
@@ -69,6 +70,9 @@
#define NCHUNKS (PAGE_SIZE >> CHUNK_SHIFT)
#define ZHDR_SIZE_ALIGNED CHUNK_SIZE

+/* Empty handle, not yet allocated */
+#define HANDLE_EMPTY 0
+
/**
* struct zbud_pool - stores metadata for each zbud pool
* @lock: protects all pool fields and first|last_chunk fields of any
@@ -83,6 +87,10 @@
* @pages_nr: number of zbud pages in the pool.
* @ops: pointer to a structure of user defined operations specified at
* pool creation time.
+ * @page_tree: mapping handle->zbud_header for zbud_map and migration;
+ * many pools may exist so do not use the mapping->page_tree
+ * @last_handle: last handle calculated; used as starting point when searching
+ * for next handle in page_tree in zbud_alloc.
*
* This structure is allocated at pool creation time and maintains metadata
* pertaining to a particular zbud pool.
@@ -94,6 +102,8 @@ struct zbud_pool {
struct list_head lru;
u64 pages_nr;
struct zbud_ops *ops;
+ struct radix_tree_root page_tree;
+ unsigned long last_handle;
};

/*
@@ -103,12 +113,23 @@ struct zbud_pool {
* @lru: links the zbud page into the lru list in the pool
* @first_chunks: the size of the first buddy in chunks, 0 if free
* @last_chunks: the size of the last buddy in chunks, 0 if free
+ * @first_handle: handle to page stored in first buddy
+ * @last_handle: handle to page stored in last buddy
+ * @first_map_count: mapped count of page stored in first buddy
+ * @last_map_count: mapped count of page stored in last buddy
+ *
+ * When map count reaches zero the corresponding handle is removed
+ * from radix tree and cannot be used any longer.
*/
struct zbud_header {
struct list_head buddy;
struct list_head lru;
+ unsigned long first_handle;
+ unsigned long last_handle;
unsigned int first_chunks;
unsigned int last_chunks;
+ short int first_map_count;
+ short int last_map_count;
};

/*****************
@@ -135,38 +156,34 @@ static struct zbud_header *init_zbud_page(struct page *page)
struct zbud_header *zhdr = page_address(page);
zhdr->first_chunks = 0;
zhdr->last_chunks = 0;
+ zhdr->first_handle = HANDLE_EMPTY;
+ zhdr->last_handle = HANDLE_EMPTY;
+ zhdr->first_map_count = 0;
+ zhdr->last_map_count = 0;
INIT_LIST_HEAD(&zhdr->buddy);
INIT_LIST_HEAD(&zhdr->lru);
return zhdr;
}

/*
- * Encodes the handle of a particular buddy within a zbud page
+ * Encodes the address of a particular buddy within a zbud page
* Pool lock should be held as this function accesses first|last_chunks
*/
-static unsigned long encode_handle(struct zbud_header *zhdr, enum buddy bud)
+static unsigned long calc_addr(struct zbud_header *zhdr, unsigned long handle)
{
- unsigned long handle;
+ unsigned long addr;

/*
- * For now, the encoded handle is actually just the pointer to the data
- * but this might not always be the case. A little information hiding.
* Add CHUNK_SIZE to the handle if it is the first allocation to jump
* over the zbud header in the first chunk.
*/
- handle = (unsigned long)zhdr;
- if (bud == FIRST)
+ addr = (unsigned long)zhdr;
+ if (handle == zhdr->first_handle)
/* skip over zbud header */
- handle += ZHDR_SIZE_ALIGNED;
- else /* bud == LAST */
- handle += PAGE_SIZE - (zhdr->last_chunks << CHUNK_SHIFT);
- return handle;
-}
-
-/* Returns the zbud page where a given handle is stored */
-static struct zbud_header *handle_to_zbud_header(unsigned long handle)
-{
- return (struct zbud_header *)(handle & PAGE_MASK);
+ addr += ZHDR_SIZE_ALIGNED;
+ else /* handle == zhdr->last_handle */
+ addr += PAGE_SIZE - (zhdr->last_chunks << CHUNK_SHIFT);
+ return addr;
}

/* Returns the number of free chunks in a zbud page */
@@ -207,6 +224,109 @@ static int put_zbud_page(struct zbud_header *zhdr)
return 0;
}

+/*
+ * Increases map count for given handle.
+ *
+ * The map count is used to prevent any races between zbud_reclaim()
+ * and zbud_free().
+ *
+ * Must be called under pool->lock.
+ */
+static void get_map_count(struct zbud_header *zhdr, unsigned long handle)
+{
+ VM_BUG_ON(handle == HANDLE_EMPTY);
+ if (zhdr->first_handle == handle)
+ zhdr->first_map_count++;
+ else
+ zhdr->last_map_count++;
+}
+
+/*
+ * Decreases map count for given handle.
+ *
+ * Must be called under pool->lock.
+ *
+ * Returns 1 if no more map counts for handle exist and 0 otherwise.
+ */
+static int put_map_count(struct zbud_header *zhdr, unsigned long handle)
+{
+ VM_BUG_ON(handle == HANDLE_EMPTY);
+ if (zhdr->first_handle == handle) {
+ VM_BUG_ON(!zhdr->first_map_count);
+ return ((--zhdr->first_map_count) == 0);
+ } else {
+ VM_BUG_ON(!zhdr->last_map_count);
+ return ((--zhdr->last_map_count) == 0);
+ }
+}
+
+/*
+ * Searches for zbud header in radix tree.
+ * Returns NULL if handle could not be found.
+ *
+ * Handle could not be found in case of race between:
+ * - zswap_writeback_entry() (called from zswap_frontswap_store())
+ * and
+ * - zbud_free() (called from zswap_frontswap_invalidate())
+ *
+ */
+static struct zbud_header *handle_to_zbud_header(struct zbud_pool *pool,
+ unsigned long handle)
+{
+ struct zbud_header *zhdr;
+
+ rcu_read_lock();
+ zhdr = radix_tree_lookup(&pool->page_tree, handle);
+ rcu_read_unlock();
+ if (unlikely(!zhdr)) {
+ /* race: zswap_writeback_entry() and zswap_invalidate() */
+ pr_err("error: could not lookup handle %lu in tree\n", handle);
+ }
+ return zhdr;
+}
+
+/*
+ * Scans radix tree for next free handle and returns it.
+ * Returns HANDLE_EMPTY (0) if no free handle could be found.
+ *
+ * Must be called under pool->lock to be sure that there
+ * won't be other users of found handle.
+ */
+static unsigned long search_next_handle(struct zbud_pool *pool)
+{
+ unsigned long handle = pool->last_handle;
+ unsigned int retries = 0;
+ do {
+ /* 0 will be returned in case of search failure as we'll hit
+ * the max index.
+ */
+ handle = radix_tree_next_hole(&pool->page_tree,
+ handle + 1, ULONG_MAX);
+ } while ((handle == HANDLE_EMPTY) && (++retries < 2));
+ WARN((retries == 2), "%s: reached max number of retries (%u) when " \
+ "searching for next handle (last handle: %lu)\n",
+ __func__, retries, pool->last_handle);
+ return handle;
+}
+
+/*
+ * Searches for next free handle in page_tree and inserts zbud_header
+ * under it. Stores the handle under given pointer and updates
+ * pool->last_handle.
+ *
+ * Must be called under pool->lock.
+ *
+ * Returns 0 on success or negative otherwise.
+ */
+static int tree_insert_zbud_header(struct zbud_pool *pool,
+ struct zbud_header *zhdr, unsigned long *handle)
+{
+ *handle = search_next_handle(pool);
+ if (unlikely(*handle == HANDLE_EMPTY))
+ return -ENOSPC;
+ pool->last_handle = *handle;
+ return radix_tree_insert(&pool->page_tree, *handle, zhdr);
+}

/*****************
* API Functions
@@ -232,8 +352,10 @@ struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops)
INIT_LIST_HEAD(&pool->unbuddied[i]);
INIT_LIST_HEAD(&pool->buddied);
INIT_LIST_HEAD(&pool->lru);
+ INIT_RADIX_TREE(&pool->page_tree, GFP_ATOMIC);
pool->pages_nr = 0;
pool->ops = ops;
+ pool->last_handle = HANDLE_EMPTY;
return pool;
}

@@ -270,10 +392,11 @@ void zbud_destroy_pool(struct zbud_pool *pool)
int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
unsigned long *handle)
{
- int chunks, i;
+ int chunks, i, err;
struct zbud_header *zhdr = NULL;
enum buddy bud;
struct page *page;
+ unsigned long next_handle;

if (size <= 0 || gfp & __GFP_HIGHMEM)
return -EINVAL;
@@ -288,6 +411,11 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
if (!list_empty(&pool->unbuddied[i])) {
zhdr = list_first_entry(&pool->unbuddied[i],
struct zbud_header, buddy);
+ err = tree_insert_zbud_header(pool, zhdr, &next_handle);
+ if (unlikely(err)) {
+ spin_unlock(&pool->lock);
+ return err;
+ }
list_del(&zhdr->buddy);
if (zhdr->first_chunks == 0)
bud = FIRST;
@@ -313,11 +441,22 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
SetPageZbud(page);
bud = FIRST;

+ err = tree_insert_zbud_header(pool, zhdr, &next_handle);
+ if (unlikely(err)) {
+ put_zbud_page(zhdr);
+ spin_unlock(&pool->lock);
+ return err;
+ }
+
found:
- if (bud == FIRST)
+ if (bud == FIRST) {
zhdr->first_chunks = chunks;
- else
+ zhdr->first_handle = next_handle;
+ } else {
zhdr->last_chunks = chunks;
+ zhdr->last_handle = next_handle;
+ }
+ get_map_count(zhdr, next_handle);

if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
/* Add to unbuddied list */
@@ -333,12 +472,45 @@ found:
list_del(&zhdr->lru);
list_add(&zhdr->lru, &pool->lru);

- *handle = encode_handle(zhdr, bud);
+ *handle = next_handle;
spin_unlock(&pool->lock);

return 0;
}

+/*
+ * Real code for freeing handle.
+ * Removes handle from radix tree, empties chunks and handle in zbud header,
+ * removes buddy from lists and finally puts page.
+ */
+static void zbud_header_free(struct zbud_pool *pool, struct zbud_header *zhdr,
+ unsigned long handle)
+{
+ struct zbud_header *old = radix_tree_delete(&pool->page_tree, handle);
+ VM_BUG_ON(old != zhdr);
+
+ if (zhdr->first_handle == handle) {
+ zhdr->first_chunks = 0;
+ zhdr->first_handle = HANDLE_EMPTY;
+ } else {
+ zhdr->last_chunks = 0;
+ zhdr->last_handle = HANDLE_EMPTY;
+ }
+
+ /* Remove from existing buddy list */
+ list_del(&zhdr->buddy);
+
+ if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
+ list_del(&zhdr->lru);
+ pool->pages_nr--;
+ } else {
+ /* Add to unbuddied list */
+ int freechunks = num_free_chunks(zhdr);
+ list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
+ }
+ put_zbud_page(zhdr);
+}
+
/**
* zbud_free() - frees the allocation associated with the given handle
* @pool: pool in which the allocation resided
@@ -354,27 +526,18 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
struct zbud_header *zhdr;

spin_lock(&pool->lock);
- zhdr = handle_to_zbud_header(handle);
-
- /* If first buddy, handle will be page aligned */
- if ((handle - ZHDR_SIZE_ALIGNED) & ~PAGE_MASK)
- zhdr->last_chunks = 0;
- else
- zhdr->first_chunks = 0;
-
- /* Remove from existing buddy list */
- list_del(&zhdr->buddy);
+ zhdr = handle_to_zbud_header(pool, handle);
+ VM_BUG_ON(!zhdr);

- if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
- list_del(&zhdr->lru);
- pool->pages_nr--;
+ if (!put_map_count(zhdr, handle)) {
+ /*
+ * Still mapped, so just put page count and
+ * zbud_unmap() will free later.
+ */
+ put_zbud_page(zhdr);
} else {
- /* Add to unbuddied list */
- int freechunks = num_free_chunks(zhdr);
- list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
+ zbud_header_free(pool, zhdr, handle);
}
-
- put_zbud_page(zhdr);
spin_unlock(&pool->lock);
}

@@ -448,15 +611,11 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
/* Protect zbud page against free */
get_zbud_page(zhdr);
/*
- * We need encode the handles before unlocking, since we can
+ * Grab handles before unlocking, since we can
* race with free that will set (first|last)_chunks to 0
*/
- first_handle = 0;
- last_handle = 0;
- if (zhdr->first_chunks)
- first_handle = encode_handle(zhdr, FIRST);
- if (zhdr->last_chunks)
- last_handle = encode_handle(zhdr, LAST);
+ first_handle = zhdr->first_handle;
+ last_handle = zhdr->last_handle;
spin_unlock(&pool->lock);

/* Issue the eviction callback(s) */
@@ -482,27 +641,69 @@ next:
/**
* zbud_map() - maps the allocation associated with the given handle
* @pool: pool in which the allocation resides
- * @handle: handle associated with the allocation to be mapped
+ * @handle: handle to be mapped
*
- * While trivial for zbud, the mapping functions for others allocators
- * implementing this allocation API could have more complex information encoded
- * in the handle and could create temporary mappings to make the data
- * accessible to the user.
+ * Increases the page ref count and map count for handle.
*
- * Returns: a pointer to the mapped allocation
+ * Returns: a pointer to the mapped allocation or NULL if page could
+ * not be found in radix tree for given handle.
*/
void *zbud_map(struct zbud_pool *pool, unsigned long handle)
{
- return (void *)(handle);
+ struct zbud_header *zhdr;
+ void *addr;
+
+ /*
+ * Grab lock to prevent races with zbud_free or migration.
+ */
+ spin_lock(&pool->lock);
+ zhdr = handle_to_zbud_header(pool, handle);
+ if (!zhdr) {
+ spin_unlock(&pool->lock);
+ return NULL;
+ }
+ /*
+ * Get page so zbud_free or migration could detect that it is
+ * mapped by someone.
+ */
+ get_zbud_page(zhdr);
+ get_map_count(zhdr, handle);
+ addr = (void *)calc_addr(zhdr, handle);
+ spin_unlock(&pool->lock);
+
+ return addr;
}

/**
- * zbud_unmap() - maps the allocation associated with the given handle
+ * zbud_unmap() - unmaps the allocation associated with the given handle
* @pool: pool in which the allocation resides
- * @handle: handle associated with the allocation to be unmapped
+ * @handle: handle to be unmapped
+ *
+ * Decreases the page ref count and map count for handle.
+ * If map count reaches 0 then handle is freed (it must be freed because
+ * zbud_free() was called already on it) and -EFAULT is returned.
+ *
+ * Returns: 0 on successful unmap, negative on error indicating that handle
+ * was invalidated already by zbud_free() and cannot be used anymore
*/
-void zbud_unmap(struct zbud_pool *pool, unsigned long handle)
+int zbud_unmap(struct zbud_pool *pool, unsigned long handle)
{
+ struct zbud_header *zhdr;
+
+ zhdr = handle_to_zbud_header(pool, handle);
+ if (unlikely(!zhdr))
+ return -ENOENT;
+ spin_lock(&pool->lock);
+ if (put_map_count(zhdr, handle)) {
+ /* racing zbud_free() could not free the handle because
+ * we were still using it so it is our job to free */
+ zbud_header_free(pool, zhdr, handle);
+ spin_unlock(&pool->lock);
+ return -EFAULT;
+ }
+ put_zbud_page(zhdr);
+ spin_unlock(&pool->lock);
+ return 0;
}

/**
diff --git a/mm/zswap.c b/mm/zswap.c
index deda2b6..706046a 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -509,8 +509,15 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)

/* extract swpentry from data */
zhdr = zbud_map(pool, handle);
+ if (!zhdr) {
+ /*
+ * Race with zbud_free() (called from invalidate).
+ * Entry was invalidated already.
+ */
+ return 0;
+ }
swpentry = zhdr->swpentry; /* here */
- zbud_unmap(pool, handle);
+ VM_BUG_ON(swp_type(swpentry) >= MAX_SWAPFILES);
tree = zswap_trees[swp_type(swpentry)];
offset = swp_offset(swpentry);
BUG_ON(pool != tree->pool);
@@ -520,10 +527,20 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
entry = zswap_rb_search(&tree->rbroot, offset);
if (!entry) {
/* entry was invalidated */
+ zbud_unmap(pool, handle);
spin_unlock(&tree->lock);
return 0;
}
zswap_entry_get(entry);
+ /*
+ * Unmap zbud after obtaining tree lock and entry ref to prevent
+ * any races with invalidate.
+ */
+ if (zbud_unmap(pool, handle)) {
+ zswap_entry_put(entry);
+ spin_unlock(&tree->lock);
+ return 0;
+ }
spin_unlock(&tree->lock);
BUG_ON(offset != entry->offset);

@@ -544,6 +561,7 @@ static int zswap_writeback_entry(struct zbud_pool *pool, unsigned long handle)
dlen = PAGE_SIZE;
src = (u8 *)zbud_map(tree->pool, entry->handle) +
sizeof(struct zswap_header);
+ VM_BUG_ON(!src);
dst = kmap_atomic(page);
ret = zswap_comp_op(ZSWAP_COMPOP_DECOMPRESS, src,
entry->length, dst, &dlen);
@@ -661,8 +679,9 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
zhdr->swpentry = swp_entry(type, offset);
buf = (u8 *)(zhdr + 1);
memcpy(buf, dst, dlen);
- zbud_unmap(tree->pool, handle);
+ ret = zbud_unmap(tree->pool, handle);
put_cpu_var(zswap_dstmem);
+ VM_BUG_ON(ret);

/* populate entry */
entry->offset = offset;
@@ -726,6 +745,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
dlen = PAGE_SIZE;
src = (u8 *)zbud_map(tree->pool, entry->handle) +
sizeof(struct zswap_header);
+ VM_BUG_ON(!src);
dst = kmap_atomic(page);
ret = zswap_comp_op(ZSWAP_COMPOP_DECOMPRESS, src, entry->length,
dst, &dlen);
--
1.7.9.5

2013-08-30 08:44:53

by Krzysztof Kozlowski

Subject: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages

Use page reference counter for zbud pages. The ref counter replaces
zbud_header.under_reclaim flag and ensures that zbud page won't be freed
when zbud_free() is called during reclaim. It allows implementation of
additional reclaim paths.

The page count is incremented when:
- a handle is created and passed to zswap (in zbud_alloc()),
- user-supplied eviction callback is called (in zbud_reclaim_page()).
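
For illustration, a minimal sketch of the intended reference lifecycle (the
helper name example_use_zbud_page() is hypothetical; get_zbud_page() and
put_zbud_page() are the functions added below):

static void example_use_zbud_page(struct zbud_header *zhdr)
{
	/* take a reference so a racing zbud_free() cannot free the page */
	get_zbud_page(zhdr);

	/* ... use the zbud page, e.g. call the eviction callback ... */

	/* drop the reference; the page is freed once the count reaches 0 */
	if (put_zbud_page(zhdr))
		return;	/* page (and zhdr) are gone */
}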

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Tomasz Stanislawski <[email protected]>
Reviewed-by: Bob Liu <[email protected]>
---
mm/zbud.c | 97 +++++++++++++++++++++++++++++++++----------------------------
1 file changed, 52 insertions(+), 45 deletions(-)

diff --git a/mm/zbud.c b/mm/zbud.c
index ad1e781..aa9a15c 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -109,7 +109,6 @@ struct zbud_header {
struct list_head lru;
unsigned int first_chunks;
unsigned int last_chunks;
- bool under_reclaim;
};

/*****************
@@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
zhdr->last_chunks = 0;
INIT_LIST_HEAD(&zhdr->buddy);
INIT_LIST_HEAD(&zhdr->lru);
- zhdr->under_reclaim = 0;
return zhdr;
}

-/* Resets the struct page fields and frees the page */
-static void free_zbud_page(struct zbud_header *zhdr)
-{
- __free_page(virt_to_page(zhdr));
-}
-
/*
* Encodes the handle of a particular buddy within a zbud page
* Pool lock should be held as this function accesses first|last_chunks
@@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
}

+/*
+ * Increases ref count for zbud page.
+ */
+static void get_zbud_page(struct zbud_header *zhdr)
+{
+ get_page(virt_to_page(zhdr));
+}
+
+/*
+ * Decreases ref count for zbud page and frees the page if it reaches 0
+ * (no external references, e.g. handles).
+ *
+ * Returns 1 if page was freed and 0 otherwise.
+ */
+static int put_zbud_page(struct zbud_header *zhdr)
+{
+ struct page *page = virt_to_page(zhdr);
+ if (put_page_testzero(page)) {
+ free_hot_cold_page(page, 0);
+ return 1;
+ }
+ return 0;
+}
+
+
/*****************
* API Functions
*****************/
@@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
unsigned long *handle)
{
- int chunks, i, freechunks;
+ int chunks, i;
struct zbud_header *zhdr = NULL;
enum buddy bud;
struct page *page;
@@ -273,6 +290,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
bud = FIRST;
else
bud = LAST;
+ get_zbud_page(zhdr);
goto found;
}
}
@@ -284,6 +302,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
return -ENOMEM;
spin_lock(&pool->lock);
pool->pages_nr++;
+ /*
+ * We will be using zhdr instead of page, so
+ * don't increase the page count.
+ */
zhdr = init_zbud_page(page);
bud = FIRST;

@@ -295,7 +317,7 @@ found:

if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
/* Add to unbuddied list */
- freechunks = num_free_chunks(zhdr);
+ int freechunks = num_free_chunks(zhdr);
list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
} else {
/* Add to buddied list */
@@ -326,7 +348,6 @@ found:
void zbud_free(struct zbud_pool *pool, unsigned long handle)
{
struct zbud_header *zhdr;
- int freechunks;

spin_lock(&pool->lock);
zhdr = handle_to_zbud_header(handle);
@@ -337,26 +358,19 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
else
zhdr->first_chunks = 0;

- if (zhdr->under_reclaim) {
- /* zbud page is under reclaim, reclaim will free */
- spin_unlock(&pool->lock);
- return;
- }
-
/* Remove from existing buddy list */
list_del(&zhdr->buddy);

if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
- /* zbud page is empty, free */
list_del(&zhdr->lru);
- free_zbud_page(zhdr);
pool->pages_nr--;
} else {
/* Add to unbuddied list */
- freechunks = num_free_chunks(zhdr);
+ int freechunks = num_free_chunks(zhdr);
list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
}

+ put_zbud_page(zhdr);
spin_unlock(&pool->lock);
}

@@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
*/
int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
{
- int i, ret, freechunks;
+ int i, ret;
struct zbud_header *zhdr;
unsigned long first_handle = 0, last_handle = 0;

@@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
return -EINVAL;
}
for (i = 0; i < retries; i++) {
+ if (list_empty(&pool->lru)) {
+ /*
+ * LRU was emptied during evict calls in previous
+ * iteration but put_zbud_page() returned 0 meaning
+ * that someone still holds the page. This may
+ * happen when some other mm mechanism increased
+ * the page count.
+ * In such case we succeeded with reclaim.
+ */
+ spin_unlock(&pool->lock);
+ return 0;
+ }
zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
+ /* Move this last element to beginning of LRU */
list_del(&zhdr->lru);
- list_del(&zhdr->buddy);
+ list_add(&zhdr->lru, &pool->lru);
/* Protect zbud page against free */
- zhdr->under_reclaim = true;
+ get_zbud_page(zhdr);
/*
* We need encode the handles before unlocking, since we can
* race with free that will set (first|last)_chunks to 0
@@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
goto next;
}
next:
- spin_lock(&pool->lock);
- zhdr->under_reclaim = false;
- if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
- /*
- * Both buddies are now free, free the zbud page and
- * return success.
- */
- free_zbud_page(zhdr);
- pool->pages_nr--;
- spin_unlock(&pool->lock);
+ if (put_zbud_page(zhdr))
return 0;
- } else if (zhdr->first_chunks == 0 ||
- zhdr->last_chunks == 0) {
- /* add to unbuddied list */
- freechunks = num_free_chunks(zhdr);
- list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
- } else {
- /* add to buddied list */
- list_add(&zhdr->buddy, &pool->buddied);
- }
-
- /* add to beginning of LRU */
- list_add(&zhdr->lru, &pool->lru);
+ spin_lock(&pool->lock);
}
spin_unlock(&pool->lock);
return -EAGAIN;
--
1.7.9.5

2013-09-06 17:30:50

by Seth Jennings

Subject: Re: [RFC PATCH 0/4] mm: migrate zbud pages

On Fri, Aug 30, 2013 at 10:42:52AM +0200, Krzysztof Kozlowski wrote:
> Hi,
>
> Currently zbud pages are not movable and they cannot be allocated from CMA
> region. These patches add migration of zbud pages.

Hey Krzysztof,

Thanks for the patches. I haven't had time to look at them yet but wanted to
let you know that I plan to look at them early next week.

Seth

2013-09-08 09:05:29

by Bob Liu

Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages

Hi Krzysztof,

On 08/30/2013 04:42 PM, Krzysztof Kozlowski wrote:
> Use page reference counter for zbud pages. The ref counter replaces
> zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> when zbud_free() is called during reclaim. It allows implementation of
> additional reclaim paths.
>
> The page count is incremented when:
> - a handle is created and passed to zswap (in zbud_alloc()),
> - user-supplied eviction callback is called (in zbud_reclaim_page()).
>
> Signed-off-by: Krzysztof Kozlowski <[email protected]>
> Signed-off-by: Tomasz Stanislawski <[email protected]>
> Reviewed-by: Bob Liu <[email protected]>

AFAIR, the previous version you sent out had a function called
rebalance_lists() which I think is a good clean up.
But I don't see that function any more in this version.

Thanks,
-Bob


> ---
> mm/zbud.c | 97 +++++++++++++++++++++++++++++++++----------------------------
> 1 file changed, 52 insertions(+), 45 deletions(-)
>
> diff --git a/mm/zbud.c b/mm/zbud.c
> index ad1e781..aa9a15c 100644
> --- a/mm/zbud.c
> +++ b/mm/zbud.c
> @@ -109,7 +109,6 @@ struct zbud_header {
> struct list_head lru;
> unsigned int first_chunks;
> unsigned int last_chunks;
> - bool under_reclaim;
> };
>
> /*****************
> @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
> zhdr->last_chunks = 0;
> INIT_LIST_HEAD(&zhdr->buddy);
> INIT_LIST_HEAD(&zhdr->lru);
> - zhdr->under_reclaim = 0;
> return zhdr;
> }
>
> -/* Resets the struct page fields and frees the page */
> -static void free_zbud_page(struct zbud_header *zhdr)
> -{
> - __free_page(virt_to_page(zhdr));
> -}
> -
> /*
> * Encodes the handle of a particular buddy within a zbud page
> * Pool lock should be held as this function accesses first|last_chunks
> @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
> return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
> }
>
> +/*
> + * Increases ref count for zbud page.
> + */
> +static void get_zbud_page(struct zbud_header *zhdr)
> +{
> + get_page(virt_to_page(zhdr));
> +}
> +
> +/*
> + * Decreases ref count for zbud page and frees the page if it reaches 0
> + * (no external references, e.g. handles).
> + *
> + * Returns 1 if page was freed and 0 otherwise.
> + */
> +static int put_zbud_page(struct zbud_header *zhdr)
> +{
> + struct page *page = virt_to_page(zhdr);
> + if (put_page_testzero(page)) {
> + free_hot_cold_page(page, 0);
> + return 1;
> + }
> + return 0;
> +}
> +
> +
> /*****************
> * API Functions
> *****************/
> @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
> int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> unsigned long *handle)
> {
> - int chunks, i, freechunks;
> + int chunks, i;
> struct zbud_header *zhdr = NULL;
> enum buddy bud;
> struct page *page;
> @@ -273,6 +290,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> bud = FIRST;
> else
> bud = LAST;
> + get_zbud_page(zhdr);
> goto found;
> }
> }
> @@ -284,6 +302,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> return -ENOMEM;
> spin_lock(&pool->lock);
> pool->pages_nr++;
> + /*
> + * We will be using zhdr instead of page, so
> + * don't increase the page count.
> + */
> zhdr = init_zbud_page(page);
> bud = FIRST;
>
> @@ -295,7 +317,7 @@ found:
>
> if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
> /* Add to unbuddied list */
> - freechunks = num_free_chunks(zhdr);
> + int freechunks = num_free_chunks(zhdr);
> list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> } else {
> /* Add to buddied list */
> @@ -326,7 +348,6 @@ found:
> void zbud_free(struct zbud_pool *pool, unsigned long handle)
> {
> struct zbud_header *zhdr;
> - int freechunks;
>
> spin_lock(&pool->lock);
> zhdr = handle_to_zbud_header(handle);
> @@ -337,26 +358,19 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> else
> zhdr->first_chunks = 0;
>
> - if (zhdr->under_reclaim) {
> - /* zbud page is under reclaim, reclaim will free */
> - spin_unlock(&pool->lock);
> - return;
> - }
> -
> /* Remove from existing buddy list */
> list_del(&zhdr->buddy);
>
> if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> - /* zbud page is empty, free */
> list_del(&zhdr->lru);
> - free_zbud_page(zhdr);
> pool->pages_nr--;
> } else {
> /* Add to unbuddied list */
> - freechunks = num_free_chunks(zhdr);
> + int freechunks = num_free_chunks(zhdr);
> list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> }
>
> + put_zbud_page(zhdr);
> spin_unlock(&pool->lock);
> }
>
> @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> */
> int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> {
> - int i, ret, freechunks;
> + int i, ret;
> struct zbud_header *zhdr;
> unsigned long first_handle = 0, last_handle = 0;
>
> @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> return -EINVAL;
> }
> for (i = 0; i < retries; i++) {
> + if (list_empty(&pool->lru)) {
> + /*
> + * LRU was emptied during evict calls in previous
> + * iteration but put_zbud_page() returned 0 meaning
> + * that someone still holds the page. This may
> + * happen when some other mm mechanism increased
> + * the page count.
> + * In such case we succedded with reclaim.
> + */
> + spin_unlock(&pool->lock);
> + return 0;
> + }
> zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> + /* Move this last element to beginning of LRU */
> list_del(&zhdr->lru);
> - list_del(&zhdr->buddy);
> + list_add(&zhdr->lru, &pool->lru);
> /* Protect zbud page against free */
> - zhdr->under_reclaim = true;
> + get_zbud_page(zhdr);
> /*
> * We need encode the handles before unlocking, since we can
> * race with free that will set (first|last)_chunks to 0
> @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> goto next;
> }
> next:
> - spin_lock(&pool->lock);
> - zhdr->under_reclaim = false;
> - if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> - /*
> - * Both buddies are now free, free the zbud page and
> - * return success.
> - */
> - free_zbud_page(zhdr);
> - pool->pages_nr--;
> - spin_unlock(&pool->lock);
> + if (put_zbud_page(zhdr))
> return 0;
> - } else if (zhdr->first_chunks == 0 ||
> - zhdr->last_chunks == 0) {
> - /* add to unbuddied list */
> - freechunks = num_free_chunks(zhdr);
> - list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> - } else {
> - /* add to buddied list */
> - list_add(&zhdr->buddy, &pool->buddied);
> - }
> -
> - /* add to beginning of LRU */
> - list_add(&zhdr->lru, &pool->lru);
> + spin_lock(&pool->lock);
> }
> spin_unlock(&pool->lock);
> return -EAGAIN;
>

2013-09-09 06:14:52

by Krzysztof Kozlowski

Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages

Hi Bob,


On Sun, 2013-09-08 at 17:04 +0800, Bob Liu wrote:
> Hi Krzysztof,
>
> On 08/30/2013 04:42 PM, Krzysztof Kozlowski wrote:
> > Use page reference counter for zbud pages. The ref counter replaces
> > zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> > when zbud_free() is called during reclaim. It allows implementation of
> > additional reclaim paths.
> >
> > The page count is incremented when:
> > - a handle is created and passed to zswap (in zbud_alloc()),
> > - user-supplied eviction callback is called (in zbud_reclaim_page()).
> >
> > Signed-off-by: Krzysztof Kozlowski <[email protected]>
> > Signed-off-by: Tomasz Stanislawski <[email protected]>
> > Reviewed-by: Bob Liu <[email protected]>
>
> AFAIR, the previous version you sent out has a function called
> rebalance_lists() which I think is a good clean up.
> But I didn't see that function any more in this version.

Yes, that function was added because similar code was present in
zbud_free/zbud_alloc/zbud_reclaim_page. I removed it because it turned
out that there is no benefit in generalizing this code. Seth found an
issue in this function (a zbud page was re-inserted into the LRU when
zbud_free() was called on one buddy). Fixing this issue added more if-s
and more code, thus enlarging the rebalance_lists() function.

Best regards,
Krzysztof

> Thanks,
> -Bob
>
>
> > ---
> > mm/zbud.c | 97 +++++++++++++++++++++++++++++++++----------------------------
> > 1 file changed, 52 insertions(+), 45 deletions(-)
> >
> > diff --git a/mm/zbud.c b/mm/zbud.c
> > index ad1e781..aa9a15c 100644
> > --- a/mm/zbud.c
> > +++ b/mm/zbud.c
> > @@ -109,7 +109,6 @@ struct zbud_header {
> > struct list_head lru;
> > unsigned int first_chunks;
> > unsigned int last_chunks;
> > - bool under_reclaim;
> > };
> >
> > /*****************
> > @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
> > zhdr->last_chunks = 0;
> > INIT_LIST_HEAD(&zhdr->buddy);
> > INIT_LIST_HEAD(&zhdr->lru);
> > - zhdr->under_reclaim = 0;
> > return zhdr;
> > }
> >
> > -/* Resets the struct page fields and frees the page */
> > -static void free_zbud_page(struct zbud_header *zhdr)
> > -{
> > - __free_page(virt_to_page(zhdr));
> > -}
> > -
> > /*
> > * Encodes the handle of a particular buddy within a zbud page
> > * Pool lock should be held as this function accesses first|last_chunks
> > @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
> > return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
> > }
> >
> > +/*
> > + * Increases ref count for zbud page.
> > + */
> > +static void get_zbud_page(struct zbud_header *zhdr)
> > +{
> > + get_page(virt_to_page(zhdr));
> > +}
> > +
> > +/*
> > + * Decreases ref count for zbud page and frees the page if it reaches 0
> > + * (no external references, e.g. handles).
> > + *
> > + * Returns 1 if page was freed and 0 otherwise.
> > + */
> > +static int put_zbud_page(struct zbud_header *zhdr)
> > +{
> > + struct page *page = virt_to_page(zhdr);
> > + if (put_page_testzero(page)) {
> > + free_hot_cold_page(page, 0);
> > + return 1;
> > + }
> > + return 0;
> > +}
> > +
> > +
> > /*****************
> > * API Functions
> > *****************/
> > @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
> > int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> > unsigned long *handle)
> > {
> > - int chunks, i, freechunks;
> > + int chunks, i;
> > struct zbud_header *zhdr = NULL;
> > enum buddy bud;
> > struct page *page;
> > @@ -273,6 +290,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> > bud = FIRST;
> > else
> > bud = LAST;
> > + get_zbud_page(zhdr);
> > goto found;
> > }
> > }
> > @@ -284,6 +302,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> > return -ENOMEM;
> > spin_lock(&pool->lock);
> > pool->pages_nr++;
> > + /*
> > + * We will be using zhdr instead of page, so
> > + * don't increase the page count.
> > + */
> > zhdr = init_zbud_page(page);
> > bud = FIRST;
> >
> > @@ -295,7 +317,7 @@ found:
> >
> > if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
> > /* Add to unbuddied list */
> > - freechunks = num_free_chunks(zhdr);
> > + int freechunks = num_free_chunks(zhdr);
> > list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> > } else {
> > /* Add to buddied list */
> > @@ -326,7 +348,6 @@ found:
> > void zbud_free(struct zbud_pool *pool, unsigned long handle)
> > {
> > struct zbud_header *zhdr;
> > - int freechunks;
> >
> > spin_lock(&pool->lock);
> > zhdr = handle_to_zbud_header(handle);
> > @@ -337,26 +358,19 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> > else
> > zhdr->first_chunks = 0;
> >
> > - if (zhdr->under_reclaim) {
> > - /* zbud page is under reclaim, reclaim will free */
> > - spin_unlock(&pool->lock);
> > - return;
> > - }
> > -
> > /* Remove from existing buddy list */
> > list_del(&zhdr->buddy);
> >
> > if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> > - /* zbud page is empty, free */
> > list_del(&zhdr->lru);
> > - free_zbud_page(zhdr);
> > pool->pages_nr--;
> > } else {
> > /* Add to unbuddied list */
> > - freechunks = num_free_chunks(zhdr);
> > + int freechunks = num_free_chunks(zhdr);
> > list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> > }
> >
> > + put_zbud_page(zhdr);
> > spin_unlock(&pool->lock);
> > }
> >
> > @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> > */
> > int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> > {
> > - int i, ret, freechunks;
> > + int i, ret;
> > struct zbud_header *zhdr;
> > unsigned long first_handle = 0, last_handle = 0;
> >
> > @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> > return -EINVAL;
> > }
> > for (i = 0; i < retries; i++) {
> > + if (list_empty(&pool->lru)) {
> > + /*
> > + * LRU was emptied during evict calls in previous
> > + * iteration but put_zbud_page() returned 0 meaning
> > + * that someone still holds the page. This may
> > + * happen when some other mm mechanism increased
> > + * the page count.
> > + * In such case we succedded with reclaim.
> > + */
> > + spin_unlock(&pool->lock);
> > + return 0;
> > + }
> > zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> > + /* Move this last element to beginning of LRU */
> > list_del(&zhdr->lru);
> > - list_del(&zhdr->buddy);
> > + list_add(&zhdr->lru, &pool->lru);
> > /* Protect zbud page against free */
> > - zhdr->under_reclaim = true;
> > + get_zbud_page(zhdr);
> > /*
> > * We need encode the handles before unlocking, since we can
> > * race with free that will set (first|last)_chunks to 0
> > @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> > goto next;
> > }
> > next:
> > - spin_lock(&pool->lock);
> > - zhdr->under_reclaim = false;
> > - if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> > - /*
> > - * Both buddies are now free, free the zbud page and
> > - * return success.
> > - */
> > - free_zbud_page(zhdr);
> > - pool->pages_nr--;
> > - spin_unlock(&pool->lock);
> > + if (put_zbud_page(zhdr))
> > return 0;
> > - } else if (zhdr->first_chunks == 0 ||
> > - zhdr->last_chunks == 0) {
> > - /* add to unbuddied list */
> > - freechunks = num_free_chunks(zhdr);
> > - list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> > - } else {
> > - /* add to buddied list */
> > - list_add(&zhdr->buddy, &pool->buddied);
> > - }
> > -
> > - /* add to beginning of LRU */
> > - list_add(&zhdr->lru, &pool->lru);
> > + spin_lock(&pool->lock);
> > }
> > spin_unlock(&pool->lock);
> > return -EAGAIN;
> >

2013-09-09 06:55:18

by Krzysztof Kozlowski

Subject: Re: [RFC PATCH 0/4] mm: migrate zbud pages

On Fri, 2013-09-06 at 12:30 -0500, Seth Jennings wrote:
> On Fri, Aug 30, 2013 at 10:42:52AM +0200, Krzysztof Kozlowski wrote:
> > Hi,
> >
> > Currently zbud pages are not movable and they cannot be allocated from CMA
> > region. These patches add migration of zbud pages.
>
> Hey Krzysztof,
>
> Thanks for the patches. I haven't had time to look at them yet but wanted to
> let you know that I plan to early next week.
>
> Seth

Great, thanks! The patches rebase and build cleanly on current
mainline (v3.11-7890-ge5c832d).


Best regards,
Krzysztof

2013-09-09 19:47:59

by Seth Jennings

Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages

On Fri, Aug 30, 2013 at 10:42:53AM +0200, Krzysztof Kozlowski wrote:
> Use page reference counter for zbud pages. The ref counter replaces
> zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> when zbud_free() is called during reclaim. It allows implementation of
> additional reclaim paths.

I like this idea.

>
> The page count is incremented when:
> - a handle is created and passed to zswap (in zbud_alloc()),
> - user-supplied eviction callback is called (in zbud_reclaim_page()).
>
> Signed-off-by: Krzysztof Kozlowski <[email protected]>
> Signed-off-by: Tomasz Stanislawski <[email protected]>
> Reviewed-by: Bob Liu <[email protected]>
> ---
> mm/zbud.c | 97 +++++++++++++++++++++++++++++++++----------------------------
> 1 file changed, 52 insertions(+), 45 deletions(-)
>
> diff --git a/mm/zbud.c b/mm/zbud.c
> index ad1e781..aa9a15c 100644
> --- a/mm/zbud.c
> +++ b/mm/zbud.c
> @@ -109,7 +109,6 @@ struct zbud_header {
> struct list_head lru;
> unsigned int first_chunks;
> unsigned int last_chunks;
> - bool under_reclaim;
> };
>
> /*****************
> @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
> zhdr->last_chunks = 0;
> INIT_LIST_HEAD(&zhdr->buddy);
> INIT_LIST_HEAD(&zhdr->lru);
> - zhdr->under_reclaim = 0;
> return zhdr;
> }
>
> -/* Resets the struct page fields and frees the page */
> -static void free_zbud_page(struct zbud_header *zhdr)
> -{
> - __free_page(virt_to_page(zhdr));
> -}
> -
> /*
> * Encodes the handle of a particular buddy within a zbud page
> * Pool lock should be held as this function accesses first|last_chunks
> @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
> return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
> }
>
> +/*
> + * Increases ref count for zbud page.
> + */
> +static void get_zbud_page(struct zbud_header *zhdr)
> +{
> + get_page(virt_to_page(zhdr));
> +}
> +
> +/*
> + * Decreases ref count for zbud page and frees the page if it reaches 0
> + * (no external references, e.g. handles).
> + *
> + * Returns 1 if page was freed and 0 otherwise.
> + */
> +static int put_zbud_page(struct zbud_header *zhdr)
> +{
> + struct page *page = virt_to_page(zhdr);
> + if (put_page_testzero(page)) {
> + free_hot_cold_page(page, 0);
> + return 1;
> + }
> + return 0;
> +}
> +
> +
> /*****************
> * API Functions
> *****************/
> @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
> int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> unsigned long *handle)
> {
> - int chunks, i, freechunks;
> + int chunks, i;

This change (make freechunks a block local variable) is unrelated to the
patch description and should be its own patch.

> struct zbud_header *zhdr = NULL;
> enum buddy bud;
> struct page *page;
> @@ -273,6 +290,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> bud = FIRST;
> else
> bud = LAST;
> + get_zbud_page(zhdr);
> goto found;
> }
> }
> @@ -284,6 +302,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> return -ENOMEM;
> spin_lock(&pool->lock);
> pool->pages_nr++;
> + /*
> + * We will be using zhdr instead of page, so
> + * don't increase the page count.
> + */
> zhdr = init_zbud_page(page);
> bud = FIRST;
>
> @@ -295,7 +317,7 @@ found:
>
> if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
> /* Add to unbuddied list */
> - freechunks = num_free_chunks(zhdr);
> + int freechunks = num_free_chunks(zhdr);

Same here.

> list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> } else {
> /* Add to buddied list */
> @@ -326,7 +348,6 @@ found:
> void zbud_free(struct zbud_pool *pool, unsigned long handle)
> {
> struct zbud_header *zhdr;
> - int freechunks;

And here.

>
> spin_lock(&pool->lock);
> zhdr = handle_to_zbud_header(handle);
> @@ -337,26 +358,19 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> else
> zhdr->first_chunks = 0;
>
> - if (zhdr->under_reclaim) {
> - /* zbud page is under reclaim, reclaim will free */
> - spin_unlock(&pool->lock);
> - return;
> - }
> -
> /* Remove from existing buddy list */
> list_del(&zhdr->buddy);
>
> if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> - /* zbud page is empty, free */
> list_del(&zhdr->lru);
> - free_zbud_page(zhdr);
> pool->pages_nr--;
> } else {
> /* Add to unbuddied list */
> - freechunks = num_free_chunks(zhdr);
> + int freechunks = num_free_chunks(zhdr);

Last one.

> list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> }
>
> + put_zbud_page(zhdr);
> spin_unlock(&pool->lock);
> }
>
> @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> */
> int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> {
> - int i, ret, freechunks;
> + int i, ret;
> struct zbud_header *zhdr;
> unsigned long first_handle = 0, last_handle = 0;
>
> @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> return -EINVAL;
> }
> for (i = 0; i < retries; i++) {
> + if (list_empty(&pool->lru)) {
> + /*
> + * LRU was emptied during evict calls in previous
> + * iteration but put_zbud_page() returned 0 meaning
> + * that someone still holds the page. This may
> + * happen when some other mm mechanism increased
> + * the page count.
> > +			 * In such case we succeeded with reclaim.
> + */
> + spin_unlock(&pool->lock);
> + return 0;
> + }
> zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> + /* Move this last element to beginning of LRU */
> list_del(&zhdr->lru);
> - list_del(&zhdr->buddy);
> + list_add(&zhdr->lru, &pool->lru);

This adds the entry back to the lru list, but the entry is never removed
from the list. I'm not sure how this could work.

> /* Protect zbud page against free */
> - zhdr->under_reclaim = true;
> + get_zbud_page(zhdr);
> /*
> * We need encode the handles before unlocking, since we can
> * race with free that will set (first|last)_chunks to 0
> @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> goto next;
> }
> next:
> - spin_lock(&pool->lock);
> - zhdr->under_reclaim = false;
> - if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> - /*
> - * Both buddies are now free, free the zbud page and
> - * return success.
> - */
> - free_zbud_page(zhdr);
> - pool->pages_nr--;
> - spin_unlock(&pool->lock);
> + if (put_zbud_page(zhdr))

There is a single put_zbud_page() regardless of result of the eviction
attempts. What if both buddies of a buddied zbud page are evicted?
Then you leak the zbud page. What if the eviction in an unbuddied zbud
page failed? Then you prematurely free the page and crash later.

Seth

> return 0;
> - } else if (zhdr->first_chunks == 0 ||
> - zhdr->last_chunks == 0) {
> - /* add to unbuddied list */
> - freechunks = num_free_chunks(zhdr);
> - list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> - } else {
> - /* add to buddied list */
> - list_add(&zhdr->buddy, &pool->buddied);
> - }
> -
> - /* add to beginning of LRU */
> - list_add(&zhdr->lru, &pool->lru);
> + spin_lock(&pool->lock);
> }
> spin_unlock(&pool->lock);
> return -EAGAIN;
> --
> 1.7.9.5
>

2013-09-09 19:53:45

by Seth Jennings

[permalink] [raw]
Subject: Re: [RFC PATCH 2/4] mm: use mapcount for identifying zbud pages

On Fri, Aug 30, 2013 at 10:42:54AM +0200, Krzysztof Kozlowski wrote:
> Currently zbud pages do not have any flags set so it is not possible to
> identify them during migration or compaction.
>
> Implement PageZbud() by comparing page->_mapcount to -127 to distinguish
> pages allocated by zbud. Just like PageBuddy() is implemented.
>
> Signed-off-by: Krzysztof Kozlowski <[email protected]>

Reviewed-by: Seth Jennings <[email protected]>

> ---
> include/linux/mm.h | 23 +++++++++++++++++++++++
> mm/zbud.c | 4 ++++
> 2 files changed, 27 insertions(+)

2013-09-10 07:16:57

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages

Hi,

On Mon, 2013-09-09 at 14:47 -0500, Seth Jennings wrote:
> On Fri, Aug 30, 2013 at 10:42:53AM +0200, Krzysztof Kozlowski wrote:
> > Use page reference counter for zbud pages. The ref counter replaces
> > zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> > when zbud_free() is called during reclaim. It allows implementation of
> > additional reclaim paths.
>
> I like this idea.
>
> >
> > The page count is incremented when:
> > - a handle is created and passed to zswap (in zbud_alloc()),
> > - user-supplied eviction callback is called (in zbud_reclaim_page()).
> >
> > Signed-off-by: Krzysztof Kozlowski <[email protected]>
> > Signed-off-by: Tomasz Stanislawski <[email protected]>
> > Reviewed-by: Bob Liu <[email protected]>
> > ---
> > mm/zbud.c | 97 +++++++++++++++++++++++++++++++++----------------------------
> > 1 file changed, 52 insertions(+), 45 deletions(-)
> >
> > diff --git a/mm/zbud.c b/mm/zbud.c
> > index ad1e781..aa9a15c 100644
> > --- a/mm/zbud.c
> > +++ b/mm/zbud.c
> > @@ -109,7 +109,6 @@ struct zbud_header {
> > struct list_head lru;
> > unsigned int first_chunks;
> > unsigned int last_chunks;
> > - bool under_reclaim;
> > };
> >
> > /*****************
> > @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
> > zhdr->last_chunks = 0;
> > INIT_LIST_HEAD(&zhdr->buddy);
> > INIT_LIST_HEAD(&zhdr->lru);
> > - zhdr->under_reclaim = 0;
> > return zhdr;
> > }
> >
> > -/* Resets the struct page fields and frees the page */
> > -static void free_zbud_page(struct zbud_header *zhdr)
> > -{
> > - __free_page(virt_to_page(zhdr));
> > -}
> > -
> > /*
> > * Encodes the handle of a particular buddy within a zbud page
> > * Pool lock should be held as this function accesses first|last_chunks
> > @@ -188,6 +180,31 @@ static int num_free_chunks(struct zbud_header *zhdr)
> > return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
> > }
> >
> > +/*
> > + * Increases ref count for zbud page.
> > + */
> > +static void get_zbud_page(struct zbud_header *zhdr)
> > +{
> > + get_page(virt_to_page(zhdr));
> > +}
> > +
> > +/*
> > + * Decreases ref count for zbud page and frees the page if it reaches 0
> > + * (no external references, e.g. handles).
> > + *
> > + * Returns 1 if page was freed and 0 otherwise.
> > + */
> > +static int put_zbud_page(struct zbud_header *zhdr)
> > +{
> > + struct page *page = virt_to_page(zhdr);
> > + if (put_page_testzero(page)) {
> > + free_hot_cold_page(page, 0);
> > + return 1;
> > + }
> > + return 0;
> > +}
> > +
> > +
> > /*****************
> > * API Functions
> > *****************/
> > @@ -250,7 +267,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
> > int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
> > unsigned long *handle)
> > {
> > - int chunks, i, freechunks;
> > + int chunks, i;
>
> This change (make freechunks a block local variable) is unrelated to the
> patch description and should be its own patch.

Sure, I will split this into 2 patches.


[...]
> > list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> > }
> >
> > + put_zbud_page(zhdr);
> > spin_unlock(&pool->lock);
> > }
> >
> > @@ -400,7 +414,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
> > */
> > int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> > {
> > - int i, ret, freechunks;
> > + int i, ret;
> > struct zbud_header *zhdr;
> > unsigned long first_handle = 0, last_handle = 0;
> >
> > @@ -411,11 +425,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> > return -EINVAL;
> > }
> > for (i = 0; i < retries; i++) {
> > + if (list_empty(&pool->lru)) {
> > + /*
> > + * LRU was emptied during evict calls in previous
> > + * iteration but put_zbud_page() returned 0 meaning
> > + * that someone still holds the page. This may
> > + * happen when some other mm mechanism increased
> > + * the page count.
> > > +			 * In such case we succeeded with reclaim.
> > + */
> > + spin_unlock(&pool->lock);
> > + return 0;
> > + }
> > zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> > + /* Move this last element to beginning of LRU */
> > list_del(&zhdr->lru);
> > - list_del(&zhdr->buddy);
> > + list_add(&zhdr->lru, &pool->lru);
>
> This adds the entry back to the lru list, but the entry is never removed
> from the list. I'm not sure how this could work.

The entry will be removed from the LRU and the un/buddied list in
zbud_free(), called from the eviction handler.

So this actually works; however, I wonder whether it may be a source of
race conditions, because during the call to the eviction handler the zbud
header is still present on the buddied list (until the call to zbud_free()).

Another issue is that I haven't updated the documentation for
zbud_reclaim_page(). I will fix this in the next version.


> > /* Protect zbud page against free */
> > - zhdr->under_reclaim = true;
> > + get_zbud_page(zhdr);
> > /*
> > * We need encode the handles before unlocking, since we can
> > * race with free that will set (first|last)_chunks to 0
> > @@ -440,29 +467,9 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> > goto next;
> > }
> > next:
> > - spin_lock(&pool->lock);
> > - zhdr->under_reclaim = false;
> > - if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
> > - /*
> > - * Both buddies are now free, free the zbud page and
> > - * return success.
> > - */
> > - free_zbud_page(zhdr);
> > - pool->pages_nr--;
> > - spin_unlock(&pool->lock);
> > + if (put_zbud_page(zhdr))
>
> There is a single put_zbud_page() regardless of result of the eviction
> attempts. What if both buddies of a buddied zbud page are evicted?
> Then you leak the zbud page. What if the eviction in an unbuddied zbud
> page failed? Then you prematurely free the page and crash later.

This single put_zbud_page() puts the reference obtained a few lines
earlier.

If one or both buddies are evicted, the eviction handler should call
zbud_free() for each of them. Each zbud_free() puts the initial reference
(obtained from zbud_alloc()).

If the eviction handler failed, zbud_free() won't be called, so the
initial reference will still be held and this put_zbud_page() won't free
the page.

Let me show the flow of ref counts:
1. zbud_alloc() for first buddy, ref count = 1
2. zbud_alloc() for last buddy, ref count = 2
3. zbud_reclaim_page() start, ref count = 3
4. calling user eviction handler
5. zbud_free() for first buddy (called from handler), ref count = 2
6. zbud_free() for last buddy (called from handler), ref count = 1
7. zbud_reclaim_page() end, ref count = 0 (free the page)

If the eviction handler fails at step 5 or 6, the ref count won't drop
to 0 because zswap still holds the initial reference to the handle.
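
For illustration only, a minimal sketch of an eviction handler that would
follow this flow. The function name and body are hypothetical (assuming the
zbud_ops evict prototype of pool + handle); the real zswap handler also
writes the compressed data back to the swap device before freeing the
handle:

#include <linux/zbud.h>

/*
 * Hypothetical eviction handler matching the flow above: on success it
 * calls zbud_free(), which puts the zbud_alloc() reference for this
 * buddy (steps 5/6). zbud_reclaim_page() then drops its own reference
 * via put_zbud_page() (step 7). On failure zbud_free() is not called,
 * so the initial reference keeps the page alive.
 */
static int example_evict_entry(struct zbud_pool *pool, unsigned long handle)
{
	/* ... write back the data stored under 'handle' ... */

	zbud_free(pool, handle);	/* puts the zbud_alloc() reference */
	return 0;
}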


Thanks for the comments. I appreciate them very much.


Best regards,
Krzysztof