Recently, I got many reports about performance degradation
and frequent fork failures in embedded systems (Android mobile
phones, webOS TVs and so on).
The problem was fragmentation caused by zram and GPU driver
pages. Their pages cannot be migrated, so compaction cannot
work well either, and the reclaimer ends up shrinking all of the
working set pages. That made the system very slow and even made
fork fail easily.
Another pain point is that they cannot work with CMA.
Most of the CMA memory space could be idle (i.e., it could be
used for movable pages unless the driver is using it), but if a
driver (e.g., zram) cannot migrate its pages, that memory space
is wasted. In our product, which has a big CMA area, zones are
reclaimed too excessively although there is lots of free space
in CMA, so the system easily became very slow.
To solve these problems, this patchset adds a facility to
migrate non-lru pages by introducing new friend functions
of migratepage in address_space_operations and new page flags.
(isolate_page, putback_page)
(PG_movable, PG_isolated)
For details, please read the description in
"mm/compaction: support non-lru movable page migration".
Originally, Gioh Kim tried to support this feature, but he moved,
so I took over the work. I took much of the code from his work
and changed it a little bit.
Thanks, Gioh!
And I should mention Konstantin Khlebnikov. He really helped Gioh
at that time, so he deserves much of the credit, too.
Thanks, Konstantin!
This patchset consists of five parts:
1. clean up migration
  mm: use put_page to free page instead of putback_lru_page
2. zsmalloc clean-up for preparing page migration
  zsmalloc: use first_page rather than page
  zsmalloc: clean up many BUG_ON
  zsmalloc: reordering function parameter
  zsmalloc: remove unused pool param in obj_free
  zsmalloc: keep max_object in size_class
  zsmalloc: squeeze inuse into page->mapping
  zsmalloc: squeeze freelist into page->mapping
  zsmalloc: move struct zs_meta from mapping to freelist
  zsmalloc: factor page chain functionality out
  zsmalloc: separate free_zspage from putback_zspage
  zsmalloc: zs_compact refactoring
3. add non-lru page migration feature
  mm/compaction: support non-lru movable page migration
4. rework KVM memory-ballooning
  mm/balloon: use general movable page feature into balloon
5. add zsmalloc page migration
  zsmalloc: migrate head page of zspage
  zsmalloc: use single linked list for page chain
  zsmalloc: migrate tail pages in zspage
  zram: use __GFP_MOVABLE for memory allocation
* From v1
  * rebase on v4.5-mmotm-2016-03-17-15-04
  * reorder patches to merge clean-up patches first
  * add Acked-by/Reviewed-by from Vlastimil and Sergey
  * use a separate mount for each user instead of reusing anon_inode_fs - Al Viro
  * small changes - YiPing, Gioh
Minchan Kim (18):
mm: use put_page to free page instead of putback_lru_page
zsmalloc: use first_page rather than page
zsmalloc: clean up many BUG_ON
zsmalloc: reordering function parameter
zsmalloc: remove unused pool param in obj_free
zsmalloc: keep max_object in size_class
zsmalloc: squeeze inuse into page->mapping
zsmalloc: squeeze freelist into page->mapping
zsmalloc: move struct zs_meta from mapping to freelist
zsmalloc: factor page chain functionality out
zsmalloc: separate free_zspage from putback_zspage
zsmalloc: zs_compact refactoring
mm/compaction: support non-lru movable page migration
mm/balloon: use general movable page feature into balloon
zsmalloc: migrate head page of zspage
zsmalloc: use single linked list for page chain
zsmalloc: migrate tail pages in zspage
zram: use __GFP_MOVABLE for memory allocation
Documentation/filesystems/Locking | 4 +
Documentation/filesystems/vfs.txt | 5 +
drivers/block/zram/zram_drv.c | 3 +-
drivers/virtio/virtio_balloon.c | 45 +-
fs/proc/page.c | 3 +
include/linux/balloon_compaction.h | 47 +-
include/linux/fs.h | 2 +
include/linux/migrate.h | 2 +
include/linux/page-flags.h | 41 +-
include/uapi/linux/kernel-page-flags.h | 1 +
include/uapi/linux/magic.h | 2 +
mm/balloon_compaction.c | 101 +--
mm/compaction.c | 15 +-
mm/migrate.c | 198 +++--
mm/vmscan.c | 2 +-
mm/zsmalloc.c | 1338 +++++++++++++++++++++++---------
16 files changed, 1284 insertions(+), 525 deletions(-)
--
1.9.1
The procedure of page migration is as follows:
First of all, it should isolate a page from the LRU and try to
migrate the page. If that is successful, it releases the page
for freeing. Otherwise, it should put the page back on the LRU
list.
For LRU pages, we have used putback_lru_page for both freeing
and putting back on the LRU list. That is okay because put_page
is aware of the LRU list, so if it releases the last refcount of
the page, it removes the page from the LRU list. However, it
performs unnecessary operations (e.g., lru_cache_add, pagevec and
flags operations; they may not be significant, but they are not
worth doing) and makes it harder to support new non-lru page
migration because put_page isn't aware of a non-lru page's data
structure.
To solve the problem, we could add a new hook in put_page with a
PageMovable flag check, but that would increase overhead in a
hot path and would need a new locking scheme to stabilize the
flag check against put_page.
So, this patch cleans it up by separating the two semantics
(i.e., put and putback). If migration is successful, use put_page
instead of putback_lru_page, and use putback_lru_page only on
failure. That makes the code more readable and doesn't add
overhead in put_page.
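The split between "put" and "putback" for the old page can be sketched in userspace as follows. This is only a toy model under stated assumptions: struct toy_page and every toy_* function are hypothetical names, the model tracks a refcount and an on-LRU bit and nothing else, and it is not the code in mm/migrate.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only a refcount and an "on LRU" bit. */
struct toy_page {
	int refcount;
	bool on_lru;
};

/* Isolation takes a reference and removes the page from the LRU. */
static void toy_isolate(struct toy_page *page)
{
	page->refcount++;
	page->on_lru = false;
}

/* "put": drop one reference; nothing LRU-related happens here. */
static void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* "putback": return the page to the LRU, then drop the isolation reference. */
static void toy_putback_lru_page(struct toy_page *page)
{
	page->on_lru = true;
	toy_put_page(page);
}

/* Success -> put only; failure -> putback. */
static void toy_finish_migration(struct toy_page *old_page, bool success)
{
	if (success)
		toy_put_page(old_page);
	else
		toy_putback_lru_page(old_page);
}

/* Run one isolate/finish cycle; report whether the old page is on the LRU. */
static bool toy_old_page_on_lru_after(bool success)
{
	struct toy_page old_page = { .refcount = 1, .on_lru = true };

	toy_isolate(&old_page);
	toy_finish_migration(&old_page, success);
	assert(old_page.refcount == 1);	/* isolation reference was dropped */
	return old_page.on_lru;
}
```

In this model the success path never touches LRU state, which is the property that lets a non-lru page share the same path without put_page knowing its data structure.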
Comment from Vlastimil
"Yeah, and compaction (perhaps also other migration users) has to drain
the lru pvec... Getting rid of this stuff is worth even by itself."
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/migrate.c | 50 +++++++++++++++++++++++++++++++-------------------
1 file changed, 31 insertions(+), 19 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 6c822a7b27e0..b65c84267ce0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -913,6 +913,14 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
put_anon_vma(anon_vma);
unlock_page(page);
out:
+ /* If migration is successful, move newpage to right list */
+ if (rc == MIGRATEPAGE_SUCCESS) {
+ if (unlikely(__is_movable_balloon_page(newpage)))
+ put_page(newpage);
+ else
+ putback_lru_page(newpage);
+ }
+
return rc;
}
@@ -946,6 +954,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
if (page_count(page) == 1) {
/* page was freed from under us. So we are done. */
+ ClearPageActive(page);
+ ClearPageUnevictable(page);
+ if (put_new_page)
+ put_new_page(newpage, private);
+ else
+ put_page(newpage);
goto out;
}
@@ -958,10 +972,8 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
}
rc = __unmap_and_move(page, newpage, force, mode);
- if (rc == MIGRATEPAGE_SUCCESS) {
- put_new_page = NULL;
+ if (rc == MIGRATEPAGE_SUCCESS)
set_page_owner_migrate_reason(newpage, reason);
- }
out:
if (rc != -EAGAIN) {
@@ -974,28 +986,28 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- /* Soft-offlined page shouldn't go through lru cache list */
+ }
+
+ /*
+ * If migration is successful, drop the reference grabbed during
+ * isolation. Otherwise, restore the page to LRU list unless we
+ * want to retry.
+ */
+ if (rc == MIGRATEPAGE_SUCCESS) {
+ put_page(page);
if (reason == MR_MEMORY_FAILURE) {
- put_page(page);
if (!test_set_page_hwpoison(page))
num_poisoned_pages_inc();
- } else
+ }
+ } else {
+ if (rc != -EAGAIN)
putback_lru_page(page);
+ if (put_new_page)
+ put_new_page(newpage, private);
+ else
+ put_page(newpage);
}
- /*
- * If migration was not successful and there's a freeing callback, use
- * it. Otherwise, putback_lru_page() will drop the reference grabbed
- * during isolation.
- */
- if (put_new_page)
- put_new_page(newpage, private);
- else if (unlikely(__is_movable_balloon_page(newpage))) {
- /* drop our reference, page already in the balloon */
- put_page(newpage);
- } else
- putback_lru_page(newpage);
-
if (result) {
if (rc)
*result = rc;
--
1.9.1
Let's remove the unused pool param in obj_free.
Reviewed-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 16556a6db628..a0890e9003e2 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1438,8 +1438,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
}
EXPORT_SYMBOL_GPL(zs_malloc);
-static void obj_free(struct zs_pool *pool, struct size_class *class,
- unsigned long obj)
+static void obj_free(struct size_class *class, unsigned long obj)
{
struct link_free *link;
struct page *first_page, *f_page;
@@ -1485,7 +1484,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
class = pool->size_class[class_idx];
spin_lock(&class->lock);
- obj_free(pool, class, obj);
+ obj_free(class, obj);
fullness = fix_fullness_group(class, first_page);
if (fullness == ZS_EMPTY) {
zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
@@ -1648,7 +1647,7 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
free_obj |= BIT(HANDLE_PIN_BIT);
record_obj(handle, free_obj);
unpin_tag(handle);
- obj_free(pool, class, used_obj);
+ obj_free(class, used_obj);
}
/* Remember last position in this iteration */
--
1.9.1
Currently, putback_zspage frees the zspage under class->lock
if fullness becomes ZS_EMPTY, but that makes it troublesome to
implement a locking scheme for the new zspage migration.
So, this patch separates free_zspage from putback_zspage
and frees the zspage outside of class->lock, as preparation for
zspage migration.
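The resulting shape (decide under the lock, free after dropping it) can be sketched in userspace as below. It is a minimal model under stated assumptions: the toy_* names are hypothetical, a boolean stands in for class->lock, and the flow only mirrors the compaction path's ordering rather than reproducing zsmalloc.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_fullness { TOY_EMPTY, TOY_IN_USE };

struct toy_class {
	bool locked;		/* stands in for class->lock */
	int zspages_freed;
};

struct toy_zspage {
	int inuse;
};

static void toy_lock(struct toy_class *c)
{
	assert(!c->locked);
	c->locked = true;
}

static void toy_unlock(struct toy_class *c)
{
	assert(c->locked);
	c->locked = false;
}

/* Only re-evaluates fullness; must be called with the lock held. */
static enum toy_fullness toy_putback_zspage(struct toy_class *c,
					    struct toy_zspage *z)
{
	assert(c->locked);
	return z->inuse == 0 ? TOY_EMPTY : TOY_IN_USE;
}

/* Frees the zspage; in this sketch it must run with the lock dropped. */
static void toy_free_zspage(struct toy_class *c, struct toy_zspage *z)
{
	assert(!c->locked);
	assert(z->inuse == 0);
	c->zspages_freed++;
}

/* Put a zspage back and free it only after the lock is released. */
static int toy_release(int inuse)
{
	struct toy_class c = { .locked = false, .zspages_freed = 0 };
	struct toy_zspage z = { .inuse = inuse };
	enum toy_fullness fg;

	toy_lock(&c);
	fg = toy_putback_zspage(&c, &z);
	toy_unlock(&c);

	if (fg == TOY_EMPTY)
		toy_free_zspage(&c, &z);

	return c.zspages_freed;
}
```

Because the putback step only reports the fullness group, the caller decides when and under which locks the free happens.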
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 46 +++++++++++++++++++++++-----------------------
1 file changed, 23 insertions(+), 23 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 833da8f4ffc9..9c0ab1e92e9b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -950,7 +950,8 @@ static void reset_page(struct page *page)
page_mapcount_reset(page);
}
-static void free_zspage(struct page *first_page)
+static void free_zspage(struct zs_pool *pool, struct size_class *class,
+ struct page *first_page)
{
struct page *nextp, *tmp, *head_extra;
@@ -973,6 +974,11 @@ static void free_zspage(struct page *first_page)
}
reset_page(head_extra);
__free_page(head_extra);
+
+ zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
+ class->size, class->pages_per_zspage));
+ atomic_long_sub(class->pages_per_zspage,
+ &pool->pages_allocated);
}
/* Initialize a newly allocated zspage */
@@ -1560,13 +1566,8 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
spin_lock(&class->lock);
obj_free(class, obj);
fullness = fix_fullness_group(class, first_page);
- if (fullness == ZS_EMPTY) {
- zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
- class->size, class->pages_per_zspage));
- atomic_long_sub(class->pages_per_zspage,
- &pool->pages_allocated);
- free_zspage(first_page);
- }
+ if (fullness == ZS_EMPTY)
+ free_zspage(pool, class, first_page);
spin_unlock(&class->lock);
unpin_tag(handle);
@@ -1753,7 +1754,7 @@ static struct page *isolate_target_page(struct size_class *class)
* @class: destination class
* @first_page: target page
*
- * Return @fist_page's fullness_group
+ * Return @first_page's updated fullness_group
*/
static enum fullness_group putback_zspage(struct zs_pool *pool,
struct size_class *class,
@@ -1765,15 +1766,6 @@ static enum fullness_group putback_zspage(struct zs_pool *pool,
insert_zspage(class, fullness, first_page);
set_zspage_mapping(first_page, class->index, fullness);
- if (fullness == ZS_EMPTY) {
- zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
- class->size, class->pages_per_zspage));
- atomic_long_sub(class->pages_per_zspage,
- &pool->pages_allocated);
-
- free_zspage(first_page);
- }
-
return fullness;
}
@@ -1836,23 +1828,31 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
if (!migrate_zspage(pool, class, &cc))
break;
- putback_zspage(pool, class, dst_page);
+ VM_BUG_ON_PAGE(putback_zspage(pool, class,
+ dst_page) == ZS_EMPTY, dst_page);
}
/* Stop if we couldn't find slot */
if (dst_page == NULL)
break;
- putback_zspage(pool, class, dst_page);
- if (putback_zspage(pool, class, src_page) == ZS_EMPTY)
+ VM_BUG_ON_PAGE(putback_zspage(pool, class,
+ dst_page) == ZS_EMPTY, dst_page);
+ if (putback_zspage(pool, class, src_page) == ZS_EMPTY) {
pool->stats.pages_compacted += class->pages_per_zspage;
- spin_unlock(&class->lock);
+ spin_unlock(&class->lock);
+ free_zspage(pool, class, src_page);
+ } else {
+ spin_unlock(&class->lock);
+ }
+
cond_resched();
spin_lock(&class->lock);
}
if (src_page)
- putback_zspage(pool, class, src_page);
+ VM_BUG_ON_PAGE(putback_zspage(pool, class,
+ src_page) == ZS_EMPTY, src_page);
spin_unlock(&class->lock);
}
--
1.9.1
This patch cleans up the "struct page" function parameter.
Many zsmalloc functions expect the page parameter to be the
"first_page" of a zspage, so use "first_page" rather than "page"
for code readability.
Reviewed-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 62 ++++++++++++++++++++++++++++++-----------------------------
1 file changed, 32 insertions(+), 30 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e72efb109fde..b09a80d398c9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -413,26 +413,28 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
}
-static void get_zspage_mapping(struct page *page, unsigned int *class_idx,
+static void get_zspage_mapping(struct page *first_page,
+ unsigned int *class_idx,
enum fullness_group *fullness)
{
unsigned long m;
- BUG_ON(!is_first_page(page));
+ BUG_ON(!is_first_page(first_page));
- m = (unsigned long)page->mapping;
+ m = (unsigned long)first_page->mapping;
*fullness = m & FULLNESS_MASK;
*class_idx = (m >> FULLNESS_BITS) & CLASS_IDX_MASK;
}
-static void set_zspage_mapping(struct page *page, unsigned int class_idx,
+static void set_zspage_mapping(struct page *first_page,
+ unsigned int class_idx,
enum fullness_group fullness)
{
unsigned long m;
- BUG_ON(!is_first_page(page));
+ BUG_ON(!is_first_page(first_page));
m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
(fullness & FULLNESS_MASK);
- page->mapping = (struct address_space *)m;
+ first_page->mapping = (struct address_space *)m;
}
/*
@@ -625,14 +627,14 @@ static inline void zs_pool_stat_destroy(struct zs_pool *pool)
* the pool (not yet implemented). This function returns fullness
* status of the given page.
*/
-static enum fullness_group get_fullness_group(struct page *page)
+static enum fullness_group get_fullness_group(struct page *first_page)
{
int inuse, max_objects;
enum fullness_group fg;
- BUG_ON(!is_first_page(page));
+ BUG_ON(!is_first_page(first_page));
- inuse = page->inuse;
- max_objects = page->objects;
+ inuse = first_page->inuse;
+ max_objects = first_page->objects;
if (inuse == 0)
fg = ZS_EMPTY;
@@ -652,12 +654,12 @@ static enum fullness_group get_fullness_group(struct page *page)
* have. This functions inserts the given zspage into the freelist
* identified by <class, fullness_group>.
*/
-static void insert_zspage(struct page *page, struct size_class *class,
+static void insert_zspage(struct page *first_page, struct size_class *class,
enum fullness_group fullness)
{
struct page **head;
- BUG_ON(!is_first_page(page));
+ BUG_ON(!is_first_page(first_page));
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
@@ -667,7 +669,7 @@ static void insert_zspage(struct page *page, struct size_class *class,
head = &class->fullness_list[fullness];
if (!*head) {
- *head = page;
+ *head = first_page;
return;
}
@@ -675,21 +677,21 @@ static void insert_zspage(struct page *page, struct size_class *class,
* We want to see more ZS_FULL pages and less almost
* empty/full. Put pages with higher ->inuse first.
*/
- list_add_tail(&page->lru, &(*head)->lru);
- if (page->inuse >= (*head)->inuse)
- *head = page;
+ list_add_tail(&first_page->lru, &(*head)->lru);
+ if (first_page->inuse >= (*head)->inuse)
+ *head = first_page;
}
/*
* This function removes the given zspage from the freelist identified
* by <class, fullness_group>.
*/
-static void remove_zspage(struct page *page, struct size_class *class,
+static void remove_zspage(struct page *first_page, struct size_class *class,
enum fullness_group fullness)
{
struct page **head;
- BUG_ON(!is_first_page(page));
+ BUG_ON(!is_first_page(first_page));
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
@@ -698,11 +700,11 @@ static void remove_zspage(struct page *page, struct size_class *class,
BUG_ON(!*head);
if (list_empty(&(*head)->lru))
*head = NULL;
- else if (*head == page)
+ else if (*head == first_page)
*head = (struct page *)list_entry((*head)->lru.next,
struct page, lru);
- list_del_init(&page->lru);
+ list_del_init(&first_page->lru);
zs_stat_dec(class, fullness == ZS_ALMOST_EMPTY ?
CLASS_ALMOST_EMPTY : CLASS_ALMOST_FULL, 1);
}
@@ -717,21 +719,21 @@ static void remove_zspage(struct page *page, struct size_class *class,
* fullness group.
*/
static enum fullness_group fix_fullness_group(struct size_class *class,
- struct page *page)
+ struct page *first_page)
{
int class_idx;
enum fullness_group currfg, newfg;
- BUG_ON(!is_first_page(page));
+ BUG_ON(!is_first_page(first_page));
- get_zspage_mapping(page, &class_idx, &currfg);
- newfg = get_fullness_group(page);
+ get_zspage_mapping(first_page, &class_idx, &currfg);
+ newfg = get_fullness_group(first_page);
if (newfg == currfg)
goto out;
- remove_zspage(page, class, currfg);
- insert_zspage(page, class, newfg);
- set_zspage_mapping(page, class_idx, newfg);
+ remove_zspage(first_page, class, currfg);
+ insert_zspage(first_page, class, newfg);
+ set_zspage_mapping(first_page, class_idx, newfg);
out:
return newfg;
@@ -1234,11 +1236,11 @@ static bool can_merge(struct size_class *prev, int size, int pages_per_zspage)
return true;
}
-static bool zspage_full(struct page *page)
+static bool zspage_full(struct page *first_page)
{
- BUG_ON(!is_first_page(page));
+ BUG_ON(!is_first_page(first_page));
- return page->inuse == page->objects;
+ return first_page->inuse == first_page->objects;
}
unsigned long zs_get_total_pages(struct zs_pool *pool)
--
1.9.1
There are many BUG_ONs in zsmalloc.c, which is not recommended,
so replace them with alternatives.
The general rules are as follows:
1. Avoid BUG_ON if possible. Instead, use VM_BUG_ON or VM_BUG_ON_PAGE.
2. Use VM_BUG_ON_PAGE if we need to see struct page's fields.
3. Use those assertions in primitive functions so that higher-level
   functions can rely on the assertion in the primitive function.
4. Don't use an assertion if the following instruction would trigger
   an Oops anyway.
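Rules 1 and 3 can be illustrated with a small userspace sketch. SKETCH_DEBUG_VM, SKETCH_VM_BUG_ON and the toy_* names are hypothetical stand-ins chosen for this example, not the kernel's definitions. The non-debug variant only type-checks its argument and never evaluates it, so a condition passed to such a macro should be free of side effects.

```c
#include <assert.h>

/*
 * Hypothetical userspace stand-ins: with SKETCH_DEBUG_VM defined the
 * check is a real assertion, otherwise the condition is type-checked
 * via sizeof but never evaluated.
 */
#ifdef SKETCH_DEBUG_VM
#define SKETCH_VM_BUG_ON(cond)	assert(!(cond))
#else
#define SKETCH_VM_BUG_ON(cond)	((void)sizeof((long)(cond)))
#endif

struct toy_page {
	int is_first;
	int inuse;
};

/* Primitive: the only place that asserts the first-page invariant. */
static int toy_get_inuse(const struct toy_page *first_page)
{
	SKETCH_VM_BUG_ON(!first_page->is_first);
	return first_page->inuse;
}

/* Higher-level helper: relies on the assertion inside the primitive. */
static int toy_zspage_empty(const struct toy_page *first_page)
{
	return toy_get_inuse(first_page) == 0;
}

/* Build a valid first page and ask whether it is empty. */
static int toy_demo_empty(int inuse)
{
	struct toy_page p = { .is_first = 1, .inuse = inuse };

	return toy_zspage_empty(&p);
}
```

Keeping the check in the primitive means callers such as toy_zspage_empty do not need to repeat it, which is why the patch can drop several redundant assertions.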
Reviewed-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 42 +++++++++++++++---------------------------
1 file changed, 15 insertions(+), 27 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b09a80d398c9..6a7b9313ee8c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -418,7 +418,7 @@ static void get_zspage_mapping(struct page *first_page,
enum fullness_group *fullness)
{
unsigned long m;
- BUG_ON(!is_first_page(first_page));
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
m = (unsigned long)first_page->mapping;
*fullness = m & FULLNESS_MASK;
@@ -430,7 +430,7 @@ static void set_zspage_mapping(struct page *first_page,
enum fullness_group fullness)
{
unsigned long m;
- BUG_ON(!is_first_page(first_page));
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
(fullness & FULLNESS_MASK);
@@ -631,7 +631,8 @@ static enum fullness_group get_fullness_group(struct page *first_page)
{
int inuse, max_objects;
enum fullness_group fg;
- BUG_ON(!is_first_page(first_page));
+
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
inuse = first_page->inuse;
max_objects = first_page->objects;
@@ -659,7 +660,7 @@ static void insert_zspage(struct page *first_page, struct size_class *class,
{
struct page **head;
- BUG_ON(!is_first_page(first_page));
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
@@ -691,13 +692,13 @@ static void remove_zspage(struct page *first_page, struct size_class *class,
{
struct page **head;
- BUG_ON(!is_first_page(first_page));
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
if (fullness >= _ZS_NR_FULLNESS_GROUPS)
return;
head = &class->fullness_list[fullness];
- BUG_ON(!*head);
+ VM_BUG_ON_PAGE(!*head, first_page);
if (list_empty(&(*head)->lru))
*head = NULL;
else if (*head == first_page)
@@ -724,8 +725,6 @@ static enum fullness_group fix_fullness_group(struct size_class *class,
int class_idx;
enum fullness_group currfg, newfg;
- BUG_ON(!is_first_page(first_page));
-
get_zspage_mapping(first_page, &class_idx, &currfg);
newfg = get_fullness_group(first_page);
if (newfg == currfg)
@@ -811,7 +810,7 @@ static void *location_to_obj(struct page *page, unsigned long obj_idx)
unsigned long obj;
if (!page) {
- BUG_ON(obj_idx);
+ VM_BUG_ON(obj_idx);
return NULL;
}
@@ -844,7 +843,7 @@ static unsigned long obj_to_head(struct size_class *class, struct page *page,
void *obj)
{
if (class->huge) {
- VM_BUG_ON(!is_first_page(page));
+ VM_BUG_ON_PAGE(!is_first_page(page), page);
return page_private(page);
} else
return *(unsigned long *)obj;
@@ -894,8 +893,8 @@ static void free_zspage(struct page *first_page)
{
struct page *nextp, *tmp, *head_extra;
- BUG_ON(!is_first_page(first_page));
- BUG_ON(first_page->inuse);
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+ VM_BUG_ON_PAGE(first_page->inuse, first_page);
head_extra = (struct page *)page_private(first_page);
@@ -921,7 +920,8 @@ static void init_zspage(struct page *first_page, struct size_class *class)
unsigned long off = 0;
struct page *page = first_page;
- BUG_ON(!is_first_page(first_page));
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
while (page) {
struct page *next_page;
struct link_free *link;
@@ -1238,7 +1238,7 @@ static bool can_merge(struct size_class *prev, int size, int pages_per_zspage)
static bool zspage_full(struct page *first_page)
{
- BUG_ON(!is_first_page(first_page));
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
return first_page->inuse == first_page->objects;
}
@@ -1276,14 +1276,12 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
struct page *pages[2];
void *ret;
- BUG_ON(!handle);
-
/*
* Because we use per-cpu mapping areas shared among the
* pools/users, we can't allow mapping in interrupt context
* because it can corrupt another users mappings.
*/
- BUG_ON(in_interrupt());
+ WARN_ON_ONCE(in_interrupt());
/* From now on, migration cannot move the object */
pin_tag(handle);
@@ -1327,8 +1325,6 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
struct size_class *class;
struct mapping_area *area;
- BUG_ON(!handle);
-
obj = handle_to_obj(handle);
obj_to_location(obj, &page, &obj_idx);
get_zspage_mapping(get_first_page(page), &class_idx, &fg);
@@ -1448,8 +1444,6 @@ static void obj_free(struct zs_pool *pool, struct size_class *class,
unsigned long f_objidx, f_offset;
void *vaddr;
- BUG_ON(!obj);
-
obj &= ~OBJ_ALLOCATED_TAG;
obj_to_location(obj, &f_page, &f_objidx);
first_page = get_first_page(f_page);
@@ -1549,7 +1543,6 @@ static void zs_object_copy(unsigned long dst, unsigned long src,
kunmap_atomic(d_addr);
kunmap_atomic(s_addr);
s_page = get_next_page(s_page);
- BUG_ON(!s_page);
s_addr = kmap_atomic(s_page);
d_addr = kmap_atomic(d_page);
s_size = class->size - written;
@@ -1559,7 +1552,6 @@ static void zs_object_copy(unsigned long dst, unsigned long src,
if (d_off >= PAGE_SIZE) {
kunmap_atomic(d_addr);
d_page = get_next_page(d_page);
- BUG_ON(!d_page);
d_addr = kmap_atomic(d_page);
d_size = class->size - written;
d_off = 0;
@@ -1694,8 +1686,6 @@ static enum fullness_group putback_zspage(struct zs_pool *pool,
{
enum fullness_group fullness;
- BUG_ON(!is_first_page(first_page));
-
fullness = get_fullness_group(first_page);
insert_zspage(first_page, class, fullness);
set_zspage_mapping(first_page, class->index, fullness);
@@ -1756,8 +1746,6 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
spin_lock(&class->lock);
while ((src_page = isolate_source_page(class))) {
- BUG_ON(!is_first_page(src_page));
-
if (!zs_can_compact(class))
break;
--
1.9.1
Every zspage in a size_class has the same maximum number of
objects, so we can keep that number in the size_class instead of
in each zspage.
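A minimal sketch of the idea: the per-class object count is computed once and the fullness check reads it from the class. The toy_* names are hypothetical, and the threshold fraction of 4 is an assumption made for this example, not a value shown in the patch.

```c
#include <assert.h>

enum toy_fullness {
	TOY_ALMOST_FULL,
	TOY_ALMOST_EMPTY,
	TOY_EMPTY,
	TOY_FULL
};

/* Assumed value for this sketch only. */
#define TOY_FULLNESS_THRESHOLD_FRAC 4

struct toy_size_class {
	int size;
	int pages_per_zspage;
	int objs_per_zspage;	/* computed once per class */
};

static struct toy_size_class toy_make_class(int page_size, int size,
					    int pages_per_zspage)
{
	struct toy_size_class c = {
		.size = size,
		.pages_per_zspage = pages_per_zspage,
		.objs_per_zspage = pages_per_zspage * page_size / size,
	};

	return c;
}

/* Classify a zspage using the count stored in its class. */
static enum toy_fullness toy_get_fullness(const struct toy_size_class *c,
					  int inuse)
{
	if (inuse == 0)
		return TOY_EMPTY;
	if (inuse == c->objs_per_zspage)
		return TOY_FULL;
	if (inuse <= 3 * c->objs_per_zspage / TOY_FULLNESS_THRESHOLD_FRAC)
		return TOY_ALMOST_EMPTY;
	return TOY_ALMOST_FULL;
}

/* 4096-byte page, one page per zspage, 64-byte objects: 64 objects. */
static enum toy_fullness toy_classify(int inuse)
{
	struct toy_size_class c = toy_make_class(4096, 64, 1);

	return toy_get_fullness(&c, inuse);
}
```

With the count held in the class, the per-zspage metadata no longer needs a field for it.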
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a0890e9003e2..8649d0243e6c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -32,8 +32,6 @@
* page->freelist: points to the first free object in zspage.
* Free objects are linked together using in-place
* metadata.
- * page->objects: maximum number of objects we can store in this
- * zspage (class->zspage_order * PAGE_SIZE / class->size)
* page->lru: links together first pages of various zspages.
* Basically forming list of zspages in a fullness group.
* page->mapping: class index and fullness group of the zspage
@@ -211,6 +209,7 @@ struct size_class {
* of ZS_ALIGN.
*/
int size;
+ int objs_per_zspage;
unsigned int index;
struct zs_size_stat stats;
@@ -627,21 +626,22 @@ static inline void zs_pool_stat_destroy(struct zs_pool *pool)
* the pool (not yet implemented). This function returns fullness
* status of the given page.
*/
-static enum fullness_group get_fullness_group(struct page *first_page)
+static enum fullness_group get_fullness_group(struct size_class *class,
+ struct page *first_page)
{
- int inuse, max_objects;
+ int inuse, objs_per_zspage;
enum fullness_group fg;
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
inuse = first_page->inuse;
- max_objects = first_page->objects;
+ objs_per_zspage = class->objs_per_zspage;
if (inuse == 0)
fg = ZS_EMPTY;
- else if (inuse == max_objects)
+ else if (inuse == objs_per_zspage)
fg = ZS_FULL;
- else if (inuse <= 3 * max_objects / fullness_threshold_frac)
+ else if (inuse <= 3 * objs_per_zspage / fullness_threshold_frac)
fg = ZS_ALMOST_EMPTY;
else
fg = ZS_ALMOST_FULL;
@@ -728,7 +728,7 @@ static enum fullness_group fix_fullness_group(struct size_class *class,
enum fullness_group currfg, newfg;
get_zspage_mapping(first_page, &class_idx, &currfg);
- newfg = get_fullness_group(first_page);
+ newfg = get_fullness_group(class, first_page);
if (newfg == currfg)
goto out;
@@ -1008,9 +1008,6 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
init_zspage(class, first_page);
first_page->freelist = location_to_obj(first_page, 0);
- /* Maximum number of objects we can store in this zspage */
- first_page->objects = class->pages_per_zspage * PAGE_SIZE / class->size;
-
error = 0; /* Success */
cleanup:
@@ -1238,11 +1235,11 @@ static bool can_merge(struct size_class *prev, int size, int pages_per_zspage)
return true;
}
-static bool zspage_full(struct page *first_page)
+static bool zspage_full(struct size_class *class, struct page *first_page)
{
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- return first_page->inuse == first_page->objects;
+ return first_page->inuse == class->objs_per_zspage;
}
unsigned long zs_get_total_pages(struct zs_pool *pool)
@@ -1628,7 +1625,7 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
}
/* Stop if there is no more space */
- if (zspage_full(d_page)) {
+ if (zspage_full(class, d_page)) {
unpin_tag(handle);
ret = -ENOMEM;
break;
@@ -1687,7 +1684,7 @@ static enum fullness_group putback_zspage(struct zs_pool *pool,
{
enum fullness_group fullness;
- fullness = get_fullness_group(first_page);
+ fullness = get_fullness_group(class, first_page);
insert_zspage(class, fullness, first_page);
set_zspage_mapping(first_page, class->index, fullness);
@@ -1936,8 +1933,9 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t flags)
class->size = size;
class->index = i;
class->pages_per_zspage = pages_per_zspage;
- if (pages_per_zspage == 1 &&
- get_maxobj_per_zspage(size, pages_per_zspage) == 1)
+ class->objs_per_zspage = class->pages_per_zspage *
+ PAGE_SIZE / class->size;
+ if (pages_per_zspage == 1 && class->objs_per_zspage == 1)
class->huge = true;
spin_lock_init(&class->lock);
pool->size_class[i] = class;
--
1.9.1
Currently, we store class:fullness in page->mapping.
The number of classes we can support is 255 and the number of
fullness groups is 4, so 8 + 2 = 10 bits are enough to represent
them.
Meanwhile, 11 bits are enough to store the number of in-use
objects in a zspage.
For example, if we assume a 64K PAGE_SIZE and a class size of 32,
which is the worst case, class->pages_per_zspage becomes 1, so
the number of objects in a zspage is 2048 and 11 bits are enough.
The next class is 32 + 256 (i.e., ZS_SIZE_CLASS_DELTA).
In the worst case of ZS_MAX_PAGES_PER_ZSPAGE, 64K * 4 /
(32 + 256) = 910, so 11 bits are still enough.
So, we could squeeze the in-use object count into page->mapping.
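The arithmetic above can be reproduced with a small sketch. The helper names are hypothetical, and the bit-width helper counts the bits needed to hold every value from 0 up to and including the given maximum. Under that definition a count of exactly 2048 takes 12 bits, while 11 bits cover 0..2047, so the first case sits right on the boundary. Whether a zspage can actually reach 2048 in-use objects depends on the real minimum class size, which this sketch does not model.

```c
#include <assert.h>

/* Objects that fit in one zspage: pages * page_size / class_size. */
static unsigned long toy_objs_per_zspage(unsigned long page_size,
					 unsigned long pages_per_zspage,
					 unsigned long class_size)
{
	return pages_per_zspage * page_size / class_size;
}

/* Smallest number of bits that can hold every value in 0..max_value. */
static unsigned int toy_bits_for(unsigned long max_value)
{
	unsigned int bits = 0;

	while (max_value) {
		bits++;
		max_value >>= 1;
	}
	return bits;
}
```

For the two cases in the text, toy_objs_per_zspage(65536, 1, 32) and toy_objs_per_zspage(65536, 4, 288) give the object counts, and toy_bits_for() applied to each result gives the width a counter would need.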
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 103 ++++++++++++++++++++++++++++++++++++++++------------------
1 file changed, 71 insertions(+), 32 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8649d0243e6c..4dd72a803568 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -34,8 +34,7 @@
* metadata.
* page->lru: links together first pages of various zspages.
* Basically forming list of zspages in a fullness group.
- * page->mapping: class index and fullness group of the zspage
- * page->inuse: the number of objects that are used in this zspage
+ * page->mapping: override by struct zs_meta
*
* Usage of struct page flags:
* PG_private: identifies the first component page
@@ -132,6 +131,13 @@
/* each chunk includes extra space to keep handle */
#define ZS_MAX_ALLOC_SIZE PAGE_SIZE
+#define CLASS_BITS 8
+#define CLASS_MASK ((1 << CLASS_BITS) - 1)
+#define FULLNESS_BITS 2
+#define FULLNESS_MASK ((1 << FULLNESS_BITS) - 1)
+#define INUSE_BITS 11
+#define INUSE_MASK ((1 << INUSE_BITS) - 1)
+
/*
* On systems with 4K page size, this gives 255 size classes! There is a
* trader-off here:
@@ -145,7 +151,7 @@
* ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
* (reason above)
*/
-#define ZS_SIZE_CLASS_DELTA (PAGE_SIZE >> 8)
+#define ZS_SIZE_CLASS_DELTA (PAGE_SIZE >> CLASS_BITS)
/*
* We do not maintain any list for completely empty or full pages
@@ -155,7 +161,7 @@ enum fullness_group {
ZS_ALMOST_EMPTY,
_ZS_NR_FULLNESS_GROUPS,
- ZS_EMPTY,
+ ZS_EMPTY = _ZS_NR_FULLNESS_GROUPS,
ZS_FULL
};
@@ -263,14 +269,11 @@ struct zs_pool {
#endif
};
-/*
- * A zspage's class index and fullness group
- * are encoded in its (first)page->mapping
- */
-#define CLASS_IDX_BITS 28
-#define FULLNESS_BITS 4
-#define CLASS_IDX_MASK ((1 << CLASS_IDX_BITS) - 1)
-#define FULLNESS_MASK ((1 << FULLNESS_BITS) - 1)
+struct zs_meta {
+ unsigned long class:CLASS_BITS;
+ unsigned long fullness:FULLNESS_BITS;
+ unsigned long inuse:INUSE_BITS;
+};
struct mapping_area {
#ifdef CONFIG_PGTABLE_MAPPING
@@ -412,28 +415,61 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
}
+static int get_zspage_inuse(struct page *first_page)
+{
+ struct zs_meta *m;
+
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+ m = (struct zs_meta *)&first_page->mapping;
+
+ return m->inuse;
+}
+
+static void set_zspage_inuse(struct page *first_page, int val)
+{
+ struct zs_meta *m;
+
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+ m = (struct zs_meta *)&first_page->mapping;
+ m->inuse = val;
+}
+
+static void mod_zspage_inuse(struct page *first_page, int val)
+{
+ struct zs_meta *m;
+
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+ m = (struct zs_meta *)&first_page->mapping;
+ m->inuse += val;
+}
+
static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
{
- unsigned long m;
+ struct zs_meta *m;
+
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- m = (unsigned long)first_page->mapping;
- *fullness = m & FULLNESS_MASK;
- *class_idx = (m >> FULLNESS_BITS) & CLASS_IDX_MASK;
+ m = (struct zs_meta *)&first_page->mapping;
+ *fullness = m->fullness;
+ *class_idx = m->class;
}
static void set_zspage_mapping(struct page *first_page,
unsigned int class_idx,
enum fullness_group fullness)
{
- unsigned long m;
+ struct zs_meta *m;
+
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- m = ((class_idx & CLASS_IDX_MASK) << FULLNESS_BITS) |
- (fullness & FULLNESS_MASK);
- first_page->mapping = (struct address_space *)m;
+ m = (struct zs_meta *)&first_page->mapping;
+ m->fullness = fullness;
+ m->class = class_idx;
}
/*
@@ -632,9 +668,7 @@ static enum fullness_group get_fullness_group(struct size_class *class,
int inuse, objs_per_zspage;
enum fullness_group fg;
- VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
-
- inuse = first_page->inuse;
+ inuse = get_zspage_inuse(first_page);
objs_per_zspage = class->objs_per_zspage;
if (inuse == 0)
@@ -677,10 +711,10 @@ static void insert_zspage(struct size_class *class,
/*
* We want to see more ZS_FULL pages and less almost
- * empty/full. Put pages with higher ->inuse first.
+ * empty/full. Put pages with higher inuse first.
*/
list_add_tail(&first_page->lru, &(*head)->lru);
- if (first_page->inuse >= (*head)->inuse)
+ if (get_zspage_inuse(first_page) >= get_zspage_inuse(*head))
*head = first_page;
}
@@ -896,7 +930,7 @@ static void free_zspage(struct page *first_page)
struct page *nextp, *tmp, *head_extra;
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- VM_BUG_ON_PAGE(first_page->inuse, first_page);
+ VM_BUG_ON_PAGE(get_zspage_inuse(first_page), first_page);
head_extra = (struct page *)page_private(first_page);
@@ -992,7 +1026,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
SetPagePrivate(page);
set_page_private(page, 0);
first_page = page;
- first_page->inuse = 0;
+ set_zspage_inuse(page, 0);
}
if (i == 1)
set_page_private(first_page, (unsigned long)page);
@@ -1237,9 +1271,7 @@ static bool can_merge(struct size_class *prev, int size, int pages_per_zspage)
static bool zspage_full(struct size_class *class, struct page *first_page)
{
- VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
-
- return first_page->inuse == class->objs_per_zspage;
+ return get_zspage_inuse(first_page) == class->objs_per_zspage;
}
unsigned long zs_get_total_pages(struct zs_pool *pool)
@@ -1372,7 +1404,7 @@ static unsigned long obj_malloc(struct size_class *class,
/* record handle in first_page->private */
set_page_private(first_page, handle);
kunmap_atomic(vaddr);
- first_page->inuse++;
+ mod_zspage_inuse(first_page, 1);
zs_stat_inc(class, OBJ_USED, 1);
return obj;
@@ -1457,7 +1489,7 @@ static void obj_free(struct size_class *class, unsigned long obj)
set_page_private(first_page, 0);
kunmap_atomic(vaddr);
first_page->freelist = (void *)obj;
- first_page->inuse--;
+ mod_zspage_inuse(first_page, -1);
zs_stat_dec(class, OBJ_USED, 1);
}
@@ -2002,6 +2034,13 @@ static int __init zs_init(void)
if (ret)
goto notifier_fail;
+ /*
+ * A zspage's class index, fullness group, inuse object count are
+ * encoded in its (first)page->mapping so sizeof(struct zs_meta)
+ * should be less than sizeof(page->mapping(i.e., unsigned long)).
+ */
+ BUILD_BUG_ON(sizeof(struct zs_meta) > sizeof(unsigned long));
+
init_zs_size_classes();
#ifdef CONFIG_ZPOOL
--
1.9.1
Zsmalloc stores the position of the first free object of each zspage
in first_page->freelist. If we store the object's index relative to
first_page instead of its encoded location, we can squeeze it into
page->mapping, because at most 11 bits are needed to store the index.
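The idea can be sketched in user space as follows. This is only an
illustration, not the kernel code: union meta_word stands in for
page->mapping, TOY_PAGE_SIZE assumes 4K pages, and INUSE_BITS is
assumed to be 11 (it is defined elsewhere in the series). The second
helper shows how a bare object index is enough to find the sub-page
and the offset inside it.

```c
#include <assert.h>

#define FREEOBJ_BITS	11
#define CLASS_BITS	8
#define FULLNESS_BITS	2
#define INUSE_BITS	11	/* assumption: value not shown in this patch */
#define TOY_PAGE_SIZE	4096UL	/* assumption: 4K pages */

struct zs_meta {
	unsigned long freeobj:FREEOBJ_BITS;
	unsigned long class:CLASS_BITS;
	unsigned long fullness:FULLNESS_BITS;
	unsigned long inuse:INUSE_BITS;
};

/* Hypothetical stand-in for the pointer-sized page->mapping word. */
union meta_word {
	unsigned long word;
	struct zs_meta meta;
};

/* All metadata has to fit into one pointer-sized word. */
_Static_assert(sizeof(struct zs_meta) <= sizeof(unsigned long),
	       "zs_meta must fit in one word");

static void set_freeobj(union meta_word *w, unsigned long idx)
{
	/* The index must be representable in FREEOBJ_BITS bits. */
	assert(idx < (1UL << FREEOBJ_BITS));
	w->meta.freeobj = idx;
}

static unsigned long get_freeobj(const union meta_word *w)
{
	return w->meta.freeobj;
}

/*
 * Translate an object index into the number of the sub-page holding
 * the start of the object (0 is the first page) and the byte offset
 * within that sub-page.
 */
static void objidx_to_page_and_offset(unsigned long class_size,
				      unsigned long obj_idx,
				      unsigned long *page_nr,
				      unsigned long *offset_in_page)
{
	unsigned long offset = obj_idx * class_size;

	*page_nr = offset / TOY_PAGE_SIZE;
	*offset_in_page = offset % TOY_PAGE_SIZE;
}
```

With a class size of 48 bytes, object index 100 starts at byte 4800 of
the zspage, i.e. in the second sub-page.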
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 159 +++++++++++++++++++++++++++++++++++-----------------------
1 file changed, 96 insertions(+), 63 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 4dd72a803568..0c8ccd87c084 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -18,9 +18,7 @@
* Usage of struct page fields:
* page->private: points to the first component (0-order) page
* page->index (union with page->freelist): offset of the first object
- * starting in this page. For the first page, this is
- * always 0, so we use this field (aka freelist) to point
- * to the first free object in zspage.
+ * starting in this page.
* page->lru: links together all component pages (except the first page)
* of a zspage
*
@@ -29,9 +27,6 @@
* page->private: refers to the component page after the first page
* If the page is first_page for huge object, it stores handle.
* Look at size_class->huge.
- * page->freelist: points to the first free object in zspage.
- * Free objects are linked together using in-place
- * metadata.
* page->lru: links together first pages of various zspages.
* Basically forming list of zspages in a fullness group.
* page->mapping: override by struct zs_meta
@@ -131,6 +126,7 @@
/* each chunk includes extra space to keep handle */
#define ZS_MAX_ALLOC_SIZE PAGE_SIZE
+#define FREEOBJ_BITS 11
#define CLASS_BITS 8
#define CLASS_MASK ((1 << CLASS_BITS) - 1)
#define FULLNESS_BITS 2
@@ -228,17 +224,17 @@ struct size_class {
/*
* Placed within free objects to form a singly linked list.
- * For every zspage, first_page->freelist gives head of this list.
+ * For every zspage, first_page->freeobj gives head of this list.
*
* This must be power of 2 and less than or equal to ZS_ALIGN
*/
struct link_free {
union {
/*
- * Position of next free chunk (encodes <PFN, obj_idx>)
+ * free object list
* It's valid for non-allocated object
*/
- void *next;
+ unsigned long next;
/*
* Handle of allocated object.
*/
@@ -270,6 +266,7 @@ struct zs_pool {
};
struct zs_meta {
+ unsigned long freeobj:FREEOBJ_BITS;
unsigned long class:CLASS_BITS;
unsigned long fullness:FULLNESS_BITS;
unsigned long inuse:INUSE_BITS;
@@ -446,6 +443,26 @@ static void mod_zspage_inuse(struct page *first_page, int val)
m->inuse += val;
}
+static void set_freeobj(struct page *first_page, int idx)
+{
+ struct zs_meta *m;
+
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+ m = (struct zs_meta *)&first_page->mapping;
+ m->freeobj = idx;
+}
+
+static unsigned long get_freeobj(struct page *first_page)
+{
+ struct zs_meta *m;
+
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+ m = (struct zs_meta *)&first_page->mapping;
+ return m->freeobj;
+}
+
static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
@@ -837,30 +854,33 @@ static struct page *get_next_page(struct page *page)
return next;
}
-/*
- * Encode <page, obj_idx> as a single handle value.
- * We use the least bit of handle for tagging.
- */
-static void *location_to_obj(struct page *page, unsigned long obj_idx)
+static void objidx_to_page_and_offset(struct size_class *class,
+ struct page *first_page,
+ unsigned long obj_idx,
+ struct page **obj_page,
+ unsigned long *offset_in_page)
{
- unsigned long obj;
+ int i;
+ unsigned long offset;
+ struct page *cursor;
+ int nr_page;
- if (!page) {
- VM_BUG_ON(obj_idx);
- return NULL;
- }
+ offset = obj_idx * class->size;
+ cursor = first_page;
+ nr_page = offset >> PAGE_SHIFT;
- obj = page_to_pfn(page) << OBJ_INDEX_BITS;
- obj |= ((obj_idx) & OBJ_INDEX_MASK);
- obj <<= OBJ_TAG_BITS;
+ *offset_in_page = offset & ~PAGE_MASK;
+
+ for (i = 0; i < nr_page; i++)
+ cursor = get_next_page(cursor);
- return (void *)obj;
+ *obj_page = cursor;
}
-/*
- * Decode <page, obj_idx> pair from the given object handle. We adjust the
- * decoded obj_idx back to its original value since it was adjusted in
- * location_to_obj().
+/**
+ * obj_to_location - get (<page>, <obj_idx>) from encoded object value
+ * @page: page object resides in zspage
+ * @obj_idx: object index
*/
static void obj_to_location(unsigned long obj, struct page **page,
unsigned long *obj_idx)
@@ -870,6 +890,23 @@ static void obj_to_location(unsigned long obj, struct page **page,
*obj_idx = (obj & OBJ_INDEX_MASK);
}
+/**
+ * location_to_obj - get obj value encoded from (<page>, <obj_idx>)
+ * @page: page object resides in zspage
+ * @obj_idx: object index
+ */
+static unsigned long location_to_obj(struct page *page,
+ unsigned long obj_idx)
+{
+ unsigned long obj;
+
+ obj = page_to_pfn(page) << OBJ_INDEX_BITS;
+ obj |= obj_idx & OBJ_INDEX_MASK;
+ obj <<= OBJ_TAG_BITS;
+
+ return obj;
+}
+
static unsigned long handle_to_obj(unsigned long handle)
{
return *(unsigned long *)handle;
@@ -885,17 +922,6 @@ static unsigned long obj_to_head(struct size_class *class, struct page *page,
return *(unsigned long *)obj;
}
-static unsigned long obj_idx_to_offset(struct page *page,
- unsigned long obj_idx, int class_size)
-{
- unsigned long off = 0;
-
- if (!is_first_page(page))
- off = page->index;
-
- return off + obj_idx * class_size;
-}
-
static inline int trypin_tag(unsigned long handle)
{
unsigned long *ptr = (unsigned long *)handle;
@@ -921,7 +947,6 @@ static void reset_page(struct page *page)
clear_bit(PG_private_2, &page->flags);
set_page_private(page, 0);
page->mapping = NULL;
- page->freelist = NULL;
page_mapcount_reset(page);
}
@@ -953,6 +978,7 @@ static void free_zspage(struct page *first_page)
/* Initialize a newly allocated zspage */
static void init_zspage(struct size_class *class, struct page *first_page)
{
+ int freeobj = 1;
unsigned long off = 0;
struct page *page = first_page;
@@ -961,14 +987,11 @@ static void init_zspage(struct size_class *class, struct page *first_page)
while (page) {
struct page *next_page;
struct link_free *link;
- unsigned int i = 1;
void *vaddr;
/*
* page->index stores offset of first object starting
- * in the page. For the first page, this is always 0,
- * so we use first_page->index (aka ->freelist) to store
- * head of corresponding zspage's freelist.
+ * in the page.
*/
if (page != first_page)
page->index = off;
@@ -977,7 +1000,7 @@ static void init_zspage(struct size_class *class, struct page *first_page)
link = (struct link_free *)vaddr + off / sizeof(*link);
while ((off += class->size) < PAGE_SIZE) {
- link->next = location_to_obj(page, i++);
+ link->next = freeobj++ << OBJ_ALLOCATED_TAG;
link += class->size / sizeof(*link);
}
@@ -987,11 +1010,21 @@ static void init_zspage(struct size_class *class, struct page *first_page)
* page (if present)
*/
next_page = get_next_page(page);
- link->next = location_to_obj(next_page, 0);
+ if (next_page) {
+ link->next = freeobj++ << OBJ_ALLOCATED_TAG;
+ } else {
+ /*
+ * Reset OBJ_ALLOCATED_TAG bit to last link for
+ * migration to know it is allocated object or not.
+ */
+ link->next = -1 << OBJ_ALLOCATED_TAG;
+ }
kunmap_atomic(vaddr);
page = next_page;
off %= PAGE_SIZE;
}
+
+ set_freeobj(first_page, 0);
}
/*
@@ -1041,7 +1074,6 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
init_zspage(class, first_page);
- first_page->freelist = location_to_obj(first_page, 0);
error = 0; /* Success */
cleanup:
@@ -1321,7 +1353,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
obj_to_location(obj, &page, &obj_idx);
get_zspage_mapping(get_first_page(page), &class_idx, &fg);
class = pool->size_class[class_idx];
- off = obj_idx_to_offset(page, obj_idx, class->size);
+ off = (class->size * obj_idx) & ~PAGE_MASK;
area = &get_cpu_var(zs_map_area);
area->vm_mm = mm;
@@ -1360,7 +1392,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
obj_to_location(obj, &page, &obj_idx);
get_zspage_mapping(get_first_page(page), &class_idx, &fg);
class = pool->size_class[class_idx];
- off = obj_idx_to_offset(page, obj_idx, class->size);
+ off = (class->size * obj_idx) & ~PAGE_MASK;
area = this_cpu_ptr(&zs_map_area);
if (off + class->size <= PAGE_SIZE)
@@ -1386,17 +1418,17 @@ static unsigned long obj_malloc(struct size_class *class,
struct link_free *link;
struct page *m_page;
- unsigned long m_objidx, m_offset;
+ unsigned long m_offset;
void *vaddr;
handle |= OBJ_ALLOCATED_TAG;
- obj = (unsigned long)first_page->freelist;
- obj_to_location(obj, &m_page, &m_objidx);
- m_offset = obj_idx_to_offset(m_page, m_objidx, class->size);
+ obj = get_freeobj(first_page);
+ objidx_to_page_and_offset(class, first_page, obj,
+ &m_page, &m_offset);
vaddr = kmap_atomic(m_page);
link = (struct link_free *)vaddr + m_offset / sizeof(*link);
- first_page->freelist = link->next;
+ set_freeobj(first_page, link->next >> OBJ_ALLOCATED_TAG);
if (!class->huge)
/* record handle in the header of allocated chunk */
link->handle = handle;
@@ -1407,6 +1439,8 @@ static unsigned long obj_malloc(struct size_class *class,
mod_zspage_inuse(first_page, 1);
zs_stat_inc(class, OBJ_USED, 1);
+ obj = location_to_obj(m_page, obj);
+
return obj;
}
@@ -1476,19 +1510,17 @@ static void obj_free(struct size_class *class, unsigned long obj)
obj &= ~OBJ_ALLOCATED_TAG;
obj_to_location(obj, &f_page, &f_objidx);
+ f_offset = (class->size * f_objidx) & ~PAGE_MASK;
first_page = get_first_page(f_page);
-
- f_offset = obj_idx_to_offset(f_page, f_objidx, class->size);
-
vaddr = kmap_atomic(f_page);
/* Insert this object in containing zspage's freelist */
link = (struct link_free *)(vaddr + f_offset);
- link->next = first_page->freelist;
+ link->next = get_freeobj(first_page) << OBJ_ALLOCATED_TAG;
if (class->huge)
set_page_private(first_page, 0);
kunmap_atomic(vaddr);
- first_page->freelist = (void *)obj;
+ set_freeobj(first_page, f_objidx);
mod_zspage_inuse(first_page, -1);
zs_stat_dec(class, OBJ_USED, 1);
}
@@ -1544,8 +1576,8 @@ static void zs_object_copy(struct size_class *class, unsigned long dst,
obj_to_location(src, &s_page, &s_objidx);
obj_to_location(dst, &d_page, &d_objidx);
- s_off = obj_idx_to_offset(s_page, s_objidx, class->size);
- d_off = obj_idx_to_offset(d_page, d_objidx, class->size);
+ s_off = (class->size * s_objidx) & ~PAGE_MASK;
+ d_off = (class->size * d_objidx) & ~PAGE_MASK;
if (s_off + class->size > PAGE_SIZE)
s_size = PAGE_SIZE - s_off;
@@ -2035,9 +2067,10 @@ static int __init zs_init(void)
goto notifier_fail;
/*
- * A zspage's class index, fullness group, inuse object count are
- * encoded in its (first)page->mapping so sizeof(struct zs_meta)
- * should be less than sizeof(page->mapping(i.e., unsigned long)).
+ * A zspage's a free object index, class index, fullness group,
+ * inuse object count are encoded in its (first)page->mapping
+ * so sizeof(struct zs_meta) should be less than
+ * sizeof(page->mapping(i.e., unsigned long)).
*/
BUILD_BUG_ON(sizeof(struct zs_meta) > sizeof(unsigned long));
--
1.9.1
To support migration from the VM, every page needs to have an
address_space, so zsmalloc must not use page->mapping for its own
metadata. This patch therefore moves zs_meta from mapping to freelist.
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 0c8ccd87c084..958f27a9079d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -29,7 +29,7 @@
* Look at size_class->huge.
* page->lru: links together first pages of various zspages.
* Basically forming list of zspages in a fullness group.
- * page->mapping: override by struct zs_meta
+ * page->freelist: override by struct zs_meta
*
* Usage of struct page flags:
* PG_private: identifies the first component page
@@ -418,7 +418,7 @@ static int get_zspage_inuse(struct page *first_page)
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- m = (struct zs_meta *)&first_page->mapping;
+ m = (struct zs_meta *)&first_page->freelist;
return m->inuse;
}
@@ -429,7 +429,7 @@ static void set_zspage_inuse(struct page *first_page, int val)
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- m = (struct zs_meta *)&first_page->mapping;
+ m = (struct zs_meta *)&first_page->freelist;
m->inuse = val;
}
@@ -439,7 +439,7 @@ static void mod_zspage_inuse(struct page *first_page, int val)
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- m = (struct zs_meta *)&first_page->mapping;
+ m = (struct zs_meta *)&first_page->freelist;
m->inuse += val;
}
@@ -449,7 +449,7 @@ static void set_freeobj(struct page *first_page, int idx)
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- m = (struct zs_meta *)&first_page->mapping;
+ m = (struct zs_meta *)&first_page->freelist;
m->freeobj = idx;
}
@@ -459,7 +459,7 @@ static unsigned long get_freeobj(struct page *first_page)
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- m = (struct zs_meta *)&first_page->mapping;
+ m = (struct zs_meta *)&first_page->freelist;
return m->freeobj;
}
@@ -471,7 +471,7 @@ static void get_zspage_mapping(struct page *first_page,
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- m = (struct zs_meta *)&first_page->mapping;
+ m = (struct zs_meta *)&first_page->freelist;
*fullness = m->fullness;
*class_idx = m->class;
}
@@ -484,7 +484,7 @@ static void set_zspage_mapping(struct page *first_page,
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
- m = (struct zs_meta *)&first_page->mapping;
+ m = (struct zs_meta *)&first_page->freelist;
m->fullness = fullness;
m->class = class_idx;
}
@@ -946,7 +946,7 @@ static void reset_page(struct page *page)
clear_bit(PG_private, &page->flags);
clear_bit(PG_private_2, &page->flags);
set_page_private(page, 0);
- page->mapping = NULL;
+ page->freelist = NULL;
page_mapcount_reset(page);
}
@@ -1056,6 +1056,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
INIT_LIST_HEAD(&page->lru);
if (i == 0) { /* first page */
+ page->freelist = NULL;
SetPagePrivate(page);
set_page_private(page, 0);
first_page = page;
@@ -2068,9 +2069,9 @@ static int __init zs_init(void)
/*
* A zspage's a free object index, class index, fullness group,
- * inuse object count are encoded in its (first)page->mapping
+ * inuse object count are encoded in its (first)page->freelist
* so sizeof(struct zs_meta) should be less than
- * sizeof(page->mapping(i.e., unsigned long)).
+ * sizeof(page->freelist(i.e., void *)).
*/
BUILD_BUG_ON(sizeof(struct zs_meta) > sizeof(unsigned long));
--
1.9.1
For migration, we need to create the sub-page chain of a zspage
dynamically, so this patch factors it out of alloc_zspage.
As a minor refactoring, it also makes the OBJ_ALLOCATED_TAG assignment
clearer in obj_malloc (it could be a separate patch, but it is
trivial, so I put it in this one).
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 80 ++++++++++++++++++++++++++++++++++-------------------------
1 file changed, 46 insertions(+), 34 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 958f27a9079d..833da8f4ffc9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -982,7 +982,9 @@ static void init_zspage(struct size_class *class, struct page *first_page)
unsigned long off = 0;
struct page *page = first_page;
- VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+ first_page->freelist = NULL;
+ INIT_LIST_HEAD(&first_page->lru);
+ set_zspage_inuse(first_page, 0);
while (page) {
struct page *next_page;
@@ -1027,13 +1029,44 @@ static void init_zspage(struct size_class *class, struct page *first_page)
set_freeobj(first_page, 0);
}
+static void create_page_chain(struct page *pages[], int nr_pages)
+{
+ int i;
+ struct page *page;
+ struct page *prev_page = NULL;
+ struct page *first_page = NULL;
+
+ for (i = 0; i < nr_pages; i++) {
+ page = pages[i];
+
+ INIT_LIST_HEAD(&page->lru);
+ if (i == 0) {
+ SetPagePrivate(page);
+ set_page_private(page, 0);
+ first_page = page;
+ }
+
+ if (i == 1)
+ set_page_private(first_page, (unsigned long)page);
+ if (i >= 1)
+ set_page_private(page, (unsigned long)first_page);
+ if (i >= 2)
+ list_add(&page->lru, &prev_page->lru);
+ if (i == nr_pages - 1)
+ SetPagePrivate2(page);
+
+ prev_page = page;
+ }
+}
+
/*
* Allocate a zspage for the given size class
*/
static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
{
- int i, error;
- struct page *first_page = NULL, *uninitialized_var(prev_page);
+ int i;
+ struct page *first_page = NULL;
+ struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
/*
* Allocate individual pages and link them together as:
@@ -1046,43 +1079,23 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
* (i.e. no other sub-page has this flag set) and PG_private_2 to
* identify the last page.
*/
- error = -ENOMEM;
for (i = 0; i < class->pages_per_zspage; i++) {
struct page *page;
page = alloc_page(flags);
- if (!page)
- goto cleanup;
-
- INIT_LIST_HEAD(&page->lru);
- if (i == 0) { /* first page */
- page->freelist = NULL;
- SetPagePrivate(page);
- set_page_private(page, 0);
- first_page = page;
- set_zspage_inuse(page, 0);
+ if (!page) {
+ while (--i >= 0)
+ __free_page(pages[i]);
+ return NULL;
}
- if (i == 1)
- set_page_private(first_page, (unsigned long)page);
- if (i >= 1)
- set_page_private(page, (unsigned long)first_page);
- if (i >= 2)
- list_add(&page->lru, &prev_page->lru);
- if (i == class->pages_per_zspage - 1) /* last page */
- SetPagePrivate2(page);
- prev_page = page;
+
+ pages[i] = page;
}
+ create_page_chain(pages, class->pages_per_zspage);
+ first_page = pages[0];
init_zspage(class, first_page);
- error = 0; /* Success */
-
-cleanup:
- if (unlikely(error) && first_page) {
- free_zspage(first_page);
- first_page = NULL;
- }
-
return first_page;
}
@@ -1422,7 +1435,6 @@ static unsigned long obj_malloc(struct size_class *class,
unsigned long m_offset;
void *vaddr;
- handle |= OBJ_ALLOCATED_TAG;
obj = get_freeobj(first_page);
objidx_to_page_and_offset(class, first_page, obj,
&m_page, &m_offset);
@@ -1432,10 +1444,10 @@ static unsigned long obj_malloc(struct size_class *class,
set_freeobj(first_page, link->next >> OBJ_ALLOCATED_TAG);
if (!class->huge)
/* record handle in the header of allocated chunk */
- link->handle = handle;
+ link->handle = handle | OBJ_ALLOCATED_TAG;
else
/* record handle in first_page->private */
- set_page_private(first_page, handle);
+ set_page_private(first_page, handle | OBJ_ALLOCATED_TAG);
kunmap_atomic(vaddr);
mod_zspage_inuse(first_page, 1);
zs_stat_inc(class, OBJ_USED, 1);
--
1.9.1
Zsmalloc is ready for page migration, so zram can use __GFP_MOVABLE
from now on.
I tested how much it helps to create higher-order pages.
The test scenario is as follows:
KVM guest, 1G memory, ext4-formatted zram block device,
for i in `seq 1 8`;
do
dd if=/dev/vda1 of=mnt/test$i.txt bs=128M count=1 &
done
wait `pidof dd`
for i in `seq 1 2 8`;
do
rm -rf mnt/test$i.txt
done
fstrim -v mnt
echo "init"
cat /proc/buddyinfo
echo "compaction"
echo 1 > /proc/sys/vm/compact_memory
cat /proc/buddyinfo
old:
init
Node 0, zone DMA 208 120 51 41 11 0 0 0 0 0 0
Node 0, zone DMA32 16380 13777 9184 3805 789 54 3 0 0 0 0
compaction
Node 0, zone DMA 132 82 40 39 16 2 1 0 0 0 0
Node 0, zone DMA32 5219 5526 4969 3455 1831 677 139 15 0 0 0
new:
init
Node 0, zone DMA 379 115 97 19 2 0 0 0 0 0 0
Node 0, zone DMA32 18891 16774 10862 3947 637 21 0 0 0 0 0
compaction 1
Node 0, zone DMA 214 66 87 29 10 3 0 0 0 0 0
Node 0, zone DMA32 1612 3139 3154 2469 1745 990 384 94 7 0 0
As you can see, compaction now produces many more high-order pages. Yay!
Reviewed-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
drivers/block/zram/zram_drv.c | 3 ++-
mm/zsmalloc.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 46055dbc4095..da8298b9f05e 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -517,7 +517,8 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, u64 disksize)
goto out_error;
}
- meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO | __GFP_HIGHMEM);
+ meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO|__GFP_HIGHMEM
+ |__GFP_MOVABLE);
if (!meta->mem_pool) {
pr_err("Error creating memory pool\n");
goto out_error;
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 35bafa0bc3f1..8557da6dbaf2 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -308,7 +308,7 @@ static void destroy_handle_cache(struct zs_pool *pool)
static unsigned long alloc_handle(struct zs_pool *pool)
{
return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
- pool->flags & ~__GFP_HIGHMEM);
+ pool->flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
}
static void free_handle(struct zs_pool *pool, unsigned long handle)
--
1.9.1
This patch enables tail page migration of zspage.
At this point, I tested zsmalloc for regressions with a micro-benchmark
which does zs_malloc/map/unmap/zs_free for all size classes
on every CPU (my system has 12) for 20 seconds.
It shows a 1% regression, which is really small when we consider
the benefit of this feature and real-workload overhead (i.e.,
most overhead comes from compression).
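Tail page migration relies on taking the lock of every sub-page of a
zspage, or none of them. A minimal user-space sketch of that
all-or-nothing pattern is shown below. It is not the kernel code:
struct subpage, its next pointer and the C11 atomic_flag used as a
lock are hypothetical stand-ins for struct page, get_next_page() and
the page lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one sub-page of a zspage. */
struct subpage {
	atomic_flag locked;
	struct subpage *next;
};

static void init_subpage(struct subpage *p, struct subpage *next)
{
	atomic_flag_clear(&p->locked);
	p->next = next;
}

static bool trylock_subpage(struct subpage *p)
{
	/* test_and_set returns the old value: false means we got it. */
	return !atomic_flag_test_and_set(&p->locked);
}

static void unlock_subpage(struct subpage *p)
{
	atomic_flag_clear(&p->locked);
}

/*
 * Try to lock every sub-page in the chain except @already_locked,
 * which the caller holds. On failure, drop the locks taken so far
 * so that the caller never ends up with a partially locked chain.
 */
static bool trylock_chain(struct subpage *first,
			  struct subpage *already_locked)
{
	struct subpage *cursor, *fail = NULL;

	for (cursor = first; cursor; cursor = cursor->next) {
		if (cursor == already_locked)
			continue;
		if (!trylock_subpage(cursor)) {
			fail = cursor;
			break;
		}
	}

	if (!fail)
		return true;

	/* Roll back: unlock everything before the page that failed. */
	for (cursor = first; cursor != fail; cursor = cursor->next) {
		if (cursor != already_locked)
			unlock_subpage(cursor);
	}

	return false;
}
```

Using trylock rather than a blocking lock means a contended zspage is
simply skipped, so the caller never sleeps while holding other locks.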
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 131 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 115 insertions(+), 16 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9b4b03d8f993..35bafa0bc3f1 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -551,6 +551,19 @@ static void set_zspage_mapping(struct page *first_page,
m->class = class_idx;
}
+static bool check_isolated_page(struct page *first_page)
+{
+ struct page *cursor;
+
+ for (cursor = first_page; cursor != NULL; cursor =
+ get_next_page(cursor)) {
+ if (PageIsolated(cursor))
+ return true;
+ }
+
+ return false;
+}
+
/*
* zsmalloc divides the pool into various size classes where each
* class maintains a list of zspages where each zspage is divided
@@ -1052,6 +1065,44 @@ void lock_zspage(struct page *first_page)
} while ((cursor = get_next_page(cursor)) != NULL);
}
+int trylock_zspage(struct page *first_page, struct page *locked_page)
+{
+ struct page *cursor, *fail;
+
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+ for (cursor = first_page; cursor != NULL; cursor =
+ get_next_page(cursor)) {
+ if (cursor != locked_page) {
+ if (!trylock_page(cursor)) {
+ fail = cursor;
+ goto unlock;
+ }
+ }
+ }
+
+ return 1;
+unlock:
+ for (cursor = first_page; cursor != fail; cursor =
+ get_next_page(cursor)) {
+ if (cursor != locked_page)
+ unlock_page(cursor);
+ }
+
+ return 0;
+}
+
+void unlock_zspage(struct page *first_page, struct page *locked_page)
+{
+ struct page *cursor = first_page;
+
+ for (; cursor != NULL; cursor = get_next_page(cursor)) {
+ VM_BUG_ON_PAGE(!PageLocked(cursor), cursor);
+ if (cursor != locked_page)
+ unlock_page(cursor);
+ }
+}
+
static void free_zspage(struct zs_pool *pool, struct page *first_page)
{
struct page *nextp, *tmp;
@@ -1090,16 +1141,17 @@ static void init_zspage(struct size_class *class, struct page *first_page,
first_page->freelist = NULL;
INIT_LIST_HEAD(&first_page->lru);
set_zspage_inuse(first_page, 0);
- BUG_ON(!trylock_page(first_page));
- first_page->mapping = mapping;
- __SetPageMovable(first_page);
- unlock_page(first_page);
while (page) {
struct page *next_page;
struct link_free *link;
void *vaddr;
+ BUG_ON(!trylock_page(page));
+ page->mapping = mapping;
+ __SetPageMovable(page);
+ unlock_page(page);
+
vaddr = kmap_atomic(page);
link = (struct link_free *)vaddr + off / sizeof(*link);
@@ -1850,6 +1902,7 @@ static enum fullness_group putback_zspage(struct size_class *class,
VM_BUG_ON_PAGE(!list_empty(&first_page->lru), first_page);
VM_BUG_ON_PAGE(ZsPageIsolate(first_page), first_page);
+ VM_BUG_ON_PAGE(check_isolated_page(first_page), first_page);
fullness = get_fullness_group(class, first_page);
insert_zspage(class, fullness, first_page);
@@ -1956,6 +2009,12 @@ static struct page *isolate_source_page(struct size_class *class)
if (!page)
continue;
+ /* To prevent race between object and page migration */
+ if (!trylock_zspage(page, NULL)) {
+ page = NULL;
+ continue;
+ }
+
remove_zspage(class, i, page);
inuse = get_zspage_inuse(page);
@@ -1964,6 +2023,7 @@ static struct page *isolate_source_page(struct size_class *class)
if (inuse != freezed) {
unfreeze_zspage(class, page, freezed);
putback_zspage(class, page);
+ unlock_zspage(page, NULL);
page = NULL;
continue;
}
@@ -1995,6 +2055,12 @@ static struct page *isolate_target_page(struct size_class *class)
if (!page)
continue;
+ /* To prevent race between object and page migration */
+ if (!trylock_zspage(page, NULL)) {
+ page = NULL;
+ continue;
+ }
+
remove_zspage(class, i, page);
inuse = get_zspage_inuse(page);
@@ -2003,6 +2069,7 @@ static struct page *isolate_target_page(struct size_class *class)
if (inuse != freezed) {
unfreeze_zspage(class, page, freezed);
putback_zspage(class, page);
+ unlock_zspage(page, NULL);
page = NULL;
continue;
}
@@ -2076,11 +2143,13 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
putback_zspage(class, dst_page);
unfreeze_zspage(class, dst_page,
class->objs_per_zspage);
+ unlock_zspage(dst_page, NULL);
spin_unlock(&class->lock);
dst_page = NULL;
}
if (zspage_empty(class, src_page)) {
+ unlock_zspage(src_page, NULL);
free_zspage(pool, src_page);
spin_lock(&class->lock);
zs_stat_dec(class, OBJ_ALLOCATED,
@@ -2103,12 +2172,14 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
putback_zspage(class, src_page);
unfreeze_zspage(class, src_page,
class->objs_per_zspage);
+ unlock_zspage(src_page, NULL);
}
if (dst_page) {
putback_zspage(class, dst_page);
unfreeze_zspage(class, dst_page,
class->objs_per_zspage);
+ unlock_zspage(dst_page, NULL);
}
spin_unlock(&class->lock);
@@ -2211,10 +2282,11 @@ bool zs_page_isolate(struct page *page, isolate_mode_t mode)
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageIsolated(page), page);
/*
- * In this implementation, it allows only first page migration.
+ * first_page will not be destroyed by PG_lock of @page but it could
+ * be migrated out. For prohibiting it, zs_page_migrate calls
+ * trylock_zspage so it closes the race.
*/
- VM_BUG_ON_PAGE(!is_first_page(page), page);
- first_page = page;
+ first_page = get_first_page(page);
/*
* Without class lock, fullness is meaningless while constant
@@ -2228,9 +2300,18 @@ bool zs_page_isolate(struct page *page, isolate_mode_t mode)
if (!spin_trylock(&class->lock))
return false;
+ if (check_isolated_page(first_page))
+ goto skip_isolate;
+
+ /*
+ * If this is first time isolation for zspage, isolate zspage from
+ * size_class to prevent further allocations from the zspage.
+ */
get_zspage_mapping(first_page, &class_idx, &fullness);
remove_zspage(class, fullness, first_page);
SetZsPageIsolate(first_page);
+
+skip_isolate:
SetPageIsolated(page);
spin_unlock(&class->lock);
@@ -2253,7 +2334,7 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
VM_BUG_ON_PAGE(!PageMovable(page), page);
VM_BUG_ON_PAGE(!PageIsolated(page), page);
- first_page = page;
+ first_page = get_first_page(page);
get_zspage_mapping(first_page, &class_idx, &fullness);
pool = page->mapping->private_data;
class = pool->size_class[class_idx];
@@ -2268,6 +2349,13 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
if (get_zspage_inuse(first_page) == 0)
goto out_class_unlock;
+ /*
+ * It prevents first_page migration during tail page operation for
+ * get_first_page's stability.
+ */
+ if (!trylock_zspage(first_page, page))
+ goto out_class_unlock;
+
freezed = freeze_zspage(class, first_page);
if (freezed != get_zspage_inuse(first_page))
goto out_unfreeze;
@@ -2306,21 +2394,26 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
kunmap_atomic(addr);
replace_sub_page(class, first_page, newpage, page);
- first_page = newpage;
+ first_page = get_first_page(newpage);
get_page(newpage);
VM_BUG_ON_PAGE(get_fullness_group(class, first_page) ==
ZS_EMPTY, first_page);
- ClearZsPageIsolate(first_page);
- putback_zspage(class, first_page);
+ if (!check_isolated_page(first_page)) {
+ INIT_LIST_HEAD(&first_page->lru);
+ ClearZsPageIsolate(first_page);
+ putback_zspage(class, first_page);
+ }
+
/* Migration complete. Free old page */
reset_page(page);
ClearPageIsolated(page);
put_page(page);
ret = MIGRATEPAGE_SUCCESS;
-
+ page = newpage;
out_unfreeze:
unfreeze_zspage(class, first_page, freezed);
+ unlock_zspage(first_page, page);
out_class_unlock:
spin_unlock(&class->lock);
@@ -2338,7 +2431,7 @@ void zs_page_putback(struct page *page)
VM_BUG_ON_PAGE(!PageMovable(page), page);
VM_BUG_ON_PAGE(!PageIsolated(page), page);
- first_page = page;
+ first_page = get_first_page(page);
get_zspage_mapping(first_page, &class_idx, &fullness);
pool = page->mapping->private_data;
class = pool->size_class[class_idx];
@@ -2348,11 +2441,17 @@ void zs_page_putback(struct page *page)
* in zs_free will wait the page lock of @page without
* destroying of zspage.
*/
- INIT_LIST_HEAD(&first_page->lru);
spin_lock(&class->lock);
ClearPageIsolated(page);
- ClearZsPageIsolate(first_page);
- putback_zspage(class, first_page);
+ /*
+ * putback zspage to right list if this is last isolated page
+ * putback in the zspage.
+ */
+ if (!check_isolated_page(first_page)) {
+ INIT_LIST_HEAD(&first_page->lru);
+ ClearZsPageIsolate(first_page);
+ putback_zspage(class, first_page);
+ }
spin_unlock(&class->lock);
}
--
1.9.1
This patch introduces a run-time migration feature for zspages.
To keep the review simple, it supports only head page migration for
now (later patches will add tail page migration).
Migration relies on three functions:
* zs_page_isolate
It isolates the zspage containing the subpage that the VM wants to
migrate from its class, so nobody can allocate a new object from
that zspage. IOW, it freezes allocation.
* zs_page_migrate
First, it freezes the zspage so that nobody can free an object,
which prevents zspage destruction. Then it copies the content from
the old page to the new page and creates a new page chain that
includes the new page. On success, it drops the refcount of the old
page to free it and puts the new zspage back into the right
zsmalloc data structure. Lastly, it unfreezes the zspage, so object
allocation and free are allowed again from then on.
* zs_page_putback
It returns the isolated zspage to the right fullness_group list
if migrating a page fails.
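The call order of the three functions can be sketched in userspace
as follows. This is only an illustration under stated assumptions:
struct toy_zspage and the toy_* helpers are hypothetical stand-ins,
not the kernel's struct page or the zsmalloc functions, and locking
is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical stand-in for a zspage; not the kernel's struct page. */
struct toy_zspage {
	bool on_class_list;	/* linked into a fullness list, so allocatable */
	bool isolated;		/* set between isolate and migrate/putback */
	bool frozen;		/* objects pinned, so nothing can be freed */
	int inuse;		/* number of live objects */
	unsigned char *data;	/* backing page contents */
};

/* Step 1: take the zspage off its class list so allocations stop. */
static bool toy_isolate(struct toy_zspage *z)
{
	if (z->isolated)
		return false;
	z->on_class_list = false;
	z->isolated = true;
	return true;
}

/* Step 2: freeze, copy to the new page, then put the zspage back. */
static int toy_migrate(struct toy_zspage *z, unsigned char *newpage)
{
	if (!z->isolated || z->inuse == 0)
		return -1;	/* nothing to move; the caller will putback */

	z->frozen = true;			/* block object free */
	memcpy(newpage, z->data, TOY_PAGE_SIZE);
	z->data = newpage;			/* new page joins the chain */
	z->isolated = false;
	z->on_class_list = true;		/* back on a fullness list */
	z->frozen = false;			/* allow alloc/free again */
	return 0;
}

/* Step 3 (failure path only): return the zspage to its list. */
static void toy_putback(struct toy_zspage *z)
{
	z->isolated = false;
	z->on_class_list = true;
}
```

In this model a failed toy_migrate leaves the zspage isolated, and
toy_putback is what makes it allocatable again, which mirrors the
division of work described above.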
NOTE: A hurdle to supporting migration is that a zspage can be
destroyed while migration is going on. Once a zspage is isolated,
nobody can allocate an object from it, but objects can still be
deallocated freely, so the zspage could be destroyed before all of
its objects are frozen to prevent deallocation. The problem is the
large window between zs_page_isolate and freeze_zspage in
zs_page_migrate, during which the zspage could be destroyed.
An easy approach would be to freeze objects in zs_page_isolate, but
the drawback is that no object could be deallocated from isolation
until migration finishes (or fails). Since there is a large time gap
between isolation and migration, any object free on another CPU
would have to spin in pin_tag, which would cause big latency.
So, this patch introduces lock_zspage, which takes PG_lock on all
pages in a zspage right before freeing the zspage. VM migration also
locks the page right before calling ->migratepage, so the race no
longer exists.
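The effect of taking every page lock before freeing can be sketched
in userspace as follows. Note the assumptions: struct toy_page and
the toy_* helpers are hypothetical, and where lock_zspage in this
patch spins on trylock_page until it succeeds, this single-threaded
sketch swaps in a try-and-roll-back variant so the "free has to wait
for migration" case can be shown without threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one subpage of a zspage. */
struct toy_page {
	bool locked;		/* models PG_lock */
	struct toy_page *next;	/* next subpage in the chain, or NULL */
};

static bool toy_trylock_page(struct toy_page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

static void toy_unlock_page(struct toy_page *p)
{
	p->locked = false;
}

/*
 * Try to lock every page in the chain. If any page is already
 * locked (e.g. by a migration in flight), undo the locks taken
 * so far and report failure, so the zspage is not freed.
 */
static bool toy_trylock_zspage(struct toy_page *first)
{
	struct toy_page *cursor, *undo;

	for (cursor = first; cursor; cursor = cursor->next) {
		if (!toy_trylock_page(cursor)) {
			for (undo = first; undo != cursor; undo = undo->next)
				toy_unlock_page(undo);
			return false;
		}
	}
	return true;
}
```

As long as one subpage stays locked by the migration path, the free
path cannot acquire the whole chain, so it cannot tear the zspage
down underneath the migration.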
Signed-off-by: Minchan Kim <[email protected]>
---
include/uapi/linux/magic.h | 1 +
mm/zsmalloc.c | 329 +++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 317 insertions(+), 13 deletions(-)
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index e1fbe72c39c0..93b1affe4801 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -79,5 +79,6 @@
#define NSFS_MAGIC 0x6e736673
#define BPF_FS_MAGIC 0xcafe4a11
#define BALLOON_KVM_MAGIC 0x13661366
+#define ZSMALLOC_MAGIC 0x58295829
#endif /* __LINUX_MAGIC_H__ */
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 990d752fb65b..b3b31fdfea0f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -56,6 +56,8 @@
#include <linux/debugfs.h>
#include <linux/zsmalloc.h>
#include <linux/zpool.h>
+#include <linux/mount.h>
+#include <linux/migrate.h>
/*
* This must be power of 2 and greater than of equal to sizeof(link_free).
@@ -182,6 +184,8 @@ struct zs_size_stat {
static struct dentry *zs_stat_root;
#endif
+static struct vfsmount *zsmalloc_mnt;
+
/*
* number of size_classes
*/
@@ -263,6 +267,7 @@ struct zs_pool {
#ifdef CONFIG_ZSMALLOC_STAT
struct dentry *stat_dentry;
#endif
+ struct inode *inode;
};
struct zs_meta {
@@ -412,6 +417,29 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
}
+/*
+ * Indicate whether the zspage is isolated for page migration.
+ * Protected by size_class lock
+ */
+static void SetZsPageIsolate(struct page *first_page)
+{
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+ SetPageUptodate(first_page);
+}
+
+static int ZsPageIsolate(struct page *first_page)
+{
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+ return PageUptodate(first_page);
+}
+
+static void ClearZsPageIsolate(struct page *first_page)
+{
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+ ClearPageUptodate(first_page);
+}
+
static int get_zspage_inuse(struct page *first_page)
{
struct zs_meta *m;
@@ -783,8 +811,11 @@ static enum fullness_group fix_fullness_group(struct size_class *class,
if (newfg == currfg)
goto out;
- remove_zspage(class, currfg, first_page);
- insert_zspage(class, newfg, first_page);
+ /* Later, putback will insert page to right list */
+ if (!ZsPageIsolate(first_page)) {
+ remove_zspage(class, currfg, first_page);
+ insert_zspage(class, newfg, first_page);
+ }
set_zspage_mapping(first_page, class_idx, newfg);
out:
@@ -950,13 +981,31 @@ static void unpin_tag(unsigned long handle)
static void reset_page(struct page *page)
{
+ __ClearPageMovable(page);
clear_bit(PG_private, &page->flags);
clear_bit(PG_private_2, &page->flags);
set_page_private(page, 0);
page->freelist = NULL;
+ page->mapping = NULL;
page_mapcount_reset(page);
}
+/**
+ * lock_zspage - lock all pages in the zspage
+ * @first_page: head page of the zspage
+ *
+ * To prevent destroy during migration, zspage freeing should
+ * hold locks of all pages in a zspage
+ */
+void lock_zspage(struct page *first_page)
+{
+ struct page *cursor = first_page;
+
+ do {
+ while (!trylock_page(cursor));
+ } while ((cursor = get_next_page(cursor)) != NULL);
+}
+
static void free_zspage(struct zs_pool *pool, struct page *first_page)
{
struct page *nextp, *tmp, *head_extra;
@@ -964,26 +1013,31 @@ static void free_zspage(struct zs_pool *pool, struct page *first_page)
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
VM_BUG_ON_PAGE(get_zspage_inuse(first_page), first_page);
+ lock_zspage(first_page);
head_extra = (struct page *)page_private(first_page);
- reset_page(first_page);
- __free_page(first_page);
-
/* zspage with only 1 system page */
if (!head_extra)
- return;
+ goto out;
list_for_each_entry_safe(nextp, tmp, &head_extra->lru, lru) {
list_del(&nextp->lru);
reset_page(nextp);
+ unlock_page(nextp);
__free_page(nextp);
}
reset_page(head_extra);
+ unlock_page(head_extra);
__free_page(head_extra);
+out:
+ reset_page(first_page);
+ unlock_page(first_page);
+ __free_page(first_page);
}
/* Initialize a newly allocated zspage */
-static void init_zspage(struct size_class *class, struct page *first_page)
+static void init_zspage(struct size_class *class, struct page *first_page,
+ struct address_space *mapping)
{
int freeobj = 1;
unsigned long off = 0;
@@ -992,6 +1046,10 @@ static void init_zspage(struct size_class *class, struct page *first_page)
first_page->freelist = NULL;
INIT_LIST_HEAD(&first_page->lru);
set_zspage_inuse(first_page, 0);
+ BUG_ON(!trylock_page(first_page));
+ first_page->mapping = mapping;
+ __SetPageMovable(first_page);
+ unlock_page(first_page);
while (page) {
struct page *next_page;
@@ -1066,10 +1124,46 @@ static void create_page_chain(struct page *pages[], int nr_pages)
}
}
+static void replace_sub_page(struct size_class *class, struct page *first_page,
+ struct page *newpage, struct page *oldpage)
+{
+ struct page *page;
+ struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE] = {NULL,};
+ int idx = 0;
+
+ page = first_page;
+ do {
+ if (page == oldpage)
+ pages[idx] = newpage;
+ else
+ pages[idx] = page;
+ idx++;
+ } while ((page = get_next_page(page)) != NULL);
+
+ create_page_chain(pages, class->pages_per_zspage);
+
+ if (is_first_page(oldpage)) {
+ enum fullness_group fg;
+ int class_idx;
+
+ SetZsPageIsolate(newpage);
+ get_zspage_mapping(oldpage, &class_idx, &fg);
+ set_zspage_mapping(newpage, class_idx, fg);
+ set_freeobj(newpage, get_freeobj(oldpage));
+ set_zspage_inuse(newpage, get_zspage_inuse(oldpage));
+ if (class->huge)
+ set_page_private(newpage, page_private(oldpage));
+ }
+
+ newpage->mapping = oldpage->mapping;
+ __SetPageMovable(newpage);
+}
+
/*
* Allocate a zspage for the given size class
*/
-static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
+static struct page *alloc_zspage(struct zs_pool *pool,
+ struct size_class *class)
{
int i;
struct page *first_page = NULL;
@@ -1089,7 +1183,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
for (i = 0; i < class->pages_per_zspage; i++) {
struct page *page;
- page = alloc_page(flags);
+ page = alloc_page(pool->flags);
if (!page) {
while (--i >= 0)
__free_page(pages[i]);
@@ -1101,7 +1195,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
create_page_chain(pages, class->pages_per_zspage);
first_page = pages[0];
- init_zspage(class, first_page);
+ init_zspage(class, first_page, pool->inode->i_mapping);
return first_page;
}
@@ -1500,7 +1594,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
if (!first_page) {
spin_unlock(&class->lock);
- first_page = alloc_zspage(class, pool->flags);
+ first_page = alloc_zspage(pool, class);
if (unlikely(!first_page)) {
free_handle(pool, handle);
return 0;
@@ -1560,6 +1654,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
if (unlikely(!handle))
return;
+ /* Once handle is pinned, page|object migration cannot work */
pin_tag(handle);
obj = handle_to_obj(handle);
obj_to_location(obj, &f_page, &f_objidx);
@@ -1715,6 +1810,9 @@ static enum fullness_group putback_zspage(struct size_class *class,
{
enum fullness_group fullness;
+ VM_BUG_ON_PAGE(!list_empty(&first_page->lru), first_page);
+ VM_BUG_ON_PAGE(ZsPageIsolate(first_page), first_page);
+
fullness = get_fullness_group(class, first_page);
insert_zspage(class, fullness, first_page);
set_zspage_mapping(first_page, class->index, fullness);
@@ -2060,6 +2158,173 @@ static int zs_register_shrinker(struct zs_pool *pool)
return register_shrinker(&pool->shrinker);
}
+bool zs_page_isolate(struct page *page, isolate_mode_t mode)
+{
+ struct zs_pool *pool;
+ struct size_class *class;
+ int class_idx;
+ enum fullness_group fullness;
+ struct page *first_page;
+
+ /*
+ * The page is locked so it couldn't be destroyed.
+ * For detail, look at lock_zspage in free_zspage.
+ */
+ VM_BUG_ON_PAGE(!PageLocked(page), page);
+ VM_BUG_ON_PAGE(PageIsolated(page), page);
+ /*
+ * In this implementation, it allows only first page migration.
+ */
+ VM_BUG_ON_PAGE(!is_first_page(page), page);
+ first_page = page;
+
+ /*
+ * Without class lock, fullness is meaningless while constant
+ * class_idx is okay. We will get it under class lock at below,
+ * again.
+ */
+ get_zspage_mapping(first_page, &class_idx, &fullness);
+ pool = page->mapping->private_data;
+ class = pool->size_class[class_idx];
+
+ if (!spin_trylock(&class->lock))
+ return false;
+
+ get_zspage_mapping(first_page, &class_idx, &fullness);
+ remove_zspage(class, fullness, first_page);
+ SetZsPageIsolate(first_page);
+ SetPageIsolated(page);
+ spin_unlock(&class->lock);
+
+ return true;
+}
+
+int zs_page_migrate(struct address_space *mapping, struct page *newpage,
+ struct page *page, enum migrate_mode mode)
+{
+ struct zs_pool *pool;
+ struct size_class *class;
+ int class_idx;
+ enum fullness_group fullness;
+ struct page *first_page;
+ void *s_addr, *d_addr, *addr;
+ int ret = -EBUSY;
+ int offset = 0;
+ int freezed = 0;
+
+ VM_BUG_ON_PAGE(!PageMovable(page), page);
+ VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+ first_page = page;
+ get_zspage_mapping(first_page, &class_idx, &fullness);
+ pool = page->mapping->private_data;
+ class = pool->size_class[class_idx];
+
+ /*
+ * Get stable fullness under class->lock
+ */
+ if (!spin_trylock(&class->lock))
+ return ret;
+
+ get_zspage_mapping(first_page, &class_idx, &fullness);
+ if (get_zspage_inuse(first_page) == 0)
+ goto out_class_unlock;
+
+ freezed = freeze_zspage(class, first_page);
+ if (freezed != get_zspage_inuse(first_page))
+ goto out_unfreeze;
+
+ /* copy contents from page to newpage */
+ s_addr = kmap_atomic(page);
+ d_addr = kmap_atomic(newpage);
+ memcpy(d_addr, s_addr, PAGE_SIZE);
+ kunmap_atomic(d_addr);
+ kunmap_atomic(s_addr);
+
+ if (!is_first_page(page))
+ offset = page->index;
+
+ addr = kmap_atomic(page);
+ do {
+ unsigned long handle;
+ unsigned long head;
+ unsigned long new_obj, old_obj;
+ unsigned long obj_idx;
+ struct page *dummy;
+
+ head = obj_to_head(class, page, addr + offset);
+ if (head & OBJ_ALLOCATED_TAG) {
+ handle = head & ~OBJ_ALLOCATED_TAG;
+ if (!testpin_tag(handle))
+ BUG();
+
+ old_obj = handle_to_obj(handle);
+ obj_to_location(old_obj, &dummy, &obj_idx);
+ new_obj = location_to_obj(newpage, obj_idx);
+ new_obj |= BIT(HANDLE_PIN_BIT);
+ record_obj(handle, new_obj);
+ }
+ offset += class->size;
+ } while (offset < PAGE_SIZE);
+ kunmap_atomic(addr);
+
+ replace_sub_page(class, first_page, newpage, page);
+ first_page = newpage;
+ get_page(newpage);
+ VM_BUG_ON_PAGE(get_fullness_group(class, first_page) ==
+ ZS_EMPTY, first_page);
+ ClearZsPageIsolate(first_page);
+ putback_zspage(class, first_page);
+
+ /* Migration complete. Free old page */
+ reset_page(page);
+ ClearPageIsolated(page);
+ put_page(page);
+ ret = MIGRATEPAGE_SUCCESS;
+
+out_unfreeze:
+ unfreeze_zspage(class, first_page, freezed);
+out_class_unlock:
+ spin_unlock(&class->lock);
+
+ return ret;
+}
+
+void zs_page_putback(struct page *page)
+{
+ struct zs_pool *pool;
+ struct size_class *class;
+ int class_idx;
+ enum fullness_group fullness;
+ struct page *first_page;
+
+ VM_BUG_ON_PAGE(!PageMovable(page), page);
+ VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+ first_page = page;
+ get_zspage_mapping(first_page, &class_idx, &fullness);
+ pool = page->mapping->private_data;
+ class = pool->size_class[class_idx];
+
+ /*
+ * If there is race between zs_free and here, free_zspage
+ * in zs_free will wait the page lock of @page without
+ * destroying of zspage.
+ */
+ INIT_LIST_HEAD(&first_page->lru);
+ spin_lock(&class->lock);
+ ClearPageIsolated(page);
+ ClearZsPageIsolate(first_page);
+ putback_zspage(class, first_page);
+ spin_unlock(&class->lock);
+}
+
+const struct address_space_operations zsmalloc_aops = {
+ .isolate_page = zs_page_isolate,
+ .migratepage = zs_page_migrate,
+ .putback_page = zs_page_putback,
+};
+
/**
* zs_create_pool - Creates an allocation pool to work from.
* @flags: allocation flags used to allocate pool metadata
@@ -2146,6 +2411,15 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t flags)
if (zs_pool_stat_create(pool, name))
goto err;
+ pool->inode = alloc_anon_inode(zsmalloc_mnt->mnt_sb);
+ if (IS_ERR(pool->inode)) {
+ pool->inode = NULL;
+ goto err;
+ }
+
+ pool->inode->i_mapping->a_ops = &zsmalloc_aops;
+ pool->inode->i_mapping->private_data = pool;
+
/*
* Not critical, we still can use the pool
* and user can trigger compaction manually.
@@ -2165,6 +2439,8 @@ void zs_destroy_pool(struct zs_pool *pool)
int i;
zs_unregister_shrinker(pool);
+ if (pool->inode)
+ iput(pool->inode);
zs_pool_stat_destroy(pool);
for (i = 0; i < zs_size_classes; i++) {
@@ -2193,10 +2469,33 @@ void zs_destroy_pool(struct zs_pool *pool)
}
EXPORT_SYMBOL_GPL(zs_destroy_pool);
+static struct dentry *zs_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data)
+{
+ static const struct dentry_operations ops = {
+ .d_dname = simple_dname,
+ };
+
+ return mount_pseudo(fs_type, "zsmalloc:", NULL, &ops, ZSMALLOC_MAGIC);
+}
+
+static struct file_system_type zsmalloc_fs = {
+ .name = "zsmalloc",
+ .mount = zs_mount,
+ .kill_sb = kill_anon_super,
+};
+
static int __init zs_init(void)
{
- int ret = zs_register_cpu_notifier();
+ int ret;
+
+ zsmalloc_mnt = kern_mount(&zsmalloc_fs);
+ if (IS_ERR(zsmalloc_mnt)) {
+ ret = PTR_ERR(zsmalloc_mnt);
+ goto out;
+ }
+ ret = zs_register_cpu_notifier();
if (ret)
goto notifier_fail;
@@ -2219,6 +2518,7 @@ static int __init zs_init(void)
pr_err("zs stat initialization failed\n");
goto stat_fail;
}
+
return 0;
stat_fail:
@@ -2227,7 +2527,8 @@ static int __init zs_init(void)
#endif
notifier_fail:
zs_unregister_cpu_notifier();
-
+ kern_unmount(zsmalloc_mnt);
+out:
return ret;
}
@@ -2238,6 +2539,8 @@ static void __exit zs_exit(void)
#endif
zs_unregister_cpu_notifier();
+ kern_unmount(zsmalloc_mnt);
+
zs_stat_exit();
}
--
1.9.1
Until now we have allowed migration only for LRU pages, and that was
enough to make high-order pages. But recently, embedded systems (e.g.,
webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory),
so we have seen several reports about trouble with small high-order
allocations. To fix the problem, there were several efforts
(e.g., enhancing the compaction algorithm, SLUB fallback to 0-order
pages, reserved memory, vmalloc and so on), but if there are lots of
non-movable pages in the system, those solutions do not hold up in the
long run.
So, this patch adds a facility to turn non-movable pages into movable
ones. For that, it introduces migration-related functions in
address_space_operations as well as some page flags.
Basically, this patch supports two page flags and two functions related
to page migration. The flags and page->mapping stability are protected
by PG_lock.
PG_movable
PG_isolated
bool (*isolate_page) (struct page *, isolate_mode_t);
void (*putback_page) (struct page *);
The duties of a subsystem that wants to make its pages migratable are
as follows:
1. It should register its address_space in page->mapping and then mark
the page as PG_movable via __SetPageMovable.
2. Its isolate function should mark the page as PG_isolated via
SetPageIsolated if isolation is successful and return true.
3. If migration is successful, it should clear PG_isolated and
PG_movable of the page to prepare it for freeing, then release its
reference on the page so that it can be freed.
4. If migration fails, the putback function of the subsystem should
clear PG_isolated via ClearPageIsolated.
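The contract between the core and a subsystem can be sketched in
userspace as follows. Everything here is an assumption made for
illustration: struct toy_page, struct toy_aops, the drv_* callbacks
and the toy_* core helpers are hypothetical, and page locking is not
modeled. Only the flag and refcount handling follows the duties
listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page;

/* Hypothetical subset of address_space_operations. */
struct toy_aops {
	bool (*isolate_page)(struct toy_page *);
	void (*putback_page)(struct toy_page *);
};

/* Hypothetical stand-in for struct page. */
struct toy_page {
	bool movable;			/* models PG_movable */
	bool isolated;			/* models PG_isolated */
	int refcount;
	const struct toy_aops *aops;	/* models page->mapping->a_ops */
};

/* Duty 1: register the operations, then mark the page movable. */
static void toy_register(struct toy_page *p, const struct toy_aops *aops)
{
	p->aops = aops;
	p->movable = true;
}

/* Duty 2: on successful isolation, set the flag and return true. */
static bool drv_isolate(struct toy_page *p)
{
	p->isolated = true;
	return true;
}

/* Duty 4: on the failure path, clear the isolated flag. */
static void drv_putback(struct toy_page *p)
{
	p->isolated = false;
}

static const struct toy_aops drv_aops = {
	.isolate_page = drv_isolate,
	.putback_page = drv_putback,
};

/* Core side: pin the page, check the flags, call the subsystem. */
static bool toy_isolate_movable_page(struct toy_page *p)
{
	if (p->refcount == 0)
		return false;		/* page is being freed */
	p->refcount++;
	if (!p->movable || p->isolated || !p->aops->isolate_page(p)) {
		p->refcount--;		/* drop the pin on failure */
		return false;
	}
	return true;			/* pin is kept while isolated */
}

/* Core side: undo a failed migration and drop the isolation pin. */
static void toy_putback_movable_page(struct toy_page *p)
{
	if (p->aops)
		p->aops->putback_page(p);
	p->refcount--;
}
```

The extra reference taken at isolation time is what keeps the page
alive until either migration succeeds or the putback path runs.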
Cc: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Gioh Kim <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
Documentation/filesystems/Locking | 4 +
Documentation/filesystems/vfs.txt | 5 ++
fs/proc/page.c | 3 +
include/linux/fs.h | 2 +
include/linux/migrate.h | 2 +
include/linux/page-flags.h | 29 ++++++++
include/uapi/linux/kernel-page-flags.h | 1 +
mm/compaction.c | 14 +++-
mm/migrate.c | 132 +++++++++++++++++++++++++++++----
9 files changed, 177 insertions(+), 15 deletions(-)
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 619af9bfdcb3..0bb79560abb3 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -195,7 +195,9 @@ unlocks and drops the reference.
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
+ bool (*isolate_page) (struct page *, isolate_mode_t);
int (*migratepage)(struct address_space *, struct page *, struct page *);
+ void (*putback_page) (struct page *);
int (*launder_page)(struct page *);
int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage: yes
releasepage: yes
freepage: yes
direct_IO:
+isolate_page: yes
migratepage: yes (both)
+putback_page: yes
launder_page: yes
is_partially_uptodate: yes
error_remove_page: yes
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index b02a7d598258..4c1b6c3b4bc8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -592,9 +592,14 @@ struct address_space_operations {
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
+ /* isolate a page for migration */
+ bool (*isolate_page) (struct page *, isolate_mode_t);
/* migrate the contents of a page to the specified target */
int (*migratepage) (struct page *, struct page *);
+ /* put the page back to right list */
+ void (*putback_page) (struct page *);
int (*launder_page) (struct page *);
+
int (*is_partially_uptodate) (struct page *, unsigned long,
unsigned long);
void (*is_dirty_writeback) (struct page *, bool *, bool *);
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 712f1b9992cc..e2066e73a9b8 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page)
if (page_is_idle(page))
u |= 1 << KPF_IDLE;
+ if (PageMovable(page))
+ u |= 1 << KPF_MOVABLE;
+
u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 14a97194b34b..b7ef2e41fa4a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -401,6 +401,8 @@ struct address_space_operations {
*/
int (*migratepage) (struct address_space *,
struct page *, struct page *, enum migrate_mode);
+ bool (*isolate_page)(struct page *, isolate_mode_t);
+ void (*putback_page)(struct page *);
int (*launder_page) (struct page *);
int (*is_partially_uptodate) (struct page *, unsigned long,
unsigned long);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9b50325e4ddf..404fbfefeb33 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
+extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
+extern void putback_movable_page(struct page *page);
extern int migrate_prep(void);
extern int migrate_prep_local(void);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f4ed4f1b0c77..3885064641c4 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -129,6 +129,10 @@ enum pageflags {
/* Compound pages. Stored in first tail page's flags */
PG_double_map = PG_private_2,
+
+ /* non-lru movable pages */
+ PG_movable = PG_reclaim,
+ PG_isolated = PG_owner_priv_1,
};
#ifndef __GENERATING_BOUNDS_H
@@ -614,6 +618,31 @@ static inline void __ClearPageBalloon(struct page *page)
atomic_set(&page->_mapcount, -1);
}
+#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
+
+static inline int PageMovable(struct page *page)
+{
+ return ((test_bit(PG_movable, &(page)->flags) &&
+ atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
+ || PageBalloon(page));
+}
+
+/*
+ * Caller should hold a PG_lock */
+static inline void __SetPageMovable(struct page *page)
+{
+ __set_bit(PG_movable, &page->flags);
+ atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
+}
+
+static inline void __ClearPageMovable(struct page *page)
+{
+ atomic_set(&page->_mapcount, -1);
+ __clear_bit(PG_movable, &(page)->flags);
+}
+
+PAGEFLAG(Isolated, isolated, PF_ANY);
+
/*
* If network-based swap is enabled, sl*b must keep track of whether pages
* were allocated from pfmemalloc reserves.
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index 5da5f8751ce7..a184fd2434fa 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -34,6 +34,7 @@
#define KPF_BALLOON 23
#define KPF_ZERO_PAGE 24
#define KPF_IDLE 25
+#define KPF_MOVABLE 26
#endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index ccf97b02b85f..7557aedddaee 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -703,7 +703,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
/*
* Check may be lockless but that's ok as we recheck later.
- * It's possible to migrate LRU pages and balloon pages
+ * It's possible to migrate LRU and movable kernel pages.
* Skip any other type of page
*/
is_lru = PageLRU(page);
@@ -714,6 +714,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
goto isolate_success;
}
}
+
+ if (unlikely(PageMovable(page)) &&
+ !PageIsolated(page)) {
+ if (locked) {
+ spin_unlock_irqrestore(&zone->lru_lock,
+ flags);
+ locked = false;
+ }
+
+ if (isolate_movable_page(page, isolate_mode))
+ goto isolate_success;
+ }
}
/*
diff --git a/mm/migrate.c b/mm/migrate.c
index b65c84267ce0..fc2842a15807 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -73,6 +73,75 @@ int migrate_prep_local(void)
return 0;
}
+bool isolate_movable_page(struct page *page, isolate_mode_t mode)
+{
+ bool ret = false;
+
+ /*
+ * Avoid burning cycles with pages that are yet under __free_pages(),
+ * or just got freed under us.
+ *
+ * In case we 'win' a race for a movable page being freed under us and
+ * raise its refcount preventing __free_pages() from doing its job
+ * the put_page() at the end of this block will take care of
+ * release this page, thus avoiding a nasty leakage.
+ */
+ if (unlikely(!get_page_unless_zero(page)))
+ goto out;
+
+ /*
+ * As movable pages are not isolated from LRU lists, concurrent
+ * compaction threads can race against page migration functions
+ * as well as race against the releasing a page.
+ *
+ * In order to avoid having an already isolated movable page
+ * being (wrongly) re-isolated while it is under migration,
+ * or to avoid attempting to isolate pages being released,
+ * lets be sure we have the page lock
+ * before proceeding with the movable page isolation steps.
+ */
+ if (unlikely(!trylock_page(page)))
+ goto out_putpage;
+
+ if (!PageMovable(page) || PageIsolated(page))
+ goto out_no_isolated;
+
+ ret = page->mapping->a_ops->isolate_page(page, mode);
+ if (!ret)
+ goto out_no_isolated;
+
+ WARN_ON_ONCE(!PageIsolated(page));
+ unlock_page(page);
+ return ret;
+
+out_no_isolated:
+ unlock_page(page);
+out_putpage:
+ put_page(page);
+out:
+ return ret;
+}
+
+void putback_movable_page(struct page *page)
+{
+ struct address_space *mapping;
+
+ /*
+ * 'lock_page()' stabilizes the page and prevents races against
+ * concurrent isolation threads attempting to re-isolate it.
+ */
+ lock_page(page);
+ mapping = page_mapping(page);
+ if (mapping) {
+ mapping->a_ops->putback_page(page);
+ WARN_ON_ONCE(PageIsolated(page));
+ }
+ unlock_page(page);
+ /* drop the extra ref count taken for movable page isolation */
+ put_page(page);
+}
+
+
/*
* Put previously isolated pages back onto the appropriate lists
* from where they were once taken off for compaction/migration.
@@ -96,6 +165,8 @@ void putback_movable_pages(struct list_head *l)
page_is_file_cache(page));
if (unlikely(isolated_balloon_page(page)))
balloon_page_putback(page);
+ else if (unlikely(PageIsolated(page)))
+ putback_movable_page(page);
else
putback_lru_page(page);
}
@@ -592,7 +663,7 @@ void migrate_page_copy(struct page *newpage, struct page *page)
***********************************************************/
/*
- * Common logic to directly migrate a single page suitable for
+ * Common logic to directly migrate a single LRU page suitable for
* pages that do not use PagePrivate/PagePrivate2.
*
* Pages are locked upon entry and exit.
@@ -755,24 +826,53 @@ static int move_to_new_page(struct page *newpage, struct page *page,
enum migrate_mode mode)
{
struct address_space *mapping;
- int rc;
+ int rc = -EAGAIN;
+ bool isolated_lru_page;
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
mapping = page_mapping(page);
- if (!mapping)
- rc = migrate_page(mapping, newpage, page, mode);
- else if (mapping->a_ops->migratepage)
+ /*
+ * In case of non-lru page, it could be released after
+ * isolation step. In that case, we shouldn't try
+ * fallback migration which was designed for LRU pages.
+ *
+ * To identify such pages, we cannot use PageMovable
+ * because owner of the page can reset it. So instead,
+ * use PG_isolated bit.
+ */
+ isolated_lru_page = !PageIsolated(page);
+
+ if (likely(isolated_lru_page)) {
+ if (!mapping)
+ rc = migrate_page(mapping, newpage, page, mode);
+ else if (mapping->a_ops->migratepage)
+ /*
+ * Most pages have a mapping and most filesystems
+ * provide a migratepage callback. Anonymous pages
+ * are part of swap space which also has its own
+ * migratepage callback. This is the most common path
+ * for page migration.
+ */
+ rc = mapping->a_ops->migratepage(mapping, newpage,
+ page, mode);
+ else
+ rc = fallback_migrate_page(mapping, newpage,
+ page, mode);
+ } else {
/*
- * Most pages have a mapping and most filesystems provide a
- * migratepage callback. Anonymous pages are part of swap
- * space which also has its own migratepage callback. This
- * is the most common path for page migration.
+ * If mapping is NULL, it returns -EAGAIN so retrial
+ * of migration will see refcount as 1 and free it,
+ * finally.
*/
- rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
- else
- rc = fallback_migrate_page(mapping, newpage, page, mode);
+ if (mapping) {
+ rc = mapping->a_ops->migratepage(mapping, newpage,
+ page, mode);
+ WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
+ PageIsolated(page));
+ }
+ }
/*
* When successful, old pagecache page->mapping must be cleared before
@@ -1000,8 +1100,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
num_poisoned_pages_inc();
}
} else {
- if (rc != -EAGAIN)
- putback_lru_page(page);
+ if (rc != -EAGAIN) {
+ if (likely(!PageIsolated(page)))
+ putback_lru_page(page);
+ else
+ putback_movable_page(page);
+ }
if (put_new_page)
put_new_page(newpage, private);
else
--
1.9.1
This patch cleans up the ordering of function parameters so that the
higher-level data structure comes first.
Reviewed-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 50 ++++++++++++++++++++++++++------------------------
1 file changed, 26 insertions(+), 24 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 6a7b9313ee8c..16556a6db628 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -569,7 +569,7 @@ static const struct file_operations zs_stat_size_ops = {
.release = single_release,
};
-static int zs_pool_stat_create(const char *name, struct zs_pool *pool)
+static int zs_pool_stat_create(struct zs_pool *pool, const char *name)
{
struct dentry *entry;
@@ -609,7 +609,7 @@ static void __exit zs_stat_exit(void)
{
}
-static inline int zs_pool_stat_create(const char *name, struct zs_pool *pool)
+static inline int zs_pool_stat_create(struct zs_pool *pool, const char *name)
{
return 0;
}
@@ -655,8 +655,9 @@ static enum fullness_group get_fullness_group(struct page *first_page)
* have. This functions inserts the given zspage into the freelist
* identified by <class, fullness_group>.
*/
-static void insert_zspage(struct page *first_page, struct size_class *class,
- enum fullness_group fullness)
+static void insert_zspage(struct size_class *class,
+ enum fullness_group fullness,
+ struct page *first_page)
{
struct page **head;
@@ -687,8 +688,9 @@ static void insert_zspage(struct page *first_page, struct size_class *class,
* This function removes the given zspage from the freelist identified
* by <class, fullness_group>.
*/
-static void remove_zspage(struct page *first_page, struct size_class *class,
- enum fullness_group fullness)
+static void remove_zspage(struct size_class *class,
+ enum fullness_group fullness,
+ struct page *first_page)
{
struct page **head;
@@ -730,8 +732,8 @@ static enum fullness_group fix_fullness_group(struct size_class *class,
if (newfg == currfg)
goto out;
- remove_zspage(first_page, class, currfg);
- insert_zspage(first_page, class, newfg);
+ remove_zspage(class, currfg, first_page);
+ insert_zspage(class, newfg, first_page);
set_zspage_mapping(first_page, class_idx, newfg);
out:
@@ -915,7 +917,7 @@ static void free_zspage(struct page *first_page)
}
/* Initialize a newly allocated zspage */
-static void init_zspage(struct page *first_page, struct size_class *class)
+static void init_zspage(struct size_class *class, struct page *first_page)
{
unsigned long off = 0;
struct page *page = first_page;
@@ -1003,7 +1005,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags)
prev_page = page;
}
- init_zspage(first_page, class);
+ init_zspage(class, first_page);
first_page->freelist = location_to_obj(first_page, 0);
/* Maximum number of objects we can store in this zspage */
@@ -1348,8 +1350,8 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
}
EXPORT_SYMBOL_GPL(zs_unmap_object);
-static unsigned long obj_malloc(struct page *first_page,
- struct size_class *class, unsigned long handle)
+static unsigned long obj_malloc(struct size_class *class,
+ struct page *first_page, unsigned long handle)
{
unsigned long obj;
struct link_free *link;
@@ -1426,7 +1428,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
class->size, class->pages_per_zspage));
}
- obj = obj_malloc(first_page, class, handle);
+ obj = obj_malloc(class, first_page, handle);
/* Now move the zspage to another fullness group, if required */
fix_fullness_group(class, first_page);
record_obj(handle, obj);
@@ -1499,8 +1501,8 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
}
EXPORT_SYMBOL_GPL(zs_free);
-static void zs_object_copy(unsigned long dst, unsigned long src,
- struct size_class *class)
+static void zs_object_copy(struct size_class *class, unsigned long dst,
+ unsigned long src)
{
struct page *s_page, *d_page;
unsigned long s_objidx, d_objidx;
@@ -1566,8 +1568,8 @@ static void zs_object_copy(unsigned long dst, unsigned long src,
* Find alloced object in zspage from index object and
* return handle.
*/
-static unsigned long find_alloced_obj(struct page *page, int index,
- struct size_class *class)
+static unsigned long find_alloced_obj(struct size_class *class,
+ struct page *page, int index)
{
unsigned long head;
int offset = 0;
@@ -1617,7 +1619,7 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
int ret = 0;
while (1) {
- handle = find_alloced_obj(s_page, index, class);
+ handle = find_alloced_obj(class, s_page, index);
if (!handle) {
s_page = get_next_page(s_page);
if (!s_page)
@@ -1634,8 +1636,8 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
}
used_obj = handle_to_obj(handle);
- free_obj = obj_malloc(d_page, class, handle);
- zs_object_copy(free_obj, used_obj, class);
+ free_obj = obj_malloc(class, d_page, handle);
+ zs_object_copy(class, free_obj, used_obj);
index++;
/*
* record_obj updates handle's value to free_obj and it will
@@ -1664,7 +1666,7 @@ static struct page *isolate_target_page(struct size_class *class)
for (i = 0; i < _ZS_NR_FULLNESS_GROUPS; i++) {
page = class->fullness_list[i];
if (page) {
- remove_zspage(page, class, i);
+ remove_zspage(class, i, page);
break;
}
}
@@ -1687,7 +1689,7 @@ static enum fullness_group putback_zspage(struct zs_pool *pool,
enum fullness_group fullness;
fullness = get_fullness_group(first_page);
- insert_zspage(first_page, class, fullness);
+ insert_zspage(class, fullness, first_page);
set_zspage_mapping(first_page, class->index, fullness);
if (fullness == ZS_EMPTY) {
@@ -1712,7 +1714,7 @@ static struct page *isolate_source_page(struct size_class *class)
if (!page)
continue;
- remove_zspage(page, class, i);
+ remove_zspage(class, i, page);
break;
}
@@ -1946,7 +1948,7 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t flags)
pool->flags = flags;
- if (zs_pool_stat_create(name, pool))
+ if (zs_pool_stat_create(pool, name))
goto err;
/*
--
1.9.1
For tail page migration, we shouldn't use page->lru, which
was used for page chaining, because the VM will use it for its
own purposes, so we need another field for chaining.
For chaining, a singly linked list is enough, and page->index
of a tail page, which points to the first object offset in the
page, can be replaced by a run-time calculation.
So, this patch changes the page->lru list used for chaining
into a singly linked list squeezed into page->freelist and
introduces get_first_obj_ofs to get the first object offset in
a page.
With that, we can maintain page chaining without using
page->lru.
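
As an illustration only, here is a minimal user-space sketch of the
idea described above: tail pages are chained through a single next
pointer, and the offset of the first object starting in a page is
derived from the page's position in the chain instead of being stored.
All names (toy_page, toy_chain_pages, toy_page_index,
toy_first_obj_ofs, TOY_PAGE_SIZE) are hypothetical, and the offset
formula is a simplified one that assumes objects are packed back to
back from offset 0 of the first page and are no larger than a page;
it is not the formula used by get_first_obj_ofs in this patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/* Hypothetical stand-in for struct page: one forward link only. */
struct toy_page {
	struct toy_page *next;	/* NULL on the last page of the chain */
};

/* Link pages[0..nr-1] into a NULL-terminated singly linked chain. */
static void toy_chain_pages(struct toy_page *pages[], int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		pages[i]->next = (i + 1 < nr) ? pages[i + 1] : NULL;
}

/* Walk from the first page and return the position of @page, or -1. */
static int toy_page_index(struct toy_page *first, struct toy_page *page)
{
	struct toy_page *cursor;
	int idx = 0;

	for (cursor = first; cursor; cursor = cursor->next, idx++)
		if (cursor == page)
			return idx;
	return -1;
}

/*
 * Offset of the first object that starts in page @page_idx, assuming
 * objects of @size bytes (0 < size <= TOY_PAGE_SIZE) are packed
 * contiguously from the start of the chain.
 */
static unsigned int toy_first_obj_ofs(unsigned int size, int page_idx)
{
	unsigned int start = (unsigned int)page_idx * TOY_PAGE_SIZE;
	unsigned int rem = start % size;

	return rem ? size - rem : 0;
}
```

The trade-off is the one the description names: the offset costs a
chain walk at lookup time, but no per-page field has to hold it.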
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 119 ++++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 78 insertions(+), 41 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b3b31fdfea0f..9b4b03d8f993 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -17,10 +17,7 @@
*
* Usage of struct page fields:
* page->private: points to the first component (0-order) page
- * page->index (union with page->freelist): offset of the first object
- * starting in this page.
- * page->lru: links together all component pages (except the first page)
- * of a zspage
+ * page->index (union with page->freelist): overridden by struct zs_meta
*
* For _first_ page only:
*
@@ -271,10 +268,19 @@ struct zs_pool {
};
struct zs_meta {
- unsigned long freeobj:FREEOBJ_BITS;
- unsigned long class:CLASS_BITS;
- unsigned long fullness:FULLNESS_BITS;
- unsigned long inuse:INUSE_BITS;
+ union {
+ /* first page */
+ struct {
+ unsigned long freeobj:FREEOBJ_BITS;
+ unsigned long class:CLASS_BITS;
+ unsigned long fullness:FULLNESS_BITS;
+ unsigned long inuse:INUSE_BITS;
+ };
+ /* tail pages */
+ struct {
+ struct page *next;
+ };
+ };
};
struct mapping_area {
@@ -491,6 +497,34 @@ static unsigned long get_freeobj(struct page *first_page)
return m->freeobj;
}
+static void set_next_page(struct page *page, struct page *next)
+{
+ struct zs_meta *m;
+
+ VM_BUG_ON_PAGE(is_first_page(page), page);
+
+ m = (struct zs_meta *)&page->index;
+ m->next = next;
+}
+
+static struct page *get_next_page(struct page *page)
+{
+ struct page *next;
+
+ if (is_last_page(page))
+ next = NULL;
+ else if (is_first_page(page))
+ next = (struct page *)page_private(page);
+ else {
+ struct zs_meta *m = (struct zs_meta *)&page->index;
+
+ VM_BUG_ON(!m->next);
+ next = m->next;
+ }
+
+ return next;
+}
+
static void get_zspage_mapping(struct page *first_page,
unsigned int *class_idx,
enum fullness_group *fullness)
@@ -871,18 +905,30 @@ static struct page *get_first_page(struct page *page)
return (struct page *)page_private(page);
}
-static struct page *get_next_page(struct page *page)
+int get_first_obj_ofs(struct size_class *class, struct page *first_page,
+ struct page *page)
{
- struct page *next;
+ int pos, bound;
+ int page_idx = 0;
+ int ofs = 0;
+ struct page *cursor = first_page;
- if (is_last_page(page))
- next = NULL;
- else if (is_first_page(page))
- next = (struct page *)page_private(page);
- else
- next = list_entry(page->lru.next, struct page, lru);
+ if (first_page == page)
+ goto out;
- return next;
+ while (page != cursor) {
+ page_idx++;
+ cursor = get_next_page(cursor);
+ }
+
+ bound = PAGE_SIZE * page_idx;
+ pos = (((class->objs_per_zspage * class->size) *
+ page_idx / class->pages_per_zspage) / class->size
+ ) * class->size;
+
+ ofs = (pos + class->size) % PAGE_SIZE;
+out:
+ return ofs;
}
static void objidx_to_page_and_offset(struct size_class *class,
@@ -1008,27 +1054,25 @@ void lock_zspage(struct page *first_page)
static void free_zspage(struct zs_pool *pool, struct page *first_page)
{
- struct page *nextp, *tmp, *head_extra;
+ struct page *nextp, *tmp;
VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
VM_BUG_ON_PAGE(get_zspage_inuse(first_page), first_page);
lock_zspage(first_page);
- head_extra = (struct page *)page_private(first_page);
+ nextp = (struct page *)page_private(first_page);
/* zspage with only 1 system page */
- if (!head_extra)
+ if (!nextp)
goto out;
- list_for_each_entry_safe(nextp, tmp, &head_extra->lru, lru) {
- list_del(&nextp->lru);
- reset_page(nextp);
- unlock_page(nextp);
- __free_page(nextp);
- }
- reset_page(head_extra);
- unlock_page(head_extra);
- __free_page(head_extra);
+ do {
+ tmp = nextp;
+ nextp = get_next_page(nextp);
+ reset_page(tmp);
+ unlock_page(tmp);
+ __free_page(tmp);
+ } while (nextp);
out:
reset_page(first_page);
unlock_page(first_page);
@@ -1056,13 +1100,6 @@ static void init_zspage(struct size_class *class, struct page *first_page,
struct link_free *link;
void *vaddr;
- /*
- * page->index stores offset of first object starting
- * in the page.
- */
- if (page != first_page)
- page->index = off;
-
vaddr = kmap_atomic(page);
link = (struct link_free *)vaddr + off / sizeof(*link);
@@ -1104,7 +1141,6 @@ static void create_page_chain(struct page *pages[], int nr_pages)
for (i = 0; i < nr_pages; i++) {
page = pages[i];
- INIT_LIST_HEAD(&page->lru);
if (i == 0) {
SetPagePrivate(page);
set_page_private(page, 0);
@@ -1113,10 +1149,12 @@ static void create_page_chain(struct page *pages[], int nr_pages)
if (i == 1)
set_page_private(first_page, (unsigned long)page);
- if (i >= 1)
+ if (i >= 1) {
+ set_next_page(page, NULL);
set_page_private(page, (unsigned long)first_page);
+ }
if (i >= 2)
- list_add(&page->lru, &prev_page->lru);
+ set_next_page(prev_page, page);
if (i == nr_pages - 1)
SetPagePrivate2(page);
@@ -2241,8 +2279,7 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
kunmap_atomic(d_addr);
kunmap_atomic(s_addr);
- if (!is_first_page(page))
- offset = page->index;
+ offset = get_first_obj_ofs(class, first_page, page);
addr = kmap_atomic(page);
do {
--
1.9.1
Now that the VM can migrate non-lru movable pages, the balloon
driver doesn't need custom migration hooks in migrate.c and
compaction.c. Instead, this patch implements the
page->mapping->a_ops->{isolate_page|migratepage|putback_page}
functions.
With that, we can remove the ballooning hooks from the general
migration functions and make balloon compaction simple.
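
To illustrate the shape of this change, here is a small user-space
sketch, not kernel code: a driver publishes isolate/migrate/putback
callbacks in an operations table, and a generic mover reaches them
through the page without knowing anything about balloons. Every name
below (toy_page, toy_aops, toy_balloon_*, toy_move_page) is
hypothetical, and the callback signatures are simplified compared to
the address_space_operations entries used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page;

/* Hypothetical stand-in for the three a_ops entries a driver fills in. */
struct toy_aops {
	bool (*isolate_page)(struct toy_page *page);
	int (*migratepage)(struct toy_page *newpage, struct toy_page *page);
	void (*putback_page)(struct toy_page *page);
};

struct toy_page {
	const struct toy_aops *aops;	/* plays the role of page->mapping->a_ops */
	bool isolated;			/* plays the role of PG_isolated */
	int payload;
};

static bool toy_balloon_isolate(struct toy_page *page)
{
	page->isolated = true;
	return true;
}

static int toy_balloon_migrate(struct toy_page *newpage, struct toy_page *page)
{
	newpage->aops = page->aops;
	newpage->payload = page->payload;
	page->isolated = false;	/* the driver clears the flag on success */
	return 0;
}

static void toy_balloon_putback(struct toy_page *page)
{
	page->isolated = false;
}

static const struct toy_aops toy_balloon_aops = {
	.isolate_page = toy_balloon_isolate,
	.migratepage = toy_balloon_migrate,
	.putback_page = toy_balloon_putback,
};

/* Generic mover: no driver-specific branches, only the ops table. */
static int toy_move_page(struct toy_page *page, struct toy_page *newpage)
{
	int rc;

	if (!page->aops || page->isolated)
		return -1;
	if (!page->aops->isolate_page(page))
		return -1;

	rc = page->aops->migratepage(newpage, page);
	if (rc)
		page->aops->putback_page(page);	/* migration failed: undo */
	return rc;
}
```

With this layout the generic path needs no special case per driver,
which is why the balloon-specific branches can be dropped from the
common migration code.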
Cc: [email protected]
Cc: Rafael Aquini <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Gioh Kim <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
drivers/virtio/virtio_balloon.c | 45 ++++++++++++++++-
include/linux/balloon_compaction.h | 47 ++++-------------
include/linux/page-flags.h | 52 +++++++++++--------
include/uapi/linux/magic.h | 1 +
mm/balloon_compaction.c | 101 ++++++++-----------------------------
mm/compaction.c | 7 ---
mm/migrate.c | 22 ++------
mm/vmscan.c | 2 +-
8 files changed, 113 insertions(+), 164 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7b6d74f0c72f..46a69b6a0c4f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -30,6 +30,7 @@
#include <linux/oom.h>
#include <linux/wait.h>
#include <linux/mm.h>
+#include <linux/mount.h>
/*
* Balloon device works in 4K page units. So each page is pointed to by
@@ -45,6 +46,10 @@ static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
module_param(oom_pages, int, S_IRUSR | S_IWUSR);
MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
+#ifdef CONFIG_BALLOON_COMPACTION
+static struct vfsmount *balloon_mnt;
+#endif
+
struct virtio_balloon {
struct virtio_device *vdev;
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
@@ -482,10 +487,29 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
mutex_unlock(&vb->balloon_lock);
+ ClearPageIsolated(page);
put_page(page); /* balloon reference */
return MIGRATEPAGE_SUCCESS;
}
+
+static struct dentry *balloon_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data)
+{
+ static const struct dentry_operations ops = {
+ .d_dname = simple_dname,
+ };
+
+ return mount_pseudo(fs_type, "balloon-kvm:", NULL, &ops,
+ BALLOON_KVM_MAGIC);
+}
+
+static struct file_system_type balloon_fs = {
+ .name = "balloon-kvm",
+ .mount = balloon_mount,
+ .kill_sb = kill_anon_super,
+};
+
#endif /* CONFIG_BALLOON_COMPACTION */
static int virtballoon_probe(struct virtio_device *vdev)
@@ -516,12 +540,25 @@ static int virtballoon_probe(struct virtio_device *vdev)
balloon_devinfo_init(&vb->vb_dev_info);
#ifdef CONFIG_BALLOON_COMPACTION
+ balloon_mnt = kern_mount(&balloon_fs);
+ if (IS_ERR(balloon_mnt)) {
+ err = PTR_ERR(balloon_mnt);
+ goto out_free_vb;
+ }
+
vb->vb_dev_info.migratepage = virtballoon_migratepage;
+ vb->vb_dev_info.inode = alloc_anon_inode(balloon_mnt->mnt_sb);
+ if (IS_ERR(vb->vb_dev_info.inode)) {
+ err = PTR_ERR(vb->vb_dev_info.inode);
+ vb->vb_dev_info.inode = NULL;
+ goto out_unmount;
+ }
+ vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
#endif
err = init_vqs(vb);
if (err)
- goto out_free_vb;
+ goto out_unmount;
vb->nb.notifier_call = virtballoon_oom_notify;
vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
@@ -535,6 +572,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
out_oom_notify:
vdev->config->del_vqs(vdev);
+out_unmount:
+ if (vb->vb_dev_info.inode)
+ iput(vb->vb_dev_info.inode);
+ kern_unmount(balloon_mnt);
out_free_vb:
kfree(vb);
out:
@@ -567,6 +608,8 @@ static void virtballoon_remove(struct virtio_device *vdev)
cancel_work_sync(&vb->update_balloon_stats_work);
remove_common(vb);
+ if (vb->vb_dev_info.inode)
+ iput(vb->vb_dev_info.inode);
kfree(vb);
}
diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index 9b0a15d06a4f..43a858545844 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -48,6 +48,7 @@
#include <linux/migrate.h>
#include <linux/gfp.h>
#include <linux/err.h>
+#include <linux/fs.h>
/*
* Balloon device information descriptor.
@@ -62,6 +63,7 @@ struct balloon_dev_info {
struct list_head pages; /* Pages enqueued & handled to Host */
int (*migratepage)(struct balloon_dev_info *, struct page *newpage,
struct page *page, enum migrate_mode mode);
+ struct inode *inode;
};
extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
@@ -73,45 +75,19 @@ static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
spin_lock_init(&balloon->pages_lock);
INIT_LIST_HEAD(&balloon->pages);
balloon->migratepage = NULL;
+ balloon->inode = NULL;
}
#ifdef CONFIG_BALLOON_COMPACTION
-extern bool balloon_page_isolate(struct page *page);
+extern const struct address_space_operations balloon_aops;
+extern bool balloon_page_isolate(struct page *page,
+ isolate_mode_t mode);
extern void balloon_page_putback(struct page *page);
-extern int balloon_page_migrate(struct page *newpage,
+extern int balloon_page_migrate(struct address_space *mapping,
+ struct page *newpage,
struct page *page, enum migrate_mode mode);
/*
- * __is_movable_balloon_page - helper to perform @page PageBalloon tests
- */
-static inline bool __is_movable_balloon_page(struct page *page)
-{
- return PageBalloon(page);
-}
-
-/*
- * balloon_page_movable - test PageBalloon to identify balloon pages
- * and PagePrivate to check that the page is not
- * isolated and can be moved by compaction/migration.
- *
- * As we might return false positives in the case of a balloon page being just
- * released under us, this need to be re-tested later, under the page lock.
- */
-static inline bool balloon_page_movable(struct page *page)
-{
- return PageBalloon(page) && PagePrivate(page);
-}
-
-/*
- * isolated_balloon_page - identify an isolated balloon page on private
- * compaction/migration page lists.
- */
-static inline bool isolated_balloon_page(struct page *page)
-{
- return PageBalloon(page);
-}
-
-/*
* balloon_page_insert - insert a page into the balloon's page list and make
* the page->private assignment accordingly.
* @balloon : pointer to balloon device
@@ -123,8 +99,8 @@ static inline bool isolated_balloon_page(struct page *page)
static inline void balloon_page_insert(struct balloon_dev_info *balloon,
struct page *page)
{
+ page->mapping = balloon->inode->i_mapping;
__SetPageBalloon(page);
- SetPagePrivate(page);
set_page_private(page, (unsigned long)balloon);
list_add(&page->lru, &balloon->pages);
}
@@ -140,11 +116,10 @@ static inline void balloon_page_insert(struct balloon_dev_info *balloon,
static inline void balloon_page_delete(struct page *page)
{
__ClearPageBalloon(page);
+ page->mapping = NULL;
set_page_private(page, 0);
- if (PagePrivate(page)) {
- ClearPagePrivate(page);
+ if (!PageIsolated(page))
list_del(&page->lru);
- }
}
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 3885064641c4..4853e0487175 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -599,50 +599,58 @@ static inline void __ClearPageBuddy(struct page *page)
extern bool is_free_buddy_page(struct page *page);
-#define PAGE_BALLOON_MAPCOUNT_VALUE (-256)
+#define PAGE_MOVABLE_MAPCOUNT_VALUE (-256)
+#define PAGE_BALLOON_MAPCOUNT_VALUE PAGE_MOVABLE_MAPCOUNT_VALUE
-static inline int PageBalloon(struct page *page)
+static inline int PageMovable(struct page *page)
{
- return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE;
+ return (test_bit(PG_movable, &(page)->flags) &&
+ atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE);
}
-static inline void __SetPageBalloon(struct page *page)
+/* Caller should hold a PG_lock */
+static inline void __SetPageMovable(struct page *page)
{
- VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
- atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE);
+ __set_bit(PG_movable, &page->flags);
+ atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
}
-static inline void __ClearPageBalloon(struct page *page)
+static inline void __ClearPageMovable(struct page *page)
{
- VM_BUG_ON_PAGE(!PageBalloon(page), page);
atomic_set(&page->_mapcount, -1);
+ __clear_bit(PG_movable, &(page)->flags);
}
-#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
+PAGEFLAG(Isolated, isolated, PF_ANY);
-static inline int PageMovable(struct page *page)
+static inline int PageBalloon(struct page *page)
{
- return ((test_bit(PG_movable, &(page)->flags) &&
- atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
- || PageBalloon(page));
+ return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE
+ && PagePrivate2(page);
}
-/*
- * Caller should hold a PG_lock */
-static inline void __SetPageMovable(struct page *page)
+static inline void __SetPageBalloon(struct page *page)
{
- __set_bit(PG_movable, &page->flags);
- atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
+ VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
+#ifdef CONFIG_BALLOON_COMPACTION
+ __SetPageMovable(page);
+#else
+ atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE);
+#endif
+ SetPagePrivate2(page);
}
-static inline void __ClearPageMovable(struct page *page)
+static inline void __ClearPageBalloon(struct page *page)
{
+ VM_BUG_ON_PAGE(!PageBalloon(page), page);
+#ifdef CONFIG_BALLOON_COMPACTION
+ __ClearPageMovable(page);
+#else
atomic_set(&page->_mapcount, -1);
- __clear_bit(PG_movable, &(page)->flags);
+#endif
+ ClearPagePrivate2(page);
}
-PAGEFLAG(Isolated, isolated, PF_ANY);
-
/*
* If network-based swap is enabled, sl*b must keep track of whether pages
* were allocated from pfmemalloc reserves.
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 0de181ad73d5..e1fbe72c39c0 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -78,5 +78,6 @@
#define BTRFS_TEST_MAGIC 0x73727279
#define NSFS_MAGIC 0x6e736673
#define BPF_FS_MAGIC 0xcafe4a11
+#define BALLOON_KVM_MAGIC 0x13661366
#endif /* __LINUX_MAGIC_H__ */
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 57b3e9bd6bc5..1fbc7fb387bb 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -70,7 +70,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
*/
if (trylock_page(page)) {
#ifdef CONFIG_BALLOON_COMPACTION
- if (!PagePrivate(page)) {
+ if (PageIsolated(page)) {
/* raced with isolation */
unlock_page(page);
continue;
@@ -106,110 +106,53 @@ EXPORT_SYMBOL_GPL(balloon_page_dequeue);
#ifdef CONFIG_BALLOON_COMPACTION
-static inline void __isolate_balloon_page(struct page *page)
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool balloon_page_isolate(struct page *page, isolate_mode_t mode)
{
struct balloon_dev_info *b_dev_info = balloon_page_device(page);
unsigned long flags;
spin_lock_irqsave(&b_dev_info->pages_lock, flags);
- ClearPagePrivate(page);
list_del(&page->lru);
b_dev_info->isolated_pages++;
spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+ SetPageIsolated(page);
+
+ return true;
}
-static inline void __putback_balloon_page(struct page *page)
+/* putback_lru_page() counterpart for a ballooned page */
+void balloon_page_putback(struct page *page)
{
struct balloon_dev_info *b_dev_info = balloon_page_device(page);
unsigned long flags;
+ ClearPageIsolated(page);
spin_lock_irqsave(&b_dev_info->pages_lock, flags);
- SetPagePrivate(page);
list_add(&page->lru, &b_dev_info->pages);
b_dev_info->isolated_pages--;
spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
}
-/* __isolate_lru_page() counterpart for a ballooned page */
-bool balloon_page_isolate(struct page *page)
-{
- /*
- * Avoid burning cycles with pages that are yet under __free_pages(),
- * or just got freed under us.
- *
- * In case we 'win' a race for a balloon page being freed under us and
- * raise its refcount preventing __free_pages() from doing its job
- * the put_page() at the end of this block will take care of
- * release this page, thus avoiding a nasty leakage.
- */
- if (likely(get_page_unless_zero(page))) {
- /*
- * As balloon pages are not isolated from LRU lists, concurrent
- * compaction threads can race against page migration functions
- * as well as race against the balloon driver releasing a page.
- *
- * In order to avoid having an already isolated balloon page
- * being (wrongly) re-isolated while it is under migration,
- * or to avoid attempting to isolate pages being released by
- * the balloon driver, lets be sure we have the page lock
- * before proceeding with the balloon page isolation steps.
- */
- if (likely(trylock_page(page))) {
- /*
- * A ballooned page, by default, has PagePrivate set.
- * Prevent concurrent compaction threads from isolating
- * an already isolated balloon page by clearing it.
- */
- if (balloon_page_movable(page)) {
- __isolate_balloon_page(page);
- unlock_page(page);
- return true;
- }
- unlock_page(page);
- }
- put_page(page);
- }
- return false;
-}
-
-/* putback_lru_page() counterpart for a ballooned page */
-void balloon_page_putback(struct page *page)
-{
- /*
- * 'lock_page()' stabilizes the page and prevents races against
- * concurrent isolation threads attempting to re-isolate it.
- */
- lock_page(page);
-
- if (__is_movable_balloon_page(page)) {
- __putback_balloon_page(page);
- /* drop the extra ref count taken for page isolation */
- put_page(page);
- } else {
- WARN_ON(1);
- dump_page(page, "not movable balloon page");
- }
- unlock_page(page);
-}
-
/* move_to_new_page() counterpart for a ballooned page */
-int balloon_page_migrate(struct page *newpage,
- struct page *page, enum migrate_mode mode)
+int balloon_page_migrate(struct address_space *mapping,
+ struct page *newpage, struct page *page,
+ enum migrate_mode mode)
{
struct balloon_dev_info *balloon = balloon_page_device(page);
- int rc = -EAGAIN;
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
+ VM_BUG_ON_PAGE(!PageMovable(page), page);
+ VM_BUG_ON_PAGE(!PageIsolated(page), page);
- if (WARN_ON(!__is_movable_balloon_page(page))) {
- dump_page(page, "not movable balloon page");
- return rc;
- }
-
- if (balloon && balloon->migratepage)
- rc = balloon->migratepage(balloon, newpage, page, mode);
-
- return rc;
+ return balloon->migratepage(balloon, newpage, page, mode);
}
+
+const struct address_space_operations balloon_aops = {
+ .migratepage = balloon_page_migrate,
+ .isolate_page = balloon_page_isolate,
+ .putback_page = balloon_page_putback,
+};
+EXPORT_SYMBOL_GPL(balloon_aops);
#endif /* CONFIG_BALLOON_COMPACTION */
diff --git a/mm/compaction.c b/mm/compaction.c
index 7557aedddaee..e336c620fd7b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -708,13 +708,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
*/
is_lru = PageLRU(page);
if (!is_lru) {
- if (unlikely(balloon_page_movable(page))) {
- if (balloon_page_isolate(page)) {
- /* Successfully isolated */
- goto isolate_success;
- }
- }
-
if (unlikely(PageMovable(page)) &&
!PageIsolated(page)) {
if (locked) {
diff --git a/mm/migrate.c b/mm/migrate.c
index fc2842a15807..631c20754ee8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -147,8 +147,8 @@ void putback_movable_page(struct page *page)
* from where they were once taken off for compaction/migration.
*
* This function shall be used whenever the isolated pageset has been
- * built from lru, balloon, hugetlbfs page. See isolate_migratepages_range()
- * and isolate_huge_page().
+ * built from lru, movable, hugetlbfs page.
+ * See isolate_migratepages_range() and isolate_huge_page().
*/
void putback_movable_pages(struct list_head *l)
{
@@ -163,9 +163,7 @@ void putback_movable_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- if (unlikely(isolated_balloon_page(page)))
- balloon_page_putback(page);
- else if (unlikely(PageIsolated(page)))
+ if (unlikely(PageIsolated(page)))
putback_movable_page(page);
else
putback_lru_page(page);
@@ -959,18 +957,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
if (unlikely(!trylock_page(newpage)))
goto out_unlock;
- if (unlikely(isolated_balloon_page(page))) {
- /*
- * A ballooned page does not need any special attention from
- * physical to virtual reverse mapping procedures.
- * Skip any attempt to unmap PTEs or to remap swap cache,
- * in order to avoid burning cycles at rmap level, and perform
- * the page migration right away (proteced by page lock).
- */
- rc = balloon_page_migrate(newpage, page, mode);
- goto out_unlock_both;
- }
-
/*
* Corner case handling:
* 1. When a new swap-cache page is read into, it is added to the LRU
@@ -1015,7 +1001,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
out:
/* If migration is scucessful, move newpage to right list */
if (rc == MIGRATEPAGE_SUCCESS) {
- if (unlikely(__is_movable_balloon_page(newpage)))
+ if (unlikely(PageMovable(newpage)))
put_page(newpage);
else
putback_lru_page(newpage);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c72032dbe8db..e5dfa0cf6fdc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1254,7 +1254,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
list_for_each_entry_safe(page, next, page_list, lru) {
if (page_is_file_cache(page) && !PageDirty(page) &&
- !isolated_balloon_page(page)) {
+ !PageIsolated(page)) {
ClearPageActive(page);
list_move(&page->lru, &clean_pages);
}
--
1.9.1
Currently, we rely on class->lock to prevent zspage destruction.
That was okay until now because the critical section is short, but
with run-time migration it could be long, so class->lock is not
a good approach any more.
So, this patch introduces [un]freeze_zspage functions which
freeze the allocated objects in a zspage with the pinning tag so
that users cannot free objects that are in use. With those
functions, this patch redesigns compaction.
Those functions will be used for implementing zspage run-time
migration, too.
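
The freezing idea can be sketched in user space as follows. This is
only an illustration under simplifying assumptions: it is
single-threaded, it keeps the pin bit directly in a per-slot handle
word, and all names (toy_zspage, toy_trypin, toy_unpin, toy_freeze,
toy_unfreeze, toy_free, TOY_PIN_BIT) are hypothetical. Real code
would need atomic bit operations for the pin bit, and the actual
[un]freeze_zspage functions in this patch may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PIN_BIT 0x1UL	/* assumed: lowest bit of a handle is the pin */
#define TOY_NR_OBJS 4

/* Hypothetical zspage: one handle word per slot, 0 means the slot is free. */
struct toy_zspage {
	unsigned long handles[TOY_NR_OBJS];
};

static bool toy_trypin(unsigned long *handle)
{
	if (*handle & TOY_PIN_BIT)
		return false;
	*handle |= TOY_PIN_BIT;
	return true;
}

static void toy_unpin(unsigned long *handle)
{
	*handle &= ~TOY_PIN_BIT;
}

/* Pin every allocated object; undo our own pins if one is already held. */
static bool toy_freeze(struct toy_zspage *z)
{
	int i;

	for (i = 0; i < TOY_NR_OBJS; i++) {
		if (!z->handles[i])
			continue;
		if (!toy_trypin(&z->handles[i]))
			goto rollback;
	}
	return true;

rollback:
	while (--i >= 0)
		if (z->handles[i])
			toy_unpin(&z->handles[i]);
	return false;
}

static void toy_unfreeze(struct toy_zspage *z)
{
	int i;

	for (i = 0; i < TOY_NR_OBJS; i++)
		if (z->handles[i])
			toy_unpin(&z->handles[i]);
}

/* A free must take the pin first, so it fails while the zspage is frozen. */
static bool toy_free(struct toy_zspage *z, int idx)
{
	if (!z->handles[idx] || !toy_trypin(&z->handles[idx]))
		return false;
	z->handles[idx] = 0;
	return true;
}
```

Because a free has to win the pin before it can touch the object, a
frozen zspage cannot lose objects while it is being migrated, and no
class-wide lock has to be held for the duration.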
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 393 ++++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 257 insertions(+), 136 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9c0ab1e92e9b..990d752fb65b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -922,6 +922,13 @@ static unsigned long obj_to_head(struct size_class *class, struct page *page,
return *(unsigned long *)obj;
}
+static inline int testpin_tag(unsigned long handle)
+{
+ unsigned long *ptr = (unsigned long *)handle;
+
+ return test_bit(HANDLE_PIN_BIT, ptr);
+}
+
static inline int trypin_tag(unsigned long handle)
{
unsigned long *ptr = (unsigned long *)handle;
@@ -950,8 +957,7 @@ static void reset_page(struct page *page)
page_mapcount_reset(page);
}
-static void free_zspage(struct zs_pool *pool, struct size_class *class,
- struct page *first_page)
+static void free_zspage(struct zs_pool *pool, struct page *first_page)
{
struct page *nextp, *tmp, *head_extra;
@@ -974,11 +980,6 @@ static void free_zspage(struct zs_pool *pool, struct size_class *class,
}
reset_page(head_extra);
__free_page(head_extra);
-
- zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
- class->size, class->pages_per_zspage));
- atomic_long_sub(class->pages_per_zspage,
- &pool->pages_allocated);
}
/* Initialize a newly allocated zspage */
@@ -1326,6 +1327,11 @@ static bool zspage_full(struct size_class *class, struct page *first_page)
return get_zspage_inuse(first_page) == class->objs_per_zspage;
}
+static bool zspage_empty(struct size_class *class, struct page *first_page)
+{
+ return get_zspage_inuse(first_page) == 0;
+}
+
unsigned long zs_get_total_pages(struct zs_pool *pool)
{
return atomic_long_read(&pool->pages_allocated);
@@ -1456,7 +1462,6 @@ static unsigned long obj_malloc(struct size_class *class,
set_page_private(first_page, handle | OBJ_ALLOCATED_TAG);
kunmap_atomic(vaddr);
mod_zspage_inuse(first_page, 1);
- zs_stat_inc(class, OBJ_USED, 1);
obj = location_to_obj(m_page, obj);
@@ -1511,6 +1516,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
}
obj = obj_malloc(class, first_page, handle);
+ zs_stat_inc(class, OBJ_USED, 1);
/* Now move the zspage to another fullness group, if required */
fix_fullness_group(class, first_page);
record_obj(handle, obj);
@@ -1541,7 +1547,6 @@ static void obj_free(struct size_class *class, unsigned long obj)
kunmap_atomic(vaddr);
set_freeobj(first_page, f_objidx);
mod_zspage_inuse(first_page, -1);
- zs_stat_dec(class, OBJ_USED, 1);
}
void zs_free(struct zs_pool *pool, unsigned long handle)
@@ -1565,10 +1570,19 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
spin_lock(&class->lock);
obj_free(class, obj);
+ zs_stat_dec(class, OBJ_USED, 1);
fullness = fix_fullness_group(class, first_page);
- if (fullness == ZS_EMPTY)
- free_zspage(pool, class, first_page);
+ if (fullness == ZS_EMPTY) {
+ zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
+ class->size, class->pages_per_zspage));
+ spin_unlock(&class->lock);
+ atomic_long_sub(class->pages_per_zspage,
+ &pool->pages_allocated);
+ free_zspage(pool, first_page);
+ goto out;
+ }
spin_unlock(&class->lock);
+out:
unpin_tag(handle);
free_handle(pool, handle);
@@ -1638,127 +1652,66 @@ static void zs_object_copy(struct size_class *class, unsigned long dst,
kunmap_atomic(s_addr);
}
-/*
- * Find alloced object in zspage from index object and
- * return handle.
- */
-static unsigned long find_alloced_obj(struct size_class *class,
- struct page *page, int index)
+static unsigned long handle_from_obj(struct size_class *class,
+ struct page *first_page, int obj_idx)
{
- unsigned long head;
- int offset = 0;
- unsigned long handle = 0;
- void *addr = kmap_atomic(page);
-
- if (!is_first_page(page))
- offset = page->index;
- offset += class->size * index;
-
- while (offset < PAGE_SIZE) {
- head = obj_to_head(class, page, addr + offset);
- if (head & OBJ_ALLOCATED_TAG) {
- handle = head & ~OBJ_ALLOCATED_TAG;
- if (trypin_tag(handle))
- break;
- handle = 0;
- }
+ struct page *page;
+ unsigned long offset_in_page;
+ void *addr;
+ unsigned long head, handle = 0;
- offset += class->size;
- index++;
- }
+ objidx_to_page_and_offset(class, first_page, obj_idx,
+ &page, &offset_in_page);
+ addr = kmap_atomic(page);
+ head = obj_to_head(class, page, addr + offset_in_page);
+ if (head & OBJ_ALLOCATED_TAG)
+ handle = head & ~OBJ_ALLOCATED_TAG;
kunmap_atomic(addr);
+
return handle;
}
-struct zs_compact_control {
- /* Source page for migration which could be a subpage of zspage. */
- struct page *s_page;
- /* Destination page for migration which should be a first page
- * of zspage. */
- struct page *d_page;
- /* Starting object index within @s_page which used for live object
- * in the subpage. */
- int index;
-};
-
-static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
- struct zs_compact_control *cc)
+static int migrate_zspage(struct size_class *class, struct page *dst_page,
+ struct page *src_page)
{
- unsigned long used_obj, free_obj;
unsigned long handle;
- struct page *s_page = cc->s_page;
- struct page *d_page = cc->d_page;
- unsigned long index = cc->index;
- int ret = 0;
+ unsigned long old_obj, new_obj;
+ int i;
+ int nr_migrated = 0;
- while (1) {
- handle = find_alloced_obj(class, s_page, index);
- if (!handle) {
- s_page = get_next_page(s_page);
- if (!s_page)
- break;
- index = 0;
+ for (i = 0; i < class->objs_per_zspage; i++) {
+ handle = handle_from_obj(class, src_page, i);
+ if (!handle)
continue;
- }
-
- /* Stop if there is no more space */
- if (zspage_full(class, d_page)) {
- unpin_tag(handle);
- ret = -ENOMEM;
+ if (zspage_full(class, dst_page))
break;
- }
-
- used_obj = handle_to_obj(handle);
- free_obj = obj_malloc(class, d_page, handle);
- zs_object_copy(class, free_obj, used_obj);
- index++;
+ old_obj = handle_to_obj(handle);
+ new_obj = obj_malloc(class, dst_page, handle);
+ zs_object_copy(class, new_obj, old_obj);
+ nr_migrated++;
/*
* record_obj updates handle's value to free_obj and it will
* invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, which
* breaks synchronization using pin_tag(e,g, zs_free) so
* let's keep the lock bit.
*/
- free_obj |= BIT(HANDLE_PIN_BIT);
- record_obj(handle, free_obj);
- unpin_tag(handle);
- obj_free(class, used_obj);
+ new_obj |= BIT(HANDLE_PIN_BIT);
+ record_obj(handle, new_obj);
+ obj_free(class, old_obj);
}
-
- /* Remember last position in this iteration */
- cc->s_page = s_page;
- cc->index = index;
-
- return ret;
-}
-
-static struct page *isolate_target_page(struct size_class *class)
-{
- int i;
- struct page *page;
-
- for (i = 0; i < _ZS_NR_FULLNESS_GROUPS; i++) {
- page = class->fullness_list[i];
- if (page) {
- remove_zspage(class, i, page);
- break;
- }
- }
-
- return page;
+ return nr_migrated;
}
/*
* putback_zspage - add @first_page into right class's fullness list
- * @pool: target pool
* @class: destination class
* @first_page: target page
*
* Return @first_page's updated fullness_group
*/
-static enum fullness_group putback_zspage(struct zs_pool *pool,
- struct size_class *class,
- struct page *first_page)
+static enum fullness_group putback_zspage(struct size_class *class,
+ struct page *first_page)
{
enum fullness_group fullness;
@@ -1769,17 +1722,155 @@ static enum fullness_group putback_zspage(struct zs_pool *pool,
return fullness;
}
+/*
+ * freeze_zspage - freeze all objects in a zspage
+ * @class: size class of the page
+ * @first_page: first page of zspage
+ *
+ * Freeze all allocated objects in a zspage so that they cannot be
+ * freed until they are unfrozen. It should be called under class->lock.
+ *
+ * RETURNS:
+ * the number of pinned objects
+ */
+static int freeze_zspage(struct size_class *class, struct page *first_page)
+{
+ unsigned long obj_idx;
+ struct page *obj_page;
+ unsigned long offset;
+ void *addr;
+ int nr_freeze = 0;
+
+ for (obj_idx = 0; obj_idx < class->objs_per_zspage; obj_idx++) {
+ unsigned long head;
+
+ objidx_to_page_and_offset(class, first_page, obj_idx,
+ &obj_page, &offset);
+ addr = kmap_atomic(obj_page);
+ head = obj_to_head(class, obj_page, addr + offset);
+ if (head & OBJ_ALLOCATED_TAG) {
+ unsigned long handle = head & ~OBJ_ALLOCATED_TAG;
+
+ if (!trypin_tag(handle)) {
+ kunmap_atomic(addr);
+ break;
+ }
+ nr_freeze++;
+ }
+ kunmap_atomic(addr);
+ }
+
+ return nr_freeze;
+}
+
+/*
+ * unfreeze_zspage - unfreeze objects frozen by freeze_zspage in a zspage
+ * @class: size class of the page
+ * @first_page: frozen zspage to unfreeze
+ * @nr_obj: the number of objects to unfreeze
+ *
+ * Unfreeze up to @nr_obj allocated objects in a zspage.
+ */
+static void unfreeze_zspage(struct size_class *class, struct page *first_page,
+ int nr_obj)
+{
+ unsigned long obj_idx;
+ struct page *obj_page;
+ unsigned long offset;
+ void *addr;
+ int nr_unfreeze = 0;
+
+ for (obj_idx = 0; obj_idx < class->objs_per_zspage &&
+ nr_unfreeze < nr_obj; obj_idx++) {
+ unsigned long head;
+
+ objidx_to_page_and_offset(class, first_page, obj_idx,
+ &obj_page, &offset);
+ addr = kmap_atomic(obj_page);
+ head = obj_to_head(class, obj_page, addr + offset);
+ if (head & OBJ_ALLOCATED_TAG) {
+ unsigned long handle = head & ~OBJ_ALLOCATED_TAG;
+
+ VM_BUG_ON(!testpin_tag(handle));
+ unpin_tag(handle);
+ nr_unfreeze++;
+ }
+ kunmap_atomic(addr);
+ }
+}
+
+/*
+ * isolate_source_page - isolate a zspage for migration source
+ * @class: size class of zspage for isolation
+ *
+ * Returns a zspage which is isolated from the list so no one can
+ * allocate an object from that page. It also freezes all objects
+ * allocated in the zspage so no one can access those objects
+ * (e.g., zs_map_object, zs_free).
+ */
static struct page *isolate_source_page(struct size_class *class)
{
int i;
struct page *page = NULL;
for (i = ZS_ALMOST_EMPTY; i >= ZS_ALMOST_FULL; i--) {
+ int inuse, freezed;
+
page = class->fullness_list[i];
if (!page)
continue;
remove_zspage(class, i, page);
+
+ inuse = get_zspage_inuse(page);
+ freezed = freeze_zspage(class, page);
+
+ if (inuse != freezed) {
+ unfreeze_zspage(class, page, freezed);
+ putback_zspage(class, page);
+ page = NULL;
+ continue;
+ }
+
+ break;
+ }
+
+ return page;
+}
+
+/*
+ * isolate_target_page - isolate a zspage for migration target
+ * @class: size class of zspage for isolation
+ *
+ * Returns a zspage which is isolated from the list so no one can
+ * allocate an object from that page. It also freezes all objects
+ * allocated in the zspage so no one can access those objects
+ * (e.g., zs_map_object, zs_free).
+ */
+static struct page *isolate_target_page(struct size_class *class)
+{
+ int i;
+ struct page *page;
+
+ for (i = 0; i < _ZS_NR_FULLNESS_GROUPS; i++) {
+ int inuse, freezed;
+
+ page = class->fullness_list[i];
+ if (!page)
+ continue;
+
+ remove_zspage(class, i, page);
+
+ inuse = get_zspage_inuse(page);
+ freezed = freeze_zspage(class, page);
+
+ if (inuse != freezed) {
+ unfreeze_zspage(class, page, freezed);
+ putback_zspage(class, page);
+ page = NULL;
+ continue;
+ }
+
break;
}
@@ -1794,9 +1885,11 @@ static struct page *isolate_source_page(struct size_class *class)
static unsigned long zs_can_compact(struct size_class *class)
{
unsigned long obj_wasted;
+ unsigned long obj_allocated, obj_used;
- obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
- zs_stat_get(class, OBJ_USED);
+ obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
+ obj_used = zs_stat_get(class, OBJ_USED);
+ obj_wasted = obj_allocated - obj_used;
obj_wasted /= get_maxobj_per_zspage(class->size,
class->pages_per_zspage);
@@ -1806,53 +1899,81 @@ static unsigned long zs_can_compact(struct size_class *class)
static void __zs_compact(struct zs_pool *pool, struct size_class *class)
{
- struct zs_compact_control cc;
- struct page *src_page;
+ struct page *src_page = NULL;
struct page *dst_page = NULL;
- spin_lock(&class->lock);
- while ((src_page = isolate_source_page(class))) {
+ while (1) {
+ int nr_migrated;
- if (!zs_can_compact(class))
+ spin_lock(&class->lock);
+ if (!zs_can_compact(class)) {
+ spin_unlock(&class->lock);
break;
+ }
- cc.index = 0;
- cc.s_page = src_page;
+ /*
+ * Isolate the source page and freeze all objects in the zspage
+ * to prevent the zspage from being destroyed.
+ */
+ if (!src_page) {
+ src_page = isolate_source_page(class);
+ if (!src_page) {
+ spin_unlock(&class->lock);
+ break;
+ }
+ }
- while ((dst_page = isolate_target_page(class))) {
- cc.d_page = dst_page;
- /*
- * If there is no more space in dst_page, resched
- * and see if anyone had allocated another zspage.
- */
- if (!migrate_zspage(pool, class, &cc))
+ /* Isolate target page and freeze all objects in the zspage */
+ if (!dst_page) {
+ dst_page = isolate_target_page(class);
+ if (!dst_page) {
+ spin_unlock(&class->lock);
break;
+ }
+ }
+ spin_unlock(&class->lock);
+
+ nr_migrated = migrate_zspage(class, dst_page, src_page);
- VM_BUG_ON_PAGE(putback_zspage(pool, class,
- dst_page) == ZS_EMPTY, dst_page);
+ if (zspage_full(class, dst_page)) {
+ spin_lock(&class->lock);
+ putback_zspage(class, dst_page);
+ unfreeze_zspage(class, dst_page,
+ class->objs_per_zspage);
+ spin_unlock(&class->lock);
+ dst_page = NULL;
}
- /* Stop if we couldn't find slot */
- if (dst_page == NULL)
- break;
+ if (zspage_empty(class, src_page)) {
+ free_zspage(pool, src_page);
+ spin_lock(&class->lock);
+ zs_stat_dec(class, OBJ_ALLOCATED,
+ get_maxobj_per_zspage(
+ class->size, class->pages_per_zspage));
+ atomic_long_sub(class->pages_per_zspage,
+ &pool->pages_allocated);
- VM_BUG_ON_PAGE(putback_zspage(pool, class,
- dst_page) == ZS_EMPTY, dst_page);
- if (putback_zspage(pool, class, src_page) == ZS_EMPTY) {
pool->stats.pages_compacted += class->pages_per_zspage;
spin_unlock(&class->lock);
- free_zspage(pool, class, src_page);
- } else {
- spin_unlock(&class->lock);
+ src_page = NULL;
}
+ }
- cond_resched();
- spin_lock(&class->lock);
+ if (!src_page && !dst_page)
+ return;
+
+ spin_lock(&class->lock);
+ if (src_page) {
+ putback_zspage(class, src_page);
+ unfreeze_zspage(class, src_page,
+ class->objs_per_zspage);
}
- if (src_page)
- VM_BUG_ON_PAGE(putback_zspage(pool, class,
- src_page) == ZS_EMPTY, src_page);
+ if (dst_page) {
+ putback_zspage(class, dst_page);
+ unfreeze_zspage(class, dst_page,
+ class->objs_per_zspage);
+ }
spin_unlock(&class->lock);
}
--
1.9.1
Hi Minchan,
[auto build test ERROR on next-20160318]
[cannot apply to v4.5-rc7 v4.5-rc6 v4.5-rc5 v4.5]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
url: https://github.com/0day-ci/linux/commits/Minchan-Kim/Support-non-lru-page-migration/20160321-143339
config: x86_64-randconfig-x000-201612 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All error/warnings (new ones prefixed by >>):
drivers/virtio/virtio_balloon.c: In function 'virtballoon_probe':
>> drivers/virtio/virtio_balloon.c:578:15: error: 'balloon_mnt' undeclared (first use in this function)
kern_unmount(balloon_mnt);
^
drivers/virtio/virtio_balloon.c:578:15: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/virtio/virtio_balloon.c:579:1: warning: label 'out_free_vb' defined but not used [-Wunused-label]
out_free_vb:
^
vim +/balloon_mnt +578 drivers/virtio/virtio_balloon.c
572
573 out_oom_notify:
574 vdev->config->del_vqs(vdev);
575 out_unmount:
576 if (vb->vb_dev_info.inode)
577 iput(vb->vb_dev_info.inode);
> 578 kern_unmount(balloon_mnt);
> 579 out_free_vb:
580 kfree(vb);
581 out:
582 return err;
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
mm/zsmalloc.c:1103:2-3: Unneeded semicolon
Remove unneeded semicolon.
Generated by: scripts/coccinelle/misc/semicolon.cocci
CC: Minchan Kim <[email protected]>
Signed-off-by: Fengguang Wu <[email protected]>
---
zsmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1100,7 +1100,7 @@ void unlock_zspage(struct page *first_pa
VM_BUG_ON_PAGE(!PageLocked(cursor), cursor);
if (cursor != locked_page)
unlock_page(cursor);
- };
+ }
}
static void free_zspage(struct zs_pool *pool, struct page *first_page)
Hi Minchan,
[auto build test WARNING on next-20160318]
[cannot apply to v4.5-rc7 v4.5-rc6 v4.5-rc5 v4.5]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
url: https://github.com/0day-ci/linux/commits/Minchan-Kim/Support-non-lru-page-migration/20160321-143339
coccinelle warnings: (new ones prefixed by >>)
>> mm/zsmalloc.c:1103:2-3: Unneeded semicolon
Please review and possibly fold the followup patch.
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Mon, Mar 21, 2016 at 04:29:55PM +0800, kbuild test robot wrote:
> Hi Minchan,
>
> [auto build test ERROR on next-20160318]
> [cannot apply to v4.5-rc7 v4.5-rc6 v4.5-rc5 v4.5]
> [if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
>
> url: https://github.com/0day-ci/linux/commits/Minchan-Kim/Support-non-lru-page-migration/20160321-143339
> config: x86_64-randconfig-x000-201612 (attached as .config)
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All error/warnings (new ones prefixed by >>):
>
> drivers/virtio/virtio_balloon.c: In function 'virtballoon_probe':
> >> drivers/virtio/virtio_balloon.c:578:15: error: 'balloon_mnt' undeclared (first use in this function)
> kern_unmount(balloon_mnt);
> ^
> drivers/virtio/virtio_balloon.c:578:15: note: each undeclared identifier is reported only once for each function it appears in
> >> drivers/virtio/virtio_balloon.c:579:1: warning: label 'out_free_vb' defined but not used [-Wunused-label]
> out_free_vb:
> ^
>
> vim +/balloon_mnt +578 drivers/virtio/virtio_balloon.c
>
> 572
> 573 out_oom_notify:
> 574 vdev->config->del_vqs(vdev);
> 575 out_unmount:
> 576 if (vb->vb_dev_info.inode)
> 577 iput(vb->vb_dev_info.inode);
> > 578 kern_unmount(balloon_mnt);
> > 579 out_free_vb:
> 580 kfree(vb);
> 581 out:
> 582 return err;
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
Thanks, kbuild.
Fixed.
From 7006a7ee62bb09273f96d8cb45c32e42453ab931 Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Thu, 3 Mar 2016 14:28:45 +0900
Subject: [PATCH] mm/balloon: use general movable page feature into balloon
Now, the VM has a feature to migrate non-lru movable pages, so
balloon doesn't need custom migration hooks in migrate.c
and compaction.c. Instead, this patch implements the page->mapping
->a_ops->{isolate_page|migratepage|putback_page} functions.
With that, we can remove the ballooning hooks from the general
migration functions and make balloon compaction simple.
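To illustrate the idea, here is a minimal userspace sketch (not kernel
code) of how a generic migration path could dispatch through a per-page
ops table instead of special-casing balloon pages. All names in it
(model_page, model_aops, demo_*, model_migrate_one) are hypothetical
stand-ins chosen for this example; only the isolate/migrate/putback
callback roles come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
struct model_page;

struct model_aops {
	bool (*isolate_page)(struct model_page *page);
	int (*migratepage)(struct model_page *newpage, struct model_page *page);
	void (*putback_page)(struct model_page *page);
};

struct model_page {
	const struct model_aops *aops;	/* stands in for page->mapping->a_ops */
	bool isolated;			/* stands in for PG_isolated */
	int payload;
};

/* Driver side: refuse to isolate a page that is already isolated. */
static bool demo_isolate(struct model_page *page)
{
	if (page->isolated)
		return false;
	page->isolated = true;
	return true;
}

/* Driver side: move the contents and hand ownership to the new page. */
static int demo_migrate(struct model_page *newpage, struct model_page *page)
{
	newpage->aops = page->aops;
	newpage->payload = page->payload;
	page->isolated = false;
	page->aops = NULL;
	return 0;
}

/* Driver side: undo isolation when migration did not happen. */
static void demo_putback(struct model_page *page)
{
	page->isolated = false;
}

const struct model_aops demo_aops = {
	.isolate_page = demo_isolate,
	.migratepage = demo_migrate,
	.putback_page = demo_putback,
};

/*
 * Generic side: knows nothing about the driver. A NULL @newpage models
 * a failed migration so that the putback path is exercised.
 */
int model_migrate_one(struct model_page *page, struct model_page *newpage)
{
	int rc;

	assert(page != NULL);
	if (!page->aops || !page->aops->isolate_page(page))
		return -1;

	rc = newpage ? page->aops->migratepage(newpage, page) : -1;
	if (rc)
		page->aops->putback_page(page);
	return rc;
}
```

In this model the generic function never tests "is this a balloon
page?"; any driver that fills in the three callbacks gets the same
treatment, which is the simplification the patch is after.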
Cc: [email protected]
Cc: Rafael Aquini <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Gioh Kim <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
drivers/virtio/virtio_balloon.c | 53 ++++++++++++++++---
include/linux/balloon_compaction.h | 47 ++++-------------
include/linux/page-flags.h | 52 +++++++++++--------
include/uapi/linux/magic.h | 1 +
mm/balloon_compaction.c | 101 ++++++++-----------------------------
mm/compaction.c | 7 ---
mm/migrate.c | 22 ++------
mm/vmscan.c | 2 +-
8 files changed, 116 insertions(+), 169 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7b6d74f0c72f..0c16192d2684 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -30,6 +30,7 @@
#include <linux/oom.h>
#include <linux/wait.h>
#include <linux/mm.h>
+#include <linux/mount.h>
/*
* Balloon device works in 4K page units. So each page is pointed to by
@@ -45,6 +46,10 @@ static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
module_param(oom_pages, int, S_IRUSR | S_IWUSR);
MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
+#ifdef CONFIG_BALLOON_COMPACTION
+static struct vfsmount *balloon_mnt;
+#endif
+
struct virtio_balloon {
struct virtio_device *vdev;
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
@@ -482,10 +487,29 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
mutex_unlock(&vb->balloon_lock);
+ ClearPageIsolated(page);
put_page(page); /* balloon reference */
return MIGRATEPAGE_SUCCESS;
}
+
+static struct dentry *balloon_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data)
+{
+ static const struct dentry_operations ops = {
+ .d_dname = simple_dname,
+ };
+
+ return mount_pseudo(fs_type, "balloon-kvm:", NULL, &ops,
+ BALLOON_KVM_MAGIC);
+}
+
+static struct file_system_type balloon_fs = {
+ .name = "balloon-kvm",
+ .mount = balloon_mount,
+ .kill_sb = kill_anon_super,
+};
+
#endif /* CONFIG_BALLOON_COMPACTION */
static int virtballoon_probe(struct virtio_device *vdev)
@@ -515,10 +539,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->vdev = vdev;
balloon_devinfo_init(&vb->vb_dev_info);
-#ifdef CONFIG_BALLOON_COMPACTION
- vb->vb_dev_info.migratepage = virtballoon_migratepage;
-#endif
-
err = init_vqs(vb);
if (err)
goto out_free_vb;
@@ -527,13 +547,32 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
err = register_oom_notifier(&vb->nb);
if (err < 0)
- goto out_oom_notify;
+ goto out_del_vqs;
+
+#ifdef CONFIG_BALLOON_COMPACTION
+ balloon_mnt = kern_mount(&balloon_fs);
+ if (IS_ERR(balloon_mnt)) {
+ err = PTR_ERR(balloon_mnt);
+ unregister_oom_notifier(&vb->nb);
+ goto out_del_vqs;
+ }
+ vb->vb_dev_info.migratepage = virtballoon_migratepage;
+ vb->vb_dev_info.inode = alloc_anon_inode(balloon_mnt->mnt_sb);
+ if (IS_ERR(vb->vb_dev_info.inode)) {
+ err = PTR_ERR(vb->vb_dev_info.inode);
+ kern_unmount(balloon_mnt);
+ unregister_oom_notifier(&vb->nb);
+ vb->vb_dev_info.inode = NULL;
+ goto out_del_vqs;
+ }
+ vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
+#endif
virtio_device_ready(vdev);
return 0;
-out_oom_notify:
+out_del_vqs:
vdev->config->del_vqs(vdev);
out_free_vb:
kfree(vb);
@@ -567,6 +606,8 @@ static void virtballoon_remove(struct virtio_device *vdev)
cancel_work_sync(&vb->update_balloon_stats_work);
remove_common(vb);
+ if (vb->vb_dev_info.inode)
+ iput(vb->vb_dev_info.inode);
kfree(vb);
}
diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index 9b0a15d06a4f..43a858545844 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -48,6 +48,7 @@
#include <linux/migrate.h>
#include <linux/gfp.h>
#include <linux/err.h>
+#include <linux/fs.h>
/*
* Balloon device information descriptor.
@@ -62,6 +63,7 @@ struct balloon_dev_info {
struct list_head pages; /* Pages enqueued & handled to Host */
int (*migratepage)(struct balloon_dev_info *, struct page *newpage,
struct page *page, enum migrate_mode mode);
+ struct inode *inode;
};
extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
@@ -73,45 +75,19 @@ static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
spin_lock_init(&balloon->pages_lock);
INIT_LIST_HEAD(&balloon->pages);
balloon->migratepage = NULL;
+ balloon->inode = NULL;
}
#ifdef CONFIG_BALLOON_COMPACTION
-extern bool balloon_page_isolate(struct page *page);
+extern const struct address_space_operations balloon_aops;
+extern bool balloon_page_isolate(struct page *page,
+ isolate_mode_t mode);
extern void balloon_page_putback(struct page *page);
-extern int balloon_page_migrate(struct page *newpage,
+extern int balloon_page_migrate(struct address_space *mapping,
+ struct page *newpage,
struct page *page, enum migrate_mode mode);
/*
- * __is_movable_balloon_page - helper to perform @page PageBalloon tests
- */
-static inline bool __is_movable_balloon_page(struct page *page)
-{
- return PageBalloon(page);
-}
-
-/*
- * balloon_page_movable - test PageBalloon to identify balloon pages
- * and PagePrivate to check that the page is not
- * isolated and can be moved by compaction/migration.
- *
- * As we might return false positives in the case of a balloon page being just
- * released under us, this need to be re-tested later, under the page lock.
- */
-static inline bool balloon_page_movable(struct page *page)
-{
- return PageBalloon(page) && PagePrivate(page);
-}
-
-/*
- * isolated_balloon_page - identify an isolated balloon page on private
- * compaction/migration page lists.
- */
-static inline bool isolated_balloon_page(struct page *page)
-{
- return PageBalloon(page);
-}
-
-/*
* balloon_page_insert - insert a page into the balloon's page list and make
* the page->private assignment accordingly.
* @balloon : pointer to balloon device
@@ -123,8 +99,8 @@ static inline bool isolated_balloon_page(struct page *page)
static inline void balloon_page_insert(struct balloon_dev_info *balloon,
struct page *page)
{
+ page->mapping = balloon->inode->i_mapping;
__SetPageBalloon(page);
- SetPagePrivate(page);
set_page_private(page, (unsigned long)balloon);
list_add(&page->lru, &balloon->pages);
}
@@ -140,11 +116,10 @@ static inline void balloon_page_insert(struct balloon_dev_info *balloon,
static inline void balloon_page_delete(struct page *page)
{
__ClearPageBalloon(page);
+ page->mapping = NULL;
set_page_private(page, 0);
- if (PagePrivate(page)) {
- ClearPagePrivate(page);
+ if (!PageIsolated(page))
list_del(&page->lru);
- }
}
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 3885064641c4..4853e0487175 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -599,50 +599,58 @@ static inline void __ClearPageBuddy(struct page *page)
extern bool is_free_buddy_page(struct page *page);
-#define PAGE_BALLOON_MAPCOUNT_VALUE (-256)
+#define PAGE_MOVABLE_MAPCOUNT_VALUE (-256)
+#define PAGE_BALLOON_MAPCOUNT_VALUE PAGE_MOVABLE_MAPCOUNT_VALUE
-static inline int PageBalloon(struct page *page)
+static inline int PageMovable(struct page *page)
{
- return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE;
+ return (test_bit(PG_movable, &(page)->flags) &&
+ atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE);
}
-static inline void __SetPageBalloon(struct page *page)
+/* Caller should hold a PG_lock */
+static inline void __SetPageMovable(struct page *page)
{
- VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
- atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE);
+ __set_bit(PG_movable, &page->flags);
+ atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
}
-static inline void __ClearPageBalloon(struct page *page)
+static inline void __ClearPageMovable(struct page *page)
{
- VM_BUG_ON_PAGE(!PageBalloon(page), page);
atomic_set(&page->_mapcount, -1);
+ __clear_bit(PG_movable, &(page)->flags);
}
-#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
+PAGEFLAG(Isolated, isolated, PF_ANY);
-static inline int PageMovable(struct page *page)
+static inline int PageBalloon(struct page *page)
{
- return ((test_bit(PG_movable, &(page)->flags) &&
- atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
- || PageBalloon(page));
+ return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE
+ && PagePrivate2(page);
}
-/*
- * Caller should hold a PG_lock */
-static inline void __SetPageMovable(struct page *page)
+static inline void __SetPageBalloon(struct page *page)
{
- __set_bit(PG_movable, &page->flags);
- atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
+ VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
+#ifdef CONFIG_BALLOON_COMPACTION
+ __SetPageMovable(page);
+#else
+ atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE);
+#endif
+ SetPagePrivate2(page);
}
-static inline void __ClearPageMovable(struct page *page)
+static inline void __ClearPageBalloon(struct page *page)
{
+ VM_BUG_ON_PAGE(!PageBalloon(page), page);
+#ifdef CONFIG_BALLOON_COMPACTION
+ __ClearPageMovable(page);
+#else
atomic_set(&page->_mapcount, -1);
- __clear_bit(PG_movable, &(page)->flags);
+#endif
+ ClearPagePrivate2(page);
}
-PAGEFLAG(Isolated, isolated, PF_ANY);
-
/*
* If network-based swap is enabled, sl*b must keep track of whether pages
* were allocated from pfmemalloc reserves.
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 0de181ad73d5..e1fbe72c39c0 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -78,5 +78,6 @@
#define BTRFS_TEST_MAGIC 0x73727279
#define NSFS_MAGIC 0x6e736673
#define BPF_FS_MAGIC 0xcafe4a11
+#define BALLOON_KVM_MAGIC 0x13661366
#endif /* __LINUX_MAGIC_H__ */
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 57b3e9bd6bc5..1fbc7fb387bb 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -70,7 +70,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
*/
if (trylock_page(page)) {
#ifdef CONFIG_BALLOON_COMPACTION
- if (!PagePrivate(page)) {
+ if (PageIsolated(page)) {
/* raced with isolation */
unlock_page(page);
continue;
@@ -106,110 +106,53 @@ EXPORT_SYMBOL_GPL(balloon_page_dequeue);
#ifdef CONFIG_BALLOON_COMPACTION
-static inline void __isolate_balloon_page(struct page *page)
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool balloon_page_isolate(struct page *page, isolate_mode_t mode)
{
struct balloon_dev_info *b_dev_info = balloon_page_device(page);
unsigned long flags;
spin_lock_irqsave(&b_dev_info->pages_lock, flags);
- ClearPagePrivate(page);
list_del(&page->lru);
b_dev_info->isolated_pages++;
spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+ SetPageIsolated(page);
+
+ return true;
}
-static inline void __putback_balloon_page(struct page *page)
+/* putback_lru_page() counterpart for a ballooned page */
+void balloon_page_putback(struct page *page)
{
struct balloon_dev_info *b_dev_info = balloon_page_device(page);
unsigned long flags;
+ ClearPageIsolated(page);
spin_lock_irqsave(&b_dev_info->pages_lock, flags);
- SetPagePrivate(page);
list_add(&page->lru, &b_dev_info->pages);
b_dev_info->isolated_pages--;
spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
}
-/* __isolate_lru_page() counterpart for a ballooned page */
-bool balloon_page_isolate(struct page *page)
-{
- /*
- * Avoid burning cycles with pages that are yet under __free_pages(),
- * or just got freed under us.
- *
- * In case we 'win' a race for a balloon page being freed under us and
- * raise its refcount preventing __free_pages() from doing its job
- * the put_page() at the end of this block will take care of
- * release this page, thus avoiding a nasty leakage.
- */
- if (likely(get_page_unless_zero(page))) {
- /*
- * As balloon pages are not isolated from LRU lists, concurrent
- * compaction threads can race against page migration functions
- * as well as race against the balloon driver releasing a page.
- *
- * In order to avoid having an already isolated balloon page
- * being (wrongly) re-isolated while it is under migration,
- * or to avoid attempting to isolate pages being released by
- * the balloon driver, lets be sure we have the page lock
- * before proceeding with the balloon page isolation steps.
- */
- if (likely(trylock_page(page))) {
- /*
- * A ballooned page, by default, has PagePrivate set.
- * Prevent concurrent compaction threads from isolating
- * an already isolated balloon page by clearing it.
- */
- if (balloon_page_movable(page)) {
- __isolate_balloon_page(page);
- unlock_page(page);
- return true;
- }
- unlock_page(page);
- }
- put_page(page);
- }
- return false;
-}
-
-/* putback_lru_page() counterpart for a ballooned page */
-void balloon_page_putback(struct page *page)
-{
- /*
- * 'lock_page()' stabilizes the page and prevents races against
- * concurrent isolation threads attempting to re-isolate it.
- */
- lock_page(page);
-
- if (__is_movable_balloon_page(page)) {
- __putback_balloon_page(page);
- /* drop the extra ref count taken for page isolation */
- put_page(page);
- } else {
- WARN_ON(1);
- dump_page(page, "not movable balloon page");
- }
- unlock_page(page);
-}
-
/* move_to_new_page() counterpart for a ballooned page */
-int balloon_page_migrate(struct page *newpage,
- struct page *page, enum migrate_mode mode)
+int balloon_page_migrate(struct address_space *mapping,
+ struct page *newpage, struct page *page,
+ enum migrate_mode mode)
{
struct balloon_dev_info *balloon = balloon_page_device(page);
- int rc = -EAGAIN;
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
+ VM_BUG_ON_PAGE(!PageMovable(page), page);
+ VM_BUG_ON_PAGE(!PageIsolated(page), page);
- if (WARN_ON(!__is_movable_balloon_page(page))) {
- dump_page(page, "not movable balloon page");
- return rc;
- }
-
- if (balloon && balloon->migratepage)
- rc = balloon->migratepage(balloon, newpage, page, mode);
-
- return rc;
+ return balloon->migratepage(balloon, newpage, page, mode);
}
+
+const struct address_space_operations balloon_aops = {
+ .migratepage = balloon_page_migrate,
+ .isolate_page = balloon_page_isolate,
+ .putback_page = balloon_page_putback,
+};
+EXPORT_SYMBOL_GPL(balloon_aops);
#endif /* CONFIG_BALLOON_COMPACTION */
diff --git a/mm/compaction.c b/mm/compaction.c
index 7557aedddaee..e336c620fd7b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -708,13 +708,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
*/
is_lru = PageLRU(page);
if (!is_lru) {
- if (unlikely(balloon_page_movable(page))) {
- if (balloon_page_isolate(page)) {
- /* Successfully isolated */
- goto isolate_success;
- }
- }
-
if (unlikely(PageMovable(page)) &&
!PageIsolated(page)) {
if (locked) {
diff --git a/mm/migrate.c b/mm/migrate.c
index ab87ef45a3b9..3234c14ed1cd 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -147,8 +147,8 @@ void putback_movable_page(struct page *page)
* from where they were once taken off for compaction/migration.
*
* This function shall be used whenever the isolated pageset has been
- * built from lru, balloon, hugetlbfs page. See isolate_migratepages_range()
- * and isolate_huge_page().
+ * built from lru, movable, hugetlbfs page.
+ * See isolate_migratepages_range() and isolate_huge_page().
*/
void putback_movable_pages(struct list_head *l)
{
@@ -163,9 +163,7 @@ void putback_movable_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- if (unlikely(isolated_balloon_page(page)))
- balloon_page_putback(page);
- else if (unlikely(PageIsolated(page)))
+ if (unlikely(PageIsolated(page)))
putback_movable_page(page);
else
putback_lru_page(page);
@@ -959,18 +957,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
if (unlikely(!trylock_page(newpage)))
goto out_unlock;
- if (unlikely(isolated_balloon_page(page))) {
- /*
- * A ballooned page does not need any special attention from
- * physical to virtual reverse mapping procedures.
- * Skip any attempt to unmap PTEs or to remap swap cache,
- * in order to avoid burning cycles at rmap level, and perform
- * the page migration right away (proteced by page lock).
- */
- rc = balloon_page_migrate(newpage, page, mode);
- goto out_unlock_both;
- }
-
/*
* Corner case handling:
* 1. When a new swap-cache page is read into, it is added to the LRU
@@ -1015,7 +1001,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
out:
/* If migration is successful, move newpage to right list */
if (rc == MIGRATEPAGE_SUCCESS) {
- if (unlikely(__is_movable_balloon_page(newpage)))
+ if (unlikely(PageMovable(newpage)))
put_page(newpage);
else
putback_lru_page(newpage);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c72032dbe8db..e5dfa0cf6fdc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1254,7 +1254,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
list_for_each_entry_safe(page, next, page_list, lru) {
if (page_is_file_cache(page) && !PageDirty(page) &&
- !isolated_balloon_page(page)) {
+ !PageIsolated(page)) {
ClearPageActive(page);
list_move(&page->lru, &clean_pages);
}
--
1.9.1
On Mon, Mar 21, 2016 at 05:48:25PM +0800, kbuild test robot wrote:
> mm/zsmalloc.c:1103:2-3: Unneeded semicolon
>
>
> Remove unneeded semicolon.
>
> Generated by: scripts/coccinelle/misc/semicolon.cocci
>
> CC: Minchan Kim <[email protected]>
> Signed-off-by: Fengguang Wu <[email protected]>
> ---
>
> zsmalloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1100,7 +1100,7 @@ void unlock_zspage(struct page *first_pa
> VM_BUG_ON_PAGE(!PageLocked(cursor), cursor);
> if (cursor != locked_page)
> unlock_page(cursor);
> - };
> + }
> }
>
> static void free_zspage(struct zs_pool *pool, struct page *first_page)
Thanks.
Fixed.
From bb46f8265b55228f31b8096bd1c13dd6e6ee1bc4 Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Wed, 9 Mar 2016 09:37:57 +0900
Subject: [PATCH] zsmalloc: migrate tail pages in zspage
This patch enables tail page migration of zspage.
At this point, I tested zsmalloc for regressions with a micro-benchmark
that runs zs_malloc/map/unmap/zs_free for every size class
on every CPU (my system has 12) for 20 seconds.
It shows a 1% regression, which is small when we consider
the benefit of this feature and real-workload overhead (i.e.,
most overhead comes from compression).
Signed-off-by: Minchan Kim <[email protected]>
---
mm/zsmalloc.c | 131 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 115 insertions(+), 16 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9b4b03d8f993..3f1d488633e1 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -551,6 +551,19 @@ static void set_zspage_mapping(struct page *first_page,
m->class = class_idx;
}
+static bool check_isolated_page(struct page *first_page)
+{
+ struct page *cursor;
+
+ for (cursor = first_page; cursor != NULL; cursor =
+ get_next_page(cursor)) {
+ if (PageIsolated(cursor))
+ return true;
+ }
+
+ return false;
+}
+
/*
* zsmalloc divides the pool into various size classes where each
* class maintains a list of zspages where each zspage is divided
@@ -1052,6 +1065,44 @@ void lock_zspage(struct page *first_page)
} while ((cursor = get_next_page(cursor)) != NULL);
}
+int trylock_zspage(struct page *first_page, struct page *locked_page)
+{
+ struct page *cursor, *fail;
+
+ VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
+
+ for (cursor = first_page; cursor != NULL; cursor =
+ get_next_page(cursor)) {
+ if (cursor != locked_page) {
+ if (!trylock_page(cursor)) {
+ fail = cursor;
+ goto unlock;
+ }
+ }
+ }
+
+ return 1;
+unlock:
+ for (cursor = first_page; cursor != fail; cursor =
+ get_next_page(cursor)) {
+ if (cursor != locked_page)
+ unlock_page(cursor);
+ }
+
+ return 0;
+}
+
+void unlock_zspage(struct page *first_page, struct page *locked_page)
+{
+ struct page *cursor = first_page;
+
+ for (; cursor != NULL; cursor = get_next_page(cursor)) {
+ VM_BUG_ON_PAGE(!PageLocked(cursor), cursor);
+ if (cursor != locked_page)
+ unlock_page(cursor);
+ }
+}
+
static void free_zspage(struct zs_pool *pool, struct page *first_page)
{
struct page *nextp, *tmp;
@@ -1090,16 +1141,17 @@ static void init_zspage(struct size_class *class, struct page *first_page,
first_page->freelist = NULL;
INIT_LIST_HEAD(&first_page->lru);
set_zspage_inuse(first_page, 0);
- BUG_ON(!trylock_page(first_page));
- first_page->mapping = mapping;
- __SetPageMovable(first_page);
- unlock_page(first_page);
while (page) {
struct page *next_page;
struct link_free *link;
void *vaddr;
+ BUG_ON(!trylock_page(page));
+ page->mapping = mapping;
+ __SetPageMovable(page);
+ unlock_page(page);
+
vaddr = kmap_atomic(page);
link = (struct link_free *)vaddr + off / sizeof(*link);
@@ -1850,6 +1902,7 @@ static enum fullness_group putback_zspage(struct size_class *class,
VM_BUG_ON_PAGE(!list_empty(&first_page->lru), first_page);
VM_BUG_ON_PAGE(ZsPageIsolate(first_page), first_page);
+ VM_BUG_ON_PAGE(check_isolated_page(first_page), first_page);
fullness = get_fullness_group(class, first_page);
insert_zspage(class, fullness, first_page);
@@ -1956,6 +2009,12 @@ static struct page *isolate_source_page(struct size_class *class)
if (!page)
continue;
+ /* To prevent race between object and page migration */
+ if (!trylock_zspage(page, NULL)) {
+ page = NULL;
+ continue;
+ }
+
remove_zspage(class, i, page);
inuse = get_zspage_inuse(page);
@@ -1964,6 +2023,7 @@ static struct page *isolate_source_page(struct size_class *class)
if (inuse != freezed) {
unfreeze_zspage(class, page, freezed);
putback_zspage(class, page);
+ unlock_zspage(page, NULL);
page = NULL;
continue;
}
@@ -1995,6 +2055,12 @@ static struct page *isolate_target_page(struct size_class *class)
if (!page)
continue;
+ /* To prevent race between object and page migration */
+ if (!trylock_zspage(page, NULL)) {
+ page = NULL;
+ continue;
+ }
+
remove_zspage(class, i, page);
inuse = get_zspage_inuse(page);
@@ -2003,6 +2069,7 @@ static struct page *isolate_target_page(struct size_class *class)
if (inuse != freezed) {
unfreeze_zspage(class, page, freezed);
putback_zspage(class, page);
+ unlock_zspage(page, NULL);
page = NULL;
continue;
}
@@ -2076,11 +2143,13 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
putback_zspage(class, dst_page);
unfreeze_zspage(class, dst_page,
class->objs_per_zspage);
+ unlock_zspage(dst_page, NULL);
spin_unlock(&class->lock);
dst_page = NULL;
}
if (zspage_empty(class, src_page)) {
+ unlock_zspage(src_page, NULL);
free_zspage(pool, src_page);
spin_lock(&class->lock);
zs_stat_dec(class, OBJ_ALLOCATED,
@@ -2103,12 +2172,14 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
putback_zspage(class, src_page);
unfreeze_zspage(class, src_page,
class->objs_per_zspage);
+ unlock_zspage(src_page, NULL);
}
if (dst_page) {
putback_zspage(class, dst_page);
unfreeze_zspage(class, dst_page,
class->objs_per_zspage);
+ unlock_zspage(dst_page, NULL);
}
spin_unlock(&class->lock);
@@ -2211,10 +2282,11 @@ bool zs_page_isolate(struct page *page, isolate_mode_t mode)
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageIsolated(page), page);
/*
- * In this implementation, it allows only first page migration.
+ * first_page will not be destroyed by PG_lock of @page but it could
+ * be migrated out. For prohibiting it, zs_page_migrate calls
+ * trylock_zspage so it closes the race.
*/
- VM_BUG_ON_PAGE(!is_first_page(page), page);
- first_page = page;
+ first_page = get_first_page(page);
/*
* Without class lock, fullness is meaningless while constant
@@ -2228,9 +2300,18 @@ bool zs_page_isolate(struct page *page, isolate_mode_t mode)
if (!spin_trylock(&class->lock))
return false;
+ if (check_isolated_page(first_page))
+ goto skip_isolate;
+
+ /*
+ * If this is first time isolation for zspage, isolate zspage from
+ * size_class to prevent further allocations from the zspage.
+ */
get_zspage_mapping(first_page, &class_idx, &fullness);
remove_zspage(class, fullness, first_page);
SetZsPageIsolate(first_page);
+
+skip_isolate:
SetPageIsolated(page);
spin_unlock(&class->lock);
@@ -2253,7 +2334,7 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
VM_BUG_ON_PAGE(!PageMovable(page), page);
VM_BUG_ON_PAGE(!PageIsolated(page), page);
- first_page = page;
+ first_page = get_first_page(page);
get_zspage_mapping(first_page, &class_idx, &fullness);
pool = page->mapping->private_data;
class = pool->size_class[class_idx];
@@ -2268,6 +2349,13 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
if (get_zspage_inuse(first_page) == 0)
goto out_class_unlock;
+ /*
+ * It prevents first_page migration during tail page operation for
+ * get_first_page's stability.
+ */
+ if (!trylock_zspage(first_page, page))
+ goto out_class_unlock;
+
freezed = freeze_zspage(class, first_page);
if (freezed != get_zspage_inuse(first_page))
goto out_unfreeze;
@@ -2306,21 +2394,26 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
kunmap_atomic(addr);
replace_sub_page(class, first_page, newpage, page);
- first_page = newpage;
+ first_page = get_first_page(newpage);
get_page(newpage);
VM_BUG_ON_PAGE(get_fullness_group(class, first_page) ==
ZS_EMPTY, first_page);
- ClearZsPageIsolate(first_page);
- putback_zspage(class, first_page);
+ if (!check_isolated_page(first_page)) {
+ INIT_LIST_HEAD(&first_page->lru);
+ ClearZsPageIsolate(first_page);
+ putback_zspage(class, first_page);
+ }
+
/* Migration complete. Free old page */
reset_page(page);
ClearPageIsolated(page);
put_page(page);
ret = MIGRATEPAGE_SUCCESS;
-
+ page = newpage;
out_unfreeze:
unfreeze_zspage(class, first_page, freezed);
+ unlock_zspage(first_page, page);
out_class_unlock:
spin_unlock(&class->lock);
@@ -2338,7 +2431,7 @@ void zs_page_putback(struct page *page)
VM_BUG_ON_PAGE(!PageMovable(page), page);
VM_BUG_ON_PAGE(!PageIsolated(page), page);
- first_page = page;
+ first_page = get_first_page(page);
get_zspage_mapping(first_page, &class_idx, &fullness);
pool = page->mapping->private_data;
class = pool->size_class[class_idx];
@@ -2348,11 +2441,17 @@ void zs_page_putback(struct page *page)
* in zs_free will wait the page lock of @page without
* destroying of zspage.
*/
- INIT_LIST_HEAD(&first_page->lru);
spin_lock(&class->lock);
ClearPageIsolated(page);
- ClearZsPageIsolate(first_page);
- putback_zspage(class, first_page);
+ /*
+ * putback zspage to right list if this is last isolated page
+ * putback in the zspage.
+ */
+ if (!check_isolated_page(first_page)) {
+ INIT_LIST_HEAD(&first_page->lru);
+ ClearZsPageIsolate(first_page);
+ putback_zspage(class, first_page);
+ }
spin_unlock(&class->lock);
}
--
1.9.1
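The all-or-nothing locking that trylock_zspage() performs in the patch above can be tried outside the kernel. What follows is a minimal userspace sketch under stated assumptions, not the patch's code: `struct fake_page`, `page_trylock()`, `trylock_chain()` and `unlock_chain()` are hypothetical names, and a C11 `atomic_flag` stands in for PG_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one page of a zspage chain (not struct page). */
struct fake_page {
	atomic_flag lock;		/* plays the role of PG_lock */
	struct fake_page *next;		/* next page in the chain, or NULL */
};

static void fake_page_init(struct fake_page *page, struct fake_page *next)
{
	atomic_flag_clear(&page->lock);
	page->next = next;
}

/* Returns true if the lock was free and is now held by the caller. */
static bool page_trylock(struct fake_page *page)
{
	return !atomic_flag_test_and_set(&page->lock);
}

static void page_unlock(struct fake_page *page)
{
	atomic_flag_clear(&page->lock);
}

/*
 * Try to lock every page in the chain except @locked, which the caller
 * already holds (may be NULL). Either all of them end up locked, or the
 * ones taken so far are released again and false is returned.
 */
static bool trylock_chain(struct fake_page *first, struct fake_page *locked)
{
	struct fake_page *cursor, *fail;

	for (cursor = first; cursor != NULL; cursor = cursor->next) {
		if (cursor != locked && !page_trylock(cursor)) {
			fail = cursor;
			goto unlock;
		}
	}
	return true;

unlock:
	/* Roll back: only pages before @fail were locked by this call. */
	for (cursor = first; cursor != fail; cursor = cursor->next) {
		if (cursor != locked)
			page_unlock(cursor);
	}
	return false;
}

/* Release every page in the chain except @locked. */
static void unlock_chain(struct fake_page *first, struct fake_page *locked)
{
	struct fake_page *cursor;

	for (cursor = first; cursor != NULL; cursor = cursor->next) {
		if (cursor != locked)
			page_unlock(cursor);
	}
}
```

Because the helper never blocks, a caller that already holds one page lock cannot deadlock against another caller walking the same chain; it simply backs off and retries later.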
On Mon, Mar 21, 2016 at 03:31:02PM +0900, Minchan Kim wrote:
> Until now we have allowed migration only for LRU pages, and that was
> enough to make high-order pages. But recently, embedded systems (e.g.,
> webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory),
> so we have seen several reports about trouble with small high-order
> allocations. To fix the problem, there were several efforts
> (e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages,
> reserved memory, vmalloc and so on), but if there are lots of
> non-movable pages in the system, those solutions are void in the long run.
>
> So, this patch adds a facility to turn non-movable pages into
> movable ones. For the feature, this patch introduces migration-related
> functions in address_space_operations as well as some page flags.
>
> Basically, this patch supports two page-flags and two functions related
> to page migration. The flag and page->mapping stability are protected
> by PG_lock.
>
> PG_movable
> PG_isolated
>
> bool (*isolate_page) (struct page *, isolate_mode_t);
> void (*putback_page) (struct page *);
>
> The duties of a subsystem that wants to make its pages migratable are
> as follows:
>
> 1. It should register address_space to page->mapping then mark
> the page as PG_movable via __SetPageMovable.
>
> 2. It should mark the page as PG_isolated via SetPageIsolated
> if isolation is successful and return true.
>
> 3. If migration is successful, it should clear PG_isolated and
> PG_movable of the page to prepare it for freeing, then release the
> reference on the page so it can be freed.
>
> 4. If migration fails, the putback function of the subsystem should
> clear PG_isolated via ClearPageIsolated.
I think this feature needs a separate document that describes the
requirements of each step in more detail. For example, can #1 be
done without holding a lock? I'm not sure, because you lock
the page when implementing zsmalloc page migration in the 15th patch.
#3 also needs more explanation. Before release, we need to
unregister the address_space. I guess that needs to be done
in migratepage(), but there is no explanation.
>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Gioh Kim <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> Documentation/filesystems/Locking | 4 +
> Documentation/filesystems/vfs.txt | 5 ++
> fs/proc/page.c | 3 +
> include/linux/fs.h | 2 +
> include/linux/migrate.h | 2 +
> include/linux/page-flags.h | 29 ++++++++
> include/uapi/linux/kernel-page-flags.h | 1 +
> mm/compaction.c | 14 +++-
> mm/migrate.c | 132 +++++++++++++++++++++++++++++----
> 9 files changed, 177 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> index 619af9bfdcb3..0bb79560abb3 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -195,7 +195,9 @@ unlocks and drops the reference.
> int (*releasepage) (struct page *, int);
> void (*freepage)(struct page *);
> int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
> + bool (*isolate_page) (struct page *, isolate_mode_t);
> int (*migratepage)(struct address_space *, struct page *, struct page *);
> + void (*putback_page) (struct page *);
> int (*launder_page)(struct page *);
> int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
> int (*error_remove_page)(struct address_space *, struct page *);
> @@ -219,7 +221,9 @@ invalidatepage: yes
> releasepage: yes
> freepage: yes
> direct_IO:
> +isolate_page: yes
> migratepage: yes (both)
> +putback_page: yes
> launder_page: yes
> is_partially_uptodate: yes
> error_remove_page: yes
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index b02a7d598258..4c1b6c3b4bc8 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -592,9 +592,14 @@ struct address_space_operations {
> int (*releasepage) (struct page *, int);
> void (*freepage)(struct page *);
> ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
> + /* isolate a page for migration */
> + bool (*isolate_page) (struct page *, isolate_mode_t);
> /* migrate the contents of a page to the specified target */
> int (*migratepage) (struct page *, struct page *);
> + /* put the page back to right list */
> + void (*putback_page) (struct page *);
> int (*launder_page) (struct page *);
> +
> int (*is_partially_uptodate) (struct page *, unsigned long,
> unsigned long);
> void (*is_dirty_writeback) (struct page *, bool *, bool *);
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 712f1b9992cc..e2066e73a9b8 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page)
> if (page_is_idle(page))
> u |= 1 << KPF_IDLE;
>
> + if (PageMovable(page))
> + u |= 1 << KPF_MOVABLE;
> +
> u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
>
> u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 14a97194b34b..b7ef2e41fa4a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -401,6 +401,8 @@ struct address_space_operations {
> */
> int (*migratepage) (struct address_space *,
> struct page *, struct page *, enum migrate_mode);
> + bool (*isolate_page)(struct page *, isolate_mode_t);
> + void (*putback_page)(struct page *);
> int (*launder_page) (struct page *);
> int (*is_partially_uptodate) (struct page *, unsigned long,
> unsigned long);
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 9b50325e4ddf..404fbfefeb33 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
> struct page *, struct page *, enum migrate_mode);
> extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
> unsigned long private, enum migrate_mode mode, int reason);
> +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> +extern void putback_movable_page(struct page *page);
>
> extern int migrate_prep(void);
> extern int migrate_prep_local(void);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f4ed4f1b0c77..3885064641c4 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -129,6 +129,10 @@ enum pageflags {
>
> /* Compound pages. Stored in first tail page's flags */
> PG_double_map = PG_private_2,
> +
> + /* non-lru movable pages */
> + PG_movable = PG_reclaim,
> + PG_isolated = PG_owner_priv_1,
> };
>
> #ifndef __GENERATING_BOUNDS_H
> @@ -614,6 +618,31 @@ static inline void __ClearPageBalloon(struct page *page)
> atomic_set(&page->_mapcount, -1);
> }
>
> +#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
> +
> +static inline int PageMovable(struct page *page)
> +{
> + return ((test_bit(PG_movable, &(page)->flags) &&
> + atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
> + || PageBalloon(page));
> +}
> +
> +/*
> + * Caller should hold a PG_lock */
> +static inline void __SetPageMovable(struct page *page)
> +{
> + __set_bit(PG_movable, &page->flags);
> + atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
> +}
I think there is no big benefit in using the non-atomic version here.
PageMovable() is checked speculatively without holding PG_lock,
so some CPU can miss the flag being set if we use the non-atomic version.
> +
> +static inline void __ClearPageMovable(struct page *page)
> +{
> + atomic_set(&page->_mapcount, -1);
> + __clear_bit(PG_movable, &(page)->flags);
> +}
> +
> +PAGEFLAG(Isolated, isolated, PF_ANY);
> +
> /*
> * If network-based swap is enabled, sl*b must keep track of whether pages
> * were allocated from pfmemalloc reserves.
> diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> index 5da5f8751ce7..a184fd2434fa 100644
> --- a/include/uapi/linux/kernel-page-flags.h
> +++ b/include/uapi/linux/kernel-page-flags.h
> @@ -34,6 +34,7 @@
> #define KPF_BALLOON 23
> #define KPF_ZERO_PAGE 24
> #define KPF_IDLE 25
> +#define KPF_MOVABLE 26
>
>
> #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ccf97b02b85f..7557aedddaee 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -703,7 +703,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>
> /*
> * Check may be lockless but that's ok as we recheck later.
> - * It's possible to migrate LRU pages and balloon pages
> + * It's possible to migrate LRU and movable kernel pages.
> * Skip any other type of page
> */
> is_lru = PageLRU(page);
> @@ -714,6 +714,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> goto isolate_success;
> }
> }
> +
> + if (unlikely(PageMovable(page)) &&
> + !PageIsolated(page)) {
> + if (locked) {
> + spin_unlock_irqrestore(&zone->lru_lock,
> + flags);
> + locked = false;
> + }
> +
> + if (isolate_movable_page(page, isolate_mode))
> + goto isolate_success;
> + }
> }
>
> /*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index b65c84267ce0..fc2842a15807 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -73,6 +73,75 @@ int migrate_prep_local(void)
> return 0;
> }
>
> +bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> +{
> + bool ret = false;
> +
> + /*
> + * Avoid burning cycles with pages that are yet under __free_pages(),
> + * or just got freed under us.
> + *
> + * In case we 'win' a race for a movable page being freed under us and
> + * raise its refcount preventing __free_pages() from doing its job
> + * the put_page() at the end of this block will take care of
> + * release this page, thus avoiding a nasty leakage.
> + */
> + if (unlikely(!get_page_unless_zero(page)))
> + goto out;
After taking the reference, we need to re-check PageMovable()
to ensure that we are indeed handling a PageMovable() page. Without that,
the page could have been freed and re-allocated to someone else
unrelated to PageMovable() before we grabbed it. Calling
trylock_page() in that case could cause a problem.
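The re-check asked for here follows a common pattern: a lockless hint, then a reference, then the same test again now that the page cannot be freed. A minimal userspace sketch, assuming a hypothetical `struct fake_page` with C11 atomics in place of the real refcount and PageMovable() test (none of these names come from the patch):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
	atomic_int refcount;	/* 0 means the page is free */
	atomic_bool movable;	/* stands in for PageMovable() */
};

/* Take a reference only if the count has not already dropped to zero. */
static bool get_ref_unless_zero(struct fake_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, @old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void put_ref(struct fake_page *page)
{
	atomic_fetch_sub(&page->refcount, 1);
}

/*
 * Called after a lockless "looks movable" hint. The page may have been
 * freed and handed to an unrelated user since the hint was read, so the
 * movable test is repeated once the reference pins the page.
 */
static bool pin_if_still_movable(struct fake_page *page)
{
	if (!get_ref_unless_zero(page))
		return false;

	if (!atomic_load(&page->movable)) {
		put_ref(page);	/* not ours to touch: back off */
		return false;
	}
	return true;
}
```

Only after `pin_if_still_movable()` succeeds would the caller go on to try the page lock, which is the ordering the review comment argues for.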
> + /*
> + * As movable pages are not isolated from LRU lists, concurrent
> + * compaction threads can race against page migration functions
> + * as well as race against the releasing a page.
> + *
> + * In order to avoid having an already isolated movable page
> + * being (wrongly) re-isolated while it is under migration,
> + * or to avoid attempting to isolate pages being released,
> + * lets be sure we have the page lock
> + * before proceeding with the movable page isolation steps.
> + */
> + if (unlikely(!trylock_page(page)))
> + goto out_putpage;
> +
> + if (!PageMovable(page) || PageIsolated(page))
> + goto out_no_isolated;
> +
> + ret = page->mapping->a_ops->isolate_page(page, mode);
> + if (!ret)
> + goto out_no_isolated;
> +
> + WARN_ON_ONCE(!PageIsolated(page));
> + unlock_page(page);
> + return ret;
> +
> +out_no_isolated:
> + unlock_page(page);
> +out_putpage:
> + put_page(page);
> +out:
> + return ret;
> +}
> +
> +void putback_movable_page(struct page *page)
> +{
> + struct address_space *mapping;
> +
> + /*
> + * 'lock_page()' stabilizes the page and prevents races against
> + * concurrent isolation threads attempting to re-isolate it.
> + */
> + lock_page(page);
> + mapping = page_mapping(page);
> + if (mapping) {
> + mapping->a_ops->putback_page(page);
> + WARN_ON_ONCE(PageIsolated(page));
> + }
> + unlock_page(page);
> + /* drop the extra ref count taken for movable page isolation */
> + put_page(page);
> +}
This is the complicated part for me. Can mapping disappear? In that case,
who clears PageIsolated()?
> +
> +
> /*
> * Put previously isolated pages back onto the appropriate lists
> * from where they were once taken off for compaction/migration.
> @@ -96,6 +165,8 @@ void putback_movable_pages(struct list_head *l)
> page_is_file_cache(page));
> if (unlikely(isolated_balloon_page(page)))
> balloon_page_putback(page);
> + else if (unlikely(PageIsolated(page)))
> + putback_movable_page(page);
> else
> putback_lru_page(page);
> }
I think this will not work. You use PG_owner_priv_1 as
PG_isolated, and it is possible that some LRU pages have this flag set.
I guess you need to add a PageMovable() check, but it seems that mapping
and this flag can be cleared by others.
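The aliasing concern can be shown with a small model. This is a sketch under stated assumptions: the `fake_` names and bit positions are hypothetical, and only the aliasing (PG_isolated sharing a bit with PG_owner_priv_1, PG_movable sharing one with PG_reclaim) and the -255 mapcount value are taken from the patch. It does not address the second half of the comment, that mapping and the flag can be cleared by others.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; only the aliasing mirrors the patch. */
enum fake_pageflags {
	FAKE_PG_reclaim      = 1u << 0,
	FAKE_PG_owner_priv_1 = 1u << 1,
	FAKE_PG_movable      = FAKE_PG_reclaim,
	FAKE_PG_isolated     = FAKE_PG_owner_priv_1,
};

#define FAKE_MOVABLE_MAPCOUNT (-255)

struct fake_page {
	unsigned int flags;
	int mapcount;
};

/* Mirrors the patch's PageMovable(): flag bit plus the magic mapcount. */
static bool fake_page_movable(const struct fake_page *page)
{
	return (page->flags & FAKE_PG_movable) != 0 &&
	       page->mapcount == FAKE_MOVABLE_MAPCOUNT;
}

/* The bare flag test: true for any page whose owner set owner_priv_1. */
static bool fake_page_isolated(const struct fake_page *page)
{
	return (page->flags & FAKE_PG_isolated) != 0;
}

/* Combined test: isolated and also a non-LRU movable page. */
static bool fake_isolated_movable(const struct fake_page *page)
{
	return fake_page_movable(page) && fake_page_isolated(page);
}
```

An LRU page whose owner happens to use the owner_priv_1 bit passes `fake_page_isolated()` but not `fake_isolated_movable()`, which is the misclassification the bare PageIsolated() test in putback_movable_pages() would risk.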
> @@ -592,7 +663,7 @@ void migrate_page_copy(struct page *newpage, struct page *page)
> ***********************************************************/
>
> /*
> - * Common logic to directly migrate a single page suitable for
> + * Common logic to directly migrate a single LRU page suitable for
> * pages that do not use PagePrivate/PagePrivate2.
> *
> * Pages are locked upon entry and exit.
> @@ -755,24 +826,53 @@ static int move_to_new_page(struct page *newpage, struct page *page,
> enum migrate_mode mode)
> {
> struct address_space *mapping;
> - int rc;
> + int rc = -EAGAIN;
> + bool isolated_lru_page;
>
> VM_BUG_ON_PAGE(!PageLocked(page), page);
> VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
>
> mapping = page_mapping(page);
> - if (!mapping)
> - rc = migrate_page(mapping, newpage, page, mode);
> - else if (mapping->a_ops->migratepage)
> + /*
> + * In case of non-lru page, it could be released after
> + * isolation step. In that case, we shouldn't try
> + * fallback migration which was designed for LRU pages.
So the page we are trying to migrate can be released during migration.
In that case, who clears mapping and the flag? And, without mapping,
how can we clear PageIsolated()?
> + * To identify such pages, we cannot use PageMovable
> + * because owner of the page can reset it. So instead,
> + * use PG_isolated bit.
> + */
> + isolated_lru_page = !PageIsolated(page);
Ditto. PageIsolated() isn't sufficient to identify a non-LRU page.
My comments are mainly about the rules for when and by whom mapping
and the flag are cleared. Maybe you only need to answer one of them. :)
Thanks.
On Tue, Mar 22, 2016 at 02:50:37PM +0900, Joonsoo Kim wrote:
> On Mon, Mar 21, 2016 at 03:31:02PM +0900, Minchan Kim wrote:
> > Until now we have allowed migration only for LRU pages, and that was
> > enough to make high-order pages. But recently, embedded systems (e.g.,
> > webOS, Android) use lots of non-movable pages (e.g., zram, GPU memory),
> > so we have seen several reports about trouble with small high-order
> > allocations. To fix the problem, there were several efforts
> > (e.g., enhancing the compaction algorithm, SLUB fallback to 0-order pages,
> > reserved memory, vmalloc and so on), but if there are lots of
> > non-movable pages in the system, those solutions are void in the long run.
> >
> > So, this patch adds a facility to turn non-movable pages into
> > movable ones. For the feature, this patch introduces migration-related
> > functions in address_space_operations as well as some page flags.
> >
> > Basically, this patch supports two page-flags and two functions related
> > to page migration. The flag and page->mapping stability are protected
> > by PG_lock.
> >
> > PG_movable
> > PG_isolated
> >
> > bool (*isolate_page) (struct page *, isolate_mode_t);
> > void (*putback_page) (struct page *);
> >
> > The duties of a subsystem that wants to make its pages migratable are
> > as follows:
> >
> > 1. It should register address_space to page->mapping then mark
> > the page as PG_movable via __SetPageMovable.
> >
> > 2. It should mark the page as PG_isolated via SetPageIsolated
> > if isolation is successful and return true.
> >
> > 3. If migration is successful, it should clear PG_isolated and
> > PG_movable of the page to prepare it for freeing, then release the
> > reference on the page so it can be freed.
> >
> > 4. If migration fails, the putback function of the subsystem should
> > clear PG_isolated via ClearPageIsolated.
>
> I think this feature needs a separate document that describes the
> requirements of each step in more detail. For example, can #1 be
> done without holding a lock? I'm not sure, because you lock
> the page when implementing zsmalloc page migration in the 15th patch.
Yes, we need PG_lock because installing page->mapping and setting PG_movable
should be atomic, and PG_lock protects that.
A better interface might be
void __SetPageMovable(struct page *page, struct address_space *mapping);
>
> #3 also needs more explanation. Before release, we need to
> unregister the address_space. I guess that needs to be done
> in migratepage(), but there is no explanation.
Okay, we can unregister the address_space in __ClearPageMovable.
I will change it.
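The two interface changes discussed here (installing the mapping inside the set helper and dropping it inside the clear helper, both under the page lock) might look roughly like the following userspace model. Everything in it is an assumption made for illustration: `struct fake_page`, `struct fake_mapping` and both helpers are hypothetical, a plain `bool` stands in for PG_lock, and where the kernel would likely assert, the sketch returns false so the behavior can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space. */
struct fake_mapping {
	int id;
};

/* Hypothetical stand-in for struct page. */
struct fake_page {
	bool locked;			/* stands in for PG_lock */
	bool movable;			/* stands in for PG_movable */
	struct fake_mapping *mapping;
};

/*
 * Install the mapping and mark the page movable as one step.
 * The caller must hold the page lock so no one can observe a
 * movable page without a mapping, or the reverse.
 */
static bool fake_set_page_movable(struct fake_page *page,
				  struct fake_mapping *mapping)
{
	if (!page->locked || mapping == NULL)
		return false;

	page->mapping = mapping;
	page->movable = true;
	return true;
}

/* Undo both pieces of state together, again under the page lock. */
static bool fake_clear_page_movable(struct fake_page *page)
{
	if (!page->locked)
		return false;

	page->movable = false;
	page->mapping = NULL;
	return true;
}
```

Keeping both fields behind one helper means a driver cannot forget to unregister the mapping before the page is freed, which was the gap raised about step #3.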
>
> >
> > Cc: Vlastimil Babka <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Hugh Dickins <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Gioh Kim <[email protected]>
> > Signed-off-by: Minchan Kim <[email protected]>
> > ---
> > Documentation/filesystems/Locking | 4 +
> > Documentation/filesystems/vfs.txt | 5 ++
> > fs/proc/page.c | 3 +
> > include/linux/fs.h | 2 +
> > include/linux/migrate.h | 2 +
> > include/linux/page-flags.h | 29 ++++++++
> > include/uapi/linux/kernel-page-flags.h | 1 +
> > mm/compaction.c | 14 +++-
> > mm/migrate.c | 132 +++++++++++++++++++++++++++++----
> > 9 files changed, 177 insertions(+), 15 deletions(-)
> >
> > diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> > index 619af9bfdcb3..0bb79560abb3 100644
> > --- a/Documentation/filesystems/Locking
> > +++ b/Documentation/filesystems/Locking
> > @@ -195,7 +195,9 @@ unlocks and drops the reference.
> > int (*releasepage) (struct page *, int);
> > void (*freepage)(struct page *);
> > int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
> > + bool (*isolate_page) (struct page *, isolate_mode_t);
> > int (*migratepage)(struct address_space *, struct page *, struct page *);
> > + void (*putback_page) (struct page *);
> > int (*launder_page)(struct page *);
> > int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
> > int (*error_remove_page)(struct address_space *, struct page *);
> > @@ -219,7 +221,9 @@ invalidatepage: yes
> > releasepage: yes
> > freepage: yes
> > direct_IO:
> > +isolate_page: yes
> > migratepage: yes (both)
> > +putback_page: yes
> > launder_page: yes
> > is_partially_uptodate: yes
> > error_remove_page: yes
> > diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> > index b02a7d598258..4c1b6c3b4bc8 100644
> > --- a/Documentation/filesystems/vfs.txt
> > +++ b/Documentation/filesystems/vfs.txt
> > @@ -592,9 +592,14 @@ struct address_space_operations {
> > int (*releasepage) (struct page *, int);
> > void (*freepage)(struct page *);
> > ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
> > + /* isolate a page for migration */
> > + bool (*isolate_page) (struct page *, isolate_mode_t);
> > /* migrate the contents of a page to the specified target */
> > int (*migratepage) (struct page *, struct page *);
> > + /* put the page back to right list */
> > + void (*putback_page) (struct page *);
> > int (*launder_page) (struct page *);
> > +
> > int (*is_partially_uptodate) (struct page *, unsigned long,
> > unsigned long);
> > void (*is_dirty_writeback) (struct page *, bool *, bool *);
> > diff --git a/fs/proc/page.c b/fs/proc/page.c
> > index 712f1b9992cc..e2066e73a9b8 100644
> > --- a/fs/proc/page.c
> > +++ b/fs/proc/page.c
> > @@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page)
> > if (page_is_idle(page))
> > u |= 1 << KPF_IDLE;
> >
> > + if (PageMovable(page))
> > + u |= 1 << KPF_MOVABLE;
> > +
> > u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
> >
> > u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 14a97194b34b..b7ef2e41fa4a 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -401,6 +401,8 @@ struct address_space_operations {
> > */
> > int (*migratepage) (struct address_space *,
> > struct page *, struct page *, enum migrate_mode);
> > + bool (*isolate_page)(struct page *, isolate_mode_t);
> > + void (*putback_page)(struct page *);
> > int (*launder_page) (struct page *);
> > int (*is_partially_uptodate) (struct page *, unsigned long,
> > unsigned long);
> > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > index 9b50325e4ddf..404fbfefeb33 100644
> > --- a/include/linux/migrate.h
> > +++ b/include/linux/migrate.h
> > @@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
> > struct page *, struct page *, enum migrate_mode);
> > extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
> > unsigned long private, enum migrate_mode mode, int reason);
> > +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> > +extern void putback_movable_page(struct page *page);
> >
> > extern int migrate_prep(void);
> > extern int migrate_prep_local(void);
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index f4ed4f1b0c77..3885064641c4 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -129,6 +129,10 @@ enum pageflags {
> >
> > /* Compound pages. Stored in first tail page's flags */
> > PG_double_map = PG_private_2,
> > +
> > + /* non-lru movable pages */
> > + PG_movable = PG_reclaim,
> > + PG_isolated = PG_owner_priv_1,
> > };
> >
> > #ifndef __GENERATING_BOUNDS_H
> > @@ -614,6 +618,31 @@ static inline void __ClearPageBalloon(struct page *page)
> > atomic_set(&page->_mapcount, -1);
> > }
> >
> > +#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
> > +
> > +static inline int PageMovable(struct page *page)
> > +{
> > + return ((test_bit(PG_movable, &(page)->flags) &&
> > + atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
> > + || PageBalloon(page));
> > +}
> > +
> > +/*
> > + * Caller should hold a PG_lock */
> > +static inline void __SetPageMovable(struct page *page)
> > +{
> > + __set_bit(PG_movable, &page->flags);
> > + atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
> > +}
>
> I think there is no big benefit in using the non-atomic version here.
> PageMovable() is checked speculatively without holding PG_lock,
> so some CPU can miss the flag being set if we use the non-atomic version.
I wanted the double underscore to signal that it is non-atomic, so the caller
must take care of locking (i.e., PG_lock).
If we use the atomic version, what benefit do we get?
Without holding PG_lock, the atomic version could race, too.
>
> > +
> > +static inline void __ClearPageMovable(struct page *page)
> > +{
> > + atomic_set(&page->_mapcount, -1);
> > + __clear_bit(PG_movable, &(page)->flags);
> > +}
> > +
> > +PAGEFLAG(Isolated, isolated, PF_ANY);
> > +
> > /*
> > * If network-based swap is enabled, sl*b must keep track of whether pages
> > * were allocated from pfmemalloc reserves.
> > diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> > index 5da5f8751ce7..a184fd2434fa 100644
> > --- a/include/uapi/linux/kernel-page-flags.h
> > +++ b/include/uapi/linux/kernel-page-flags.h
> > @@ -34,6 +34,7 @@
> > #define KPF_BALLOON 23
> > #define KPF_ZERO_PAGE 24
> > #define KPF_IDLE 25
> > +#define KPF_MOVABLE 26
> >
> >
> > #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index ccf97b02b85f..7557aedddaee 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -703,7 +703,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >
> > /*
> > * Check may be lockless but that's ok as we recheck later.
> > - * It's possible to migrate LRU pages and balloon pages
> > + * It's possible to migrate LRU and movable kernel pages.
> > * Skip any other type of page
> > */
> > is_lru = PageLRU(page);
> > @@ -714,6 +714,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > goto isolate_success;
> > }
> > }
> > +
> > + if (unlikely(PageMovable(page)) &&
> > + !PageIsolated(page)) {
> > + if (locked) {
> > + spin_unlock_irqrestore(&zone->lru_lock,
> > + flags);
> > + locked = false;
> > + }
> > +
> > + if (isolate_movable_page(page, isolate_mode))
> > + goto isolate_success;
> > + }
> > }
> >
> > /*
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index b65c84267ce0..fc2842a15807 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -73,6 +73,75 @@ int migrate_prep_local(void)
> > return 0;
> > }
> >
> > +bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> > +{
> > + bool ret = false;
> > +
> > + /*
> > + * Avoid burning cycles with pages that are yet under __free_pages(),
> > + * or just got freed under us.
> > + *
> > + * In case we 'win' a race for a movable page being freed under us and
> > + * raise its refcount preventing __free_pages() from doing its job
> > + * the put_page() at the end of this block will take care of
> > + * release this page, thus avoiding a nasty leakage.
> > + */
> > + if (unlikely(!get_page_unless_zero(page)))
> > + goto out;
>
> After getting the ref counter, we need to re-check PageMovable()
> to ensure that we indeed handle PageMovable() type page. Without it,
> the page we handle can be freed and re-allocated to someone else
> that isn't related to PageMovable() before grabbing the page. Trying
> trylock_page() in this case could cause a problem.
I don't get it. Why do you think trylock_page could cause a problem?
Could you elaborate?
>
> > + /*
> > + * As movable pages are not isolated from LRU lists, concurrent
> > + * compaction threads can race against page migration functions
> > + * as well as race against the releasing a page.
> > + *
> > + * In order to avoid having an already isolated movable page
> > + * being (wrongly) re-isolated while it is under migration,
> > + * or to avoid attempting to isolate pages being released,
> > + * lets be sure we have the page lock
> > + * before proceeding with the movable page isolation steps.
> > + */
> > + if (unlikely(!trylock_page(page)))
> > + goto out_putpage;
> > +
> > + if (!PageMovable(page) || PageIsolated(page))
> > + goto out_no_isolated;
> > +
> > + ret = page->mapping->a_ops->isolate_page(page, mode);
> > + if (!ret)
> > + goto out_no_isolated;
> > +
> > + WARN_ON_ONCE(!PageIsolated(page));
> > + unlock_page(page);
> > + return ret;
> > +
> > +out_no_isolated:
> > + unlock_page(page);
> > +out_putpage:
> > + put_page(page);
> > +out:
> > + return ret;
> > +}
> > +
> > +void putback_movable_page(struct page *page)
> > +{
> > + struct address_space *mapping;
> > +
> > + /*
> > + * 'lock_page()' stabilizes the page and prevents races against
> > + * concurrent isolation threads attempting to re-isolate it.
> > + */
> > + lock_page(page);
> > + mapping = page_mapping(page);
> > + if (mapping) {
> > + mapping->a_ops->putback_page(page);
> > + WARN_ON_ONCE(PageIsolated(page));
> > + }
> > + unlock_page(page);
> > + /* drop the extra ref count taken for movable page isolation */
> > + put_page(page);
> > +}
>
> This is complicated part for me. mapping can disappear? In this case,
> who clear PageIsolated()?
The page's owner, for example zsmalloc or virtio-balloon.
The owner can free the page whenever it wants once it holds PG_lock.
It should clear mapping and PG_movable under PG_lock.
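
The rule above can be modelled in userspace roughly as follows. This is only
a sketch under stated assumptions: struct fake_page, struct fake_mapping and
every fake_* helper are hypothetical stand-ins, not kernel APIs, and the model
is single-threaded. Whether the owner must also clear PG_isolated when it
releases an isolated page is not settled in this thread, so the sketch leaves
the bit set in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_page;

/* Hypothetical stand-in for address_space_operations->putback_page. */
struct fake_mapping {
	void (*putback_page)(struct fake_page *page);
};

/* Hypothetical stand-in for struct page; single-threaded model only. */
struct fake_page {
	int refcount;
	bool locked;			/* models PG_locked */
	bool movable;			/* models PG_movable */
	bool isolated;			/* models PG_isolated */
	struct fake_mapping *mapping;
};

static void fake_lock_page(struct fake_page *page)
{
	assert(!page->locked);		/* no contention in this model */
	page->locked = true;
}

static void fake_unlock_page(struct fake_page *page)
{
	assert(page->locked);
	page->locked = false;
}

/* Owner side: mapping and PG_movable are cleared only under the lock. */
static void fake_owner_release(struct fake_page *page)
{
	fake_lock_page(page);
	page->mapping = NULL;
	page->movable = false;
	fake_unlock_page(page);
	page->refcount--;		/* drop the owner's reference */
}

/* Owner's putback callback: it is the one that clears PG_isolated. */
static void fake_owner_putback(struct fake_page *page)
{
	page->isolated = false;
}

/* Migration side, shaped like putback_movable_page() in the patch. */
static void fake_putback_movable_page(struct fake_page *page)
{
	fake_lock_page(page);
	if (page->mapping) {
		page->mapping->putback_page(page);
		assert(!page->isolated);
	}
	/* else: the owner already released the page, nothing to call */
	fake_unlock_page(page);
	page->refcount--;		/* drop the isolation reference */
}
```

If the owner releases the page between isolation and putback, the callback is
skipped and the final reference drop frees the page with the isolated bit
still set in this model.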
>
> > +
> > +
> > /*
> > * Put previously isolated pages back onto the appropriate lists
> > * from where they were once taken off for compaction/migration.
> > @@ -96,6 +165,8 @@ void putback_movable_pages(struct list_head *l)
> > page_is_file_cache(page));
> > if (unlikely(isolated_balloon_page(page)))
> > balloon_page_putback(page);
> > + else if (unlikely(PageIsolated(page)))
> > + putback_movable_page(page);
> > else
> > putback_lru_page(page);
> > }
>
> I think that this will not work. You uses PG_owner_priv_1 as
> PG_isolated and it is possible that some lru pages has this flag.
> I guess you need to add PageMovable() check but it seems that mapping
> and this flag can be cleared by others.
Hmm, a PageMovable check may work: if the PageMovable check fails,
it means the page's owner has freed the page, so we can simply put the page
here to release the refcount.
I will check it.
>
> > @@ -592,7 +663,7 @@ void migrate_page_copy(struct page *newpage, struct page *page)
> > ***********************************************************/
> >
> > /*
> > - * Common logic to directly migrate a single page suitable for
> > + * Common logic to directly migrate a single LRU page suitable for
> > * pages that do not use PagePrivate/PagePrivate2.
> > *
> > * Pages are locked upon entry and exit.
> > @@ -755,24 +826,53 @@ static int move_to_new_page(struct page *newpage, struct page *page,
> > enum migrate_mode mode)
> > {
> > struct address_space *mapping;
> > - int rc;
> > + int rc = -EAGAIN;
> > + bool isolated_lru_page;
> >
> > VM_BUG_ON_PAGE(!PageLocked(page), page);
> > VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
> >
> > mapping = page_mapping(page);
> > - if (!mapping)
> > - rc = migrate_page(mapping, newpage, page, mode);
> > - else if (mapping->a_ops->migratepage)
> > + /*
> > + * In case of non-lru page, it could be released after
> > + * isolation step. In that case, we shouldn't try
> > + * fallback migration which was designed for LRU pages.
>
> So, page we try to migrate can be released during migration.
> In this case, who does clear mapping and flag? And, without mapping,
> how we can clear PageIsolated()?
As I wrote in the description, that is the responsibility of the user.
>
> > + * To identify such pages, we cannot use PageMovable
> > > + * because owner of the page can reset it. So instead,
> > + * use PG_isolated bit.
> > + */
> > + isolated_lru_page = !PageIsolated(page);
>
> Ditto. PageIsolated() isn't sufficient to distinguish non-lru page.
Okay, I will look into it.
>
> My comment mainly points out rules about when/who clear mapping
> and flag. Maybe, you need to answer just one of them. :)
I agree the documentation is really lacking at the moment.
I will add more detail.
Thanks for the review, Joonsoo!
On Tue, Mar 22, 2016 at 11:55:45PM +0900, Minchan Kim wrote:
> On Tue, Mar 22, 2016 at 02:50:37PM +0900, Joonsoo Kim wrote:
> > On Mon, Mar 21, 2016 at 03:31:02PM +0900, Minchan Kim wrote:
> > > We have allowed migration for only LRU pages until now and it was
> > > enough to make high-order pages. But recently, embedded system(e.g.,
> > > webOS, android) uses lots of non-movable pages(e.g., zram, GPU memory)
> > > so we have seen several reports about troubles of small high-order
> > > allocation. For fixing the problem, there were several efforts
> > > > (e.g., enhance compaction algorithm, SLUB fallback to 0-order page,
> > > reserved memory, vmalloc and so on) but if there are lots of
> > > non-movable pages in system, their solutions are void in the long run.
> > >
> > > So, this patch is to support facility to change non-movable pages
> > > with movable. For the feature, this patch introduces functions related
> > > to migration to address_space_operations as well as some page flags.
> > >
> > > Basically, this patch supports two page-flags and two functions related
> > > to page migration. The flag and page->mapping stability are protected
> > > by PG_lock.
> > >
> > > PG_movable
> > > PG_isolated
> > >
> > > bool (*isolate_page) (struct page *, isolate_mode_t);
> > > void (*putback_page) (struct page *);
> > >
> > > Duty of subsystem want to make their pages as migratable are
> > > as follows:
> > >
> > > 1. It should register address_space to page->mapping then mark
> > > the page as PG_movable via __SetPageMovable.
> > >
> > > 2. It should mark the page as PG_isolated via SetPageIsolated
> > > > if isolation is successful and return true.
> > >
> > > 3. If migration is successful, it should clear PG_isolated and
> > > PG_movable of the page for free preparation then release the
> > > reference of the page to free.
> > >
> > > 4. If migration fails, putback function of subsystem should
> > > clear PG_isolated via ClearPageIsolated.
> >
> > I think that this feature needs a separate document to describe
> > requirement of each step in more detail. For example, #1 can be
> > possible without holding a lock? I'm not sure because you lock
> > the page when implementing zsmalloc page migration in 15th patch.
>
> Yes, we needs PG_lock because install page->mapping and PG_movable
> should be atomic and PG_lock protects it.
>
> Better interface might be
>
> void __SetPageMovable(struct page *page, struct address_space *mapping);
>
> >
> > #3 also need more explanation. Before release, we need to
> > unregister address_space. I guess that it needs to be done
> > in migratepage() but there is no explanation.
>
> Okay, we can unregister address_space in __ClearPageMovable.
> I will change it.
>
> >
> > >
> > > Cc: Vlastimil Babka <[email protected]>
> > > Cc: Mel Gorman <[email protected]>
> > > Cc: Hugh Dickins <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Signed-off-by: Gioh Kim <[email protected]>
> > > Signed-off-by: Minchan Kim <[email protected]>
> > > ---
> > > Documentation/filesystems/Locking | 4 +
> > > Documentation/filesystems/vfs.txt | 5 ++
> > > fs/proc/page.c | 3 +
> > > include/linux/fs.h | 2 +
> > > include/linux/migrate.h | 2 +
> > > include/linux/page-flags.h | 29 ++++++++
> > > include/uapi/linux/kernel-page-flags.h | 1 +
> > > mm/compaction.c | 14 +++-
> > > mm/migrate.c | 132 +++++++++++++++++++++++++++++----
> > > 9 files changed, 177 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> > > index 619af9bfdcb3..0bb79560abb3 100644
> > > --- a/Documentation/filesystems/Locking
> > > +++ b/Documentation/filesystems/Locking
> > > @@ -195,7 +195,9 @@ unlocks and drops the reference.
> > > int (*releasepage) (struct page *, int);
> > > void (*freepage)(struct page *);
> > > int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
> > > + bool (*isolate_page) (struct page *, isolate_mode_t);
> > > int (*migratepage)(struct address_space *, struct page *, struct page *);
> > > + void (*putback_page) (struct page *);
> > > int (*launder_page)(struct page *);
> > > int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
> > > int (*error_remove_page)(struct address_space *, struct page *);
> > > @@ -219,7 +221,9 @@ invalidatepage: yes
> > > releasepage: yes
> > > freepage: yes
> > > direct_IO:
> > > +isolate_page: yes
> > > migratepage: yes (both)
> > > +putback_page: yes
> > > launder_page: yes
> > > is_partially_uptodate: yes
> > > error_remove_page: yes
> > > diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> > > index b02a7d598258..4c1b6c3b4bc8 100644
> > > --- a/Documentation/filesystems/vfs.txt
> > > +++ b/Documentation/filesystems/vfs.txt
> > > @@ -592,9 +592,14 @@ struct address_space_operations {
> > > int (*releasepage) (struct page *, int);
> > > void (*freepage)(struct page *);
> > > ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
> > > + /* isolate a page for migration */
> > > + bool (*isolate_page) (struct page *, isolate_mode_t);
> > > /* migrate the contents of a page to the specified target */
> > > int (*migratepage) (struct page *, struct page *);
> > > + /* put the page back to right list */
> > > + void (*putback_page) (struct page *);
> > > int (*launder_page) (struct page *);
> > > +
> > > int (*is_partially_uptodate) (struct page *, unsigned long,
> > > unsigned long);
> > > void (*is_dirty_writeback) (struct page *, bool *, bool *);
> > > diff --git a/fs/proc/page.c b/fs/proc/page.c
> > > index 712f1b9992cc..e2066e73a9b8 100644
> > > --- a/fs/proc/page.c
> > > +++ b/fs/proc/page.c
> > > @@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page)
> > > if (page_is_idle(page))
> > > u |= 1 << KPF_IDLE;
> > >
> > > + if (PageMovable(page))
> > > + u |= 1 << KPF_MOVABLE;
> > > +
> > > u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
> > >
> > > u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 14a97194b34b..b7ef2e41fa4a 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -401,6 +401,8 @@ struct address_space_operations {
> > > */
> > > int (*migratepage) (struct address_space *,
> > > struct page *, struct page *, enum migrate_mode);
> > > + bool (*isolate_page)(struct page *, isolate_mode_t);
> > > + void (*putback_page)(struct page *);
> > > int (*launder_page) (struct page *);
> > > int (*is_partially_uptodate) (struct page *, unsigned long,
> > > unsigned long);
> > > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > > index 9b50325e4ddf..404fbfefeb33 100644
> > > --- a/include/linux/migrate.h
> > > +++ b/include/linux/migrate.h
> > > @@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
> > > struct page *, struct page *, enum migrate_mode);
> > > extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
> > > unsigned long private, enum migrate_mode mode, int reason);
> > > +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> > > +extern void putback_movable_page(struct page *page);
> > >
> > > extern int migrate_prep(void);
> > > extern int migrate_prep_local(void);
> > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > index f4ed4f1b0c77..3885064641c4 100644
> > > --- a/include/linux/page-flags.h
> > > +++ b/include/linux/page-flags.h
> > > @@ -129,6 +129,10 @@ enum pageflags {
> > >
> > > /* Compound pages. Stored in first tail page's flags */
> > > PG_double_map = PG_private_2,
> > > +
> > > + /* non-lru movable pages */
> > > + PG_movable = PG_reclaim,
> > > + PG_isolated = PG_owner_priv_1,
> > > };
> > >
> > > #ifndef __GENERATING_BOUNDS_H
> > > @@ -614,6 +618,31 @@ static inline void __ClearPageBalloon(struct page *page)
> > > atomic_set(&page->_mapcount, -1);
> > > }
> > >
> > > +#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
> > > +
> > > +static inline int PageMovable(struct page *page)
> > > +{
> > > + return ((test_bit(PG_movable, &(page)->flags) &&
> > > + atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
> > > + || PageBalloon(page));
> > > +}
> > > +
> > > +/*
> > > + * Caller should hold a PG_lock */
> > > +static inline void __SetPageMovable(struct page *page)
> > > +{
> > > + __set_bit(PG_movable, &page->flags);
> > > + atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
> > > +}
> >
> > I think there is no big benefit to use non-atomic version here.
> > PageMovable() is speculatively checked without holding a PG_lock
> > so some cpu can miss this flag set if we use non-atomic version.
>
> I wanted to show that double underscore is non-atomic so caller
> should take care of the lock(i.e., PG_lock).
> If we use atomic version, what kinds of benefit do we have?
> Without holding PG_lock, atomic version could be raced, too.
My suggestion is to hold PG_lock and also use the atomic set. Compaction first
checks PageMovable() without PG_lock, so it can miss PageMovable() if the
non-atomic version is used.
>
> >
> > > +
> > > +static inline void __ClearPageMovable(struct page *page)
> > > +{
> > > + atomic_set(&page->_mapcount, -1);
> > > + __clear_bit(PG_movable, &(page)->flags);
> > > +}
> > > +
> > > +PAGEFLAG(Isolated, isolated, PF_ANY);
> > > +
> > > /*
> > > * If network-based swap is enabled, sl*b must keep track of whether pages
> > > * were allocated from pfmemalloc reserves.
> > > diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> > > index 5da5f8751ce7..a184fd2434fa 100644
> > > --- a/include/uapi/linux/kernel-page-flags.h
> > > +++ b/include/uapi/linux/kernel-page-flags.h
> > > @@ -34,6 +34,7 @@
> > > #define KPF_BALLOON 23
> > > #define KPF_ZERO_PAGE 24
> > > #define KPF_IDLE 25
> > > +#define KPF_MOVABLE 26
> > >
> > >
> > > #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > index ccf97b02b85f..7557aedddaee 100644
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -703,7 +703,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > >
> > > /*
> > > * Check may be lockless but that's ok as we recheck later.
> > > - * It's possible to migrate LRU pages and balloon pages
> > > + * It's possible to migrate LRU and movable kernel pages.
> > > * Skip any other type of page
> > > */
> > > is_lru = PageLRU(page);
> > > @@ -714,6 +714,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > > goto isolate_success;
> > > }
> > > }
> > > +
> > > + if (unlikely(PageMovable(page)) &&
> > > + !PageIsolated(page)) {
> > > + if (locked) {
> > > + spin_unlock_irqrestore(&zone->lru_lock,
> > > + flags);
> > > + locked = false;
> > > + }
> > > +
> > > + if (isolate_movable_page(page, isolate_mode))
> > > + goto isolate_success;
> > > + }
> > > }
> > >
> > > /*
> > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > index b65c84267ce0..fc2842a15807 100644
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -73,6 +73,75 @@ int migrate_prep_local(void)
> > > return 0;
> > > }
> > >
> > > +bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> > > +{
> > > + bool ret = false;
> > > +
> > > + /*
> > > + * Avoid burning cycles with pages that are yet under __free_pages(),
> > > + * or just got freed under us.
> > > + *
> > > + * In case we 'win' a race for a movable page being freed under us and
> > > + * raise its refcount preventing __free_pages() from doing its job
> > > + * the put_page() at the end of this block will take care of
> > > + * release this page, thus avoiding a nasty leakage.
> > > + */
> > > + if (unlikely(!get_page_unless_zero(page)))
> > > + goto out;
> >
> > After getting the ref counter, we need to re-check PageMovable()
> > to ensure that we indeed handle PageMovable() type page. Without it,
> > the page we handle can be freed and re-allocated to someone else
> > that isn't related to PageMovable() before grabbing the page. Trying
> > trylock_page() in this case could cause a problem.
>
> I don't get it. Why do you think trylock_page could cause a problem?
> Could you elaborate it more?
Okay. Consider the following sequence.

CPU-A                                    CPU-B

check PageMovable() in compaction
...                                      free the page
...                                      allocate the page for other usecase
...                                      (maybe for file cache or slub)
get unless 0 in isolate_movable_page()
trylock (success)
                                         (try) lock! failed!

In this case, someone can see the lock fail even though they are the owner
of the page. IIUC, this can also happen in zsmalloc. See init_zspage() in the
15th patch. It assumes that an allocated page can be locked
unconditionally.
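
The recheck being asked for can be sketched in userspace as follows. This is
a minimal model under stated assumptions, not the patch itself: struct
fake_page and all fake_* helpers are hypothetical stand-ins built on C11
atomics, and isolate_movable_sketch() only mirrors the control flow of
isolate_movable_page(). The point is that PageMovable() is tested again after
the reference is taken and before the page lock is touched, so a page that was
freed and reallocated to another user never has its lock taken by compaction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct fake_page {
	atomic_int refcount;	/* models the page reference count */
	atomic_bool movable;	/* models PageMovable() */
	atomic_flag locked;	/* models PG_locked */
	bool isolated;		/* models PG_isolated, changed under the lock */
};

static void fake_page_init(struct fake_page *p, int refcount, bool movable)
{
	atomic_init(&p->refcount, refcount);
	atomic_init(&p->movable, movable);
	atomic_flag_clear(&p->locked);
	p->isolated = false;
}

/* Take a reference unless the count is already zero. */
static bool fake_get_page_unless_zero(struct fake_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void fake_put_page(struct fake_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

static bool fake_trylock_page(struct fake_page *p)
{
	return !atomic_flag_test_and_set(&p->locked);
}

static void fake_unlock_page(struct fake_page *p)
{
	atomic_flag_clear(&p->locked);
}

static bool isolate_movable_sketch(struct fake_page *p)
{
	if (!fake_get_page_unless_zero(p))
		return false;

	/*
	 * Recheck after pinning the page: if it is no longer movable it may
	 * belong to someone else now, so back off without touching its lock.
	 */
	if (!atomic_load(&p->movable))
		goto out_put;

	if (!fake_trylock_page(p))
		goto out_put;

	/* Final check under the lock, as in the patch. */
	if (!atomic_load(&p->movable) || p->isolated)
		goto out_unlock;

	p->isolated = true;
	fake_unlock_page(p);
	return true;		/* the isolation reference is kept */

out_unlock:
	fake_unlock_page(p);
out_put:
	fake_put_page(p);
	return false;
}
```

With the early recheck, the new owner on CPU-B in the sequence above finds the
lock free, because CPU-A drops its reference without ever calling trylock.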
> >
> > > + /*
> > > + * As movable pages are not isolated from LRU lists, concurrent
> > > + * compaction threads can race against page migration functions
> > > + * as well as race against the releasing a page.
> > > + *
> > > + * In order to avoid having an already isolated movable page
> > > + * being (wrongly) re-isolated while it is under migration,
> > > + * or to avoid attempting to isolate pages being released,
> > > + * lets be sure we have the page lock
> > > + * before proceeding with the movable page isolation steps.
> > > + */
> > > + if (unlikely(!trylock_page(page)))
> > > + goto out_putpage;
> > > +
> > > + if (!PageMovable(page) || PageIsolated(page))
> > > + goto out_no_isolated;
> > > +
> > > + ret = page->mapping->a_ops->isolate_page(page, mode);
> > > + if (!ret)
> > > + goto out_no_isolated;
> > > +
> > > + WARN_ON_ONCE(!PageIsolated(page));
> > > + unlock_page(page);
> > > + return ret;
> > > +
> > > +out_no_isolated:
> > > + unlock_page(page);
> > > +out_putpage:
> > > + put_page(page);
> > > +out:
> > > + return ret;
> > > +}
> > > +
> > > +void putback_movable_page(struct page *page)
> > > +{
> > > + struct address_space *mapping;
> > > +
> > > + /*
> > > + * 'lock_page()' stabilizes the page and prevents races against
> > > + * concurrent isolation threads attempting to re-isolate it.
> > > + */
> > > + lock_page(page);
> > > + mapping = page_mapping(page);
> > > + if (mapping) {
> > > + mapping->a_ops->putback_page(page);
> > > + WARN_ON_ONCE(PageIsolated(page));
> > > + }
> > > + unlock_page(page);
> > > + /* drop the extra ref count taken for movable page isolation */
> > > + put_page(page);
> > > +}
> >
> > This is complicated part for me. mapping can disappear? In this case,
> > who clear PageIsolated()?
>
> Page's owner, for example, zsmalloc, virtio-balloon.
> They can free page whenever they want once it holds a PG_lock.
> They should clear mapping and PG_movable with PG_lock.
>
> >
> > > +
> > > +
> > > /*
> > > * Put previously isolated pages back onto the appropriate lists
> > > * from where they were once taken off for compaction/migration.
> > > @@ -96,6 +165,8 @@ void putback_movable_pages(struct list_head *l)
> > > page_is_file_cache(page));
> > > if (unlikely(isolated_balloon_page(page)))
> > > balloon_page_putback(page);
> > > + else if (unlikely(PageIsolated(page)))
> > > + putback_movable_page(page);
> > > else
> > > putback_lru_page(page);
> > > }
> >
> > I think that this will not work. You uses PG_owner_priv_1 as
> > PG_isolated and it is possible that some lru pages has this flag.
> > I guess you need to add PageMovable() check but it seems that mapping
> > and this flag can be cleared by others.
>
> Hmm, PageMovable check may work because If PageMovable check fails,
> it means page's owner free the page so we can simple put the page to
> release refcount in here.
> I will check it.
Hmmm... But in the failure case, is it safe to call putback_lru_page() for them?
Also, PageIsolated() would be left set. Is that okay? It is not symmetric that
an isolated page can be freed by dropping the ref count without calling the
putback function. This should be clarified and documented.
Thanks.
On Wed, Mar 23, 2016 at 02:05:11PM +0900, Joonsoo Kim wrote:
> On Tue, Mar 22, 2016 at 11:55:45PM +0900, Minchan Kim wrote:
> > On Tue, Mar 22, 2016 at 02:50:37PM +0900, Joonsoo Kim wrote:
> > > On Mon, Mar 21, 2016 at 03:31:02PM +0900, Minchan Kim wrote:
> > > > We have allowed migration for only LRU pages until now and it was
> > > > enough to make high-order pages. But recently, embedded system(e.g.,
> > > > webOS, android) uses lots of non-movable pages(e.g., zram, GPU memory)
> > > > so we have seen several reports about troubles of small high-order
> > > > allocation. For fixing the problem, there were several efforts
> > > > > (e.g., enhance compaction algorithm, SLUB fallback to 0-order page,
> > > > reserved memory, vmalloc and so on) but if there are lots of
> > > > non-movable pages in system, their solutions are void in the long run.
> > > >
> > > > So, this patch is to support facility to change non-movable pages
> > > > with movable. For the feature, this patch introduces functions related
> > > > to migration to address_space_operations as well as some page flags.
> > > >
> > > > Basically, this patch supports two page-flags and two functions related
> > > > to page migration. The flag and page->mapping stability are protected
> > > > by PG_lock.
> > > >
> > > > PG_movable
> > > > PG_isolated
> > > >
> > > > bool (*isolate_page) (struct page *, isolate_mode_t);
> > > > void (*putback_page) (struct page *);
> > > >
> > > > Duty of subsystem want to make their pages as migratable are
> > > > as follows:
> > > >
> > > > 1. It should register address_space to page->mapping then mark
> > > > the page as PG_movable via __SetPageMovable.
> > > >
> > > > 2. It should mark the page as PG_isolated via SetPageIsolated
> > > > > if isolation is successful and return true.
> > > >
> > > > 3. If migration is successful, it should clear PG_isolated and
> > > > PG_movable of the page for free preparation then release the
> > > > reference of the page to free.
> > > >
> > > > 4. If migration fails, putback function of subsystem should
> > > > clear PG_isolated via ClearPageIsolated.
> > >
> > > I think that this feature needs a separate document to describe
> > > requirement of each step in more detail. For example, #1 can be
> > > possible without holding a lock? I'm not sure because you lock
> > > the page when implementing zsmalloc page migration in 15th patch.
> >
> > Yes, we needs PG_lock because install page->mapping and PG_movable
> > should be atomic and PG_lock protects it.
> >
> > Better interface might be
> >
> > void __SetPageMovable(struct page *page, struct address_space *mapping);
> >
> > >
> > > #3 also need more explanation. Before release, we need to
> > > unregister address_space. I guess that it needs to be done
> > > in migratepage() but there is no explanation.
> >
> > Okay, we can unregister address_space in __ClearPageMovable.
> > I will change it.
> >
> > >
> > > >
> > > > Cc: Vlastimil Babka <[email protected]>
> > > > Cc: Mel Gorman <[email protected]>
> > > > Cc: Hugh Dickins <[email protected]>
> > > > Cc: [email protected]
> > > > Cc: [email protected]
> > > > Signed-off-by: Gioh Kim <[email protected]>
> > > > Signed-off-by: Minchan Kim <[email protected]>
> > > > ---
> > > > Documentation/filesystems/Locking | 4 +
> > > > Documentation/filesystems/vfs.txt | 5 ++
> > > > fs/proc/page.c | 3 +
> > > > include/linux/fs.h | 2 +
> > > > include/linux/migrate.h | 2 +
> > > > include/linux/page-flags.h | 29 ++++++++
> > > > include/uapi/linux/kernel-page-flags.h | 1 +
> > > > mm/compaction.c | 14 +++-
> > > > mm/migrate.c | 132 +++++++++++++++++++++++++++++----
> > > > 9 files changed, 177 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> > > > index 619af9bfdcb3..0bb79560abb3 100644
> > > > --- a/Documentation/filesystems/Locking
> > > > +++ b/Documentation/filesystems/Locking
> > > > @@ -195,7 +195,9 @@ unlocks and drops the reference.
> > > > int (*releasepage) (struct page *, int);
> > > > void (*freepage)(struct page *);
> > > > int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
> > > > + bool (*isolate_page) (struct page *, isolate_mode_t);
> > > > int (*migratepage)(struct address_space *, struct page *, struct page *);
> > > > + void (*putback_page) (struct page *);
> > > > int (*launder_page)(struct page *);
> > > > int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
> > > > int (*error_remove_page)(struct address_space *, struct page *);
> > > > @@ -219,7 +221,9 @@ invalidatepage: yes
> > > > releasepage: yes
> > > > freepage: yes
> > > > direct_IO:
> > > > +isolate_page: yes
> > > > migratepage: yes (both)
> > > > +putback_page: yes
> > > > launder_page: yes
> > > > is_partially_uptodate: yes
> > > > error_remove_page: yes
> > > > diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> > > > index b02a7d598258..4c1b6c3b4bc8 100644
> > > > --- a/Documentation/filesystems/vfs.txt
> > > > +++ b/Documentation/filesystems/vfs.txt
> > > > @@ -592,9 +592,14 @@ struct address_space_operations {
> > > > int (*releasepage) (struct page *, int);
> > > > void (*freepage)(struct page *);
> > > > ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
> > > > + /* isolate a page for migration */
> > > > + bool (*isolate_page) (struct page *, isolate_mode_t);
> > > > /* migrate the contents of a page to the specified target */
> > > > int (*migratepage) (struct page *, struct page *);
> > > > + /* put the page back to right list */
> > > > + void (*putback_page) (struct page *);
> > > > int (*launder_page) (struct page *);
> > > > +
> > > > int (*is_partially_uptodate) (struct page *, unsigned long,
> > > > unsigned long);
> > > > void (*is_dirty_writeback) (struct page *, bool *, bool *);
> > > > diff --git a/fs/proc/page.c b/fs/proc/page.c
> > > > index 712f1b9992cc..e2066e73a9b8 100644
> > > > --- a/fs/proc/page.c
> > > > +++ b/fs/proc/page.c
> > > > @@ -157,6 +157,9 @@ u64 stable_page_flags(struct page *page)
> > > > if (page_is_idle(page))
> > > > u |= 1 << KPF_IDLE;
> > > >
> > > > + if (PageMovable(page))
> > > > + u |= 1 << KPF_MOVABLE;
> > > > +
> > > > u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
> > > >
> > > > u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
> > > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > > index 14a97194b34b..b7ef2e41fa4a 100644
> > > > --- a/include/linux/fs.h
> > > > +++ b/include/linux/fs.h
> > > > @@ -401,6 +401,8 @@ struct address_space_operations {
> > > > */
> > > > int (*migratepage) (struct address_space *,
> > > > struct page *, struct page *, enum migrate_mode);
> > > > + bool (*isolate_page)(struct page *, isolate_mode_t);
> > > > + void (*putback_page)(struct page *);
> > > > int (*launder_page) (struct page *);
> > > > int (*is_partially_uptodate) (struct page *, unsigned long,
> > > > unsigned long);
> > > > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > > > index 9b50325e4ddf..404fbfefeb33 100644
> > > > --- a/include/linux/migrate.h
> > > > +++ b/include/linux/migrate.h
> > > > @@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
> > > > struct page *, struct page *, enum migrate_mode);
> > > > extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
> > > > unsigned long private, enum migrate_mode mode, int reason);
> > > > +extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
> > > > +extern void putback_movable_page(struct page *page);
> > > >
> > > > extern int migrate_prep(void);
> > > > extern int migrate_prep_local(void);
> > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > index f4ed4f1b0c77..3885064641c4 100644
> > > > --- a/include/linux/page-flags.h
> > > > +++ b/include/linux/page-flags.h
> > > > @@ -129,6 +129,10 @@ enum pageflags {
> > > >
> > > > /* Compound pages. Stored in first tail page's flags */
> > > > PG_double_map = PG_private_2,
> > > > +
> > > > + /* non-lru movable pages */
> > > > + PG_movable = PG_reclaim,
> > > > + PG_isolated = PG_owner_priv_1,
> > > > };
> > > >
> > > > #ifndef __GENERATING_BOUNDS_H
> > > > @@ -614,6 +618,31 @@ static inline void __ClearPageBalloon(struct page *page)
> > > > atomic_set(&page->_mapcount, -1);
> > > > }
> > > >
> > > > +#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
> > > > +
> > > > +static inline int PageMovable(struct page *page)
> > > > +{
> > > > + return ((test_bit(PG_movable, &(page)->flags) &&
> > > > + atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
> > > > + || PageBalloon(page));
> > > > +}
> > > > +
> > > > +/*
> > > > + * Caller should hold a PG_lock */
> > > > +static inline void __SetPageMovable(struct page *page)
> > > > +{
> > > > + __set_bit(PG_movable, &page->flags);
> > > > + atomic_set(&page->_mapcount, PAGE_MOVABLE_MAPCOUNT_VALUE);
> > > > +}
> > >
> > > I think there is no big benefit to use non-atomic version here.
> > > PageMovable() is speculatively checked without holding a PG_lock
> > > so some cpu can miss this flag set if we use non-atomic version.
> >
> > I wanted to show that double underscore is non-atomic so caller
> > should take care of the lock(i.e., PG_lock).
> > If we use atomic version, what kinds of benefit do we have?
> > Without holding PG_lock, atomic version could be raced, too.
>
> My suggestion is holding PG_lock + atomic set. Compaction first
> checks PageMovable() without PG_lock so it can miss PageMovable() if
> non-atomic version is used.
I think a race here is very unlikely, so the chance of missing
PageMovable is small and we don't need to add the unnecessary overhead
of an atomic op on the PG_movable setting side. Also, I want to use the
double underscore to show that the function is not atomic, so the
caller should take care of locking.
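To illustrate what a lockless reader can see, here is a rough userspace
sketch, not kernel code. The names model_flags_page, MODEL_PG_MOVABLE
and the two step helpers are hypothetical and exist only for this
example, and the balloon case of PageMovable() is left out. It splits
the setter into its two stores and shows that a reader looking between
them sees "not movable", which, as far as I can tell, just means
compaction skips the page on that pass.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PG_MOVABLE	(1UL << 0)	/* hypothetical bit position */
#define MODEL_MOVABLE_MAPCOUNT	(-255)		/* mirrors PAGE_MOVABLE_MAPCOUNT_VALUE */

/* Hypothetical stand-in for the two struct page fields involved. */
struct model_flags_page {
	unsigned long flags;
	int mapcount;
};

/* Models PageMovable() without the PageBalloon() part. */
static bool model_page_movable(const struct model_flags_page *p)
{
	return (p->flags & MODEL_PG_MOVABLE) &&
	       p->mapcount == MODEL_MOVABLE_MAPCOUNT;
}

/* First store of the setter: plain, non-atomic flag update. */
static void model_set_movable_step1(struct model_flags_page *p)
{
	p->flags |= MODEL_PG_MOVABLE;
}

/* Second store of the setter: publish the mapcount marker. */
static void model_set_movable_step2(struct model_flags_page *p)
{
	p->mapcount = MODEL_MOVABLE_MAPCOUNT;
}

void model_flags_demo(void)
{
	struct model_flags_page p = { .flags = 0, .mapcount = -1 };

	assert(!model_page_movable(&p));	/* before the setter runs */
	model_set_movable_step1(&p);
	assert(!model_page_movable(&p));	/* reader in the middle: miss */
	model_set_movable_step2(&p);
	assert(model_page_movable(&p));		/* both stores visible */
}
```

This is single threaded, so it only models the order of the two stores
and says nothing about real memory ordering between CPUs.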
>
> >
> > >
> > > > +
> > > > +static inline void __ClearPageMovable(struct page *page)
> > > > +{
> > > > + atomic_set(&page->_mapcount, -1);
> > > > + __clear_bit(PG_movable, &(page)->flags);
> > > > +}
> > > > +
> > > > +PAGEFLAG(Isolated, isolated, PF_ANY);
> > > > +
> > > > /*
> > > > * If network-based swap is enabled, sl*b must keep track of whether pages
> > > > * were allocated from pfmemalloc reserves.
> > > > diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> > > > index 5da5f8751ce7..a184fd2434fa 100644
> > > > --- a/include/uapi/linux/kernel-page-flags.h
> > > > +++ b/include/uapi/linux/kernel-page-flags.h
> > > > @@ -34,6 +34,7 @@
> > > > #define KPF_BALLOON 23
> > > > #define KPF_ZERO_PAGE 24
> > > > #define KPF_IDLE 25
> > > > +#define KPF_MOVABLE 26
> > > >
> > > >
> > > > #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
> > > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > > index ccf97b02b85f..7557aedddaee 100644
> > > > --- a/mm/compaction.c
> > > > +++ b/mm/compaction.c
> > > > @@ -703,7 +703,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > > >
> > > > /*
> > > > * Check may be lockless but that's ok as we recheck later.
> > > > - * It's possible to migrate LRU pages and balloon pages
> > > > + * It's possible to migrate LRU and movable kernel pages.
> > > > * Skip any other type of page
> > > > */
> > > > is_lru = PageLRU(page);
> > > > @@ -714,6 +714,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > > > goto isolate_success;
> > > > }
> > > > }
> > > > +
> > > > + if (unlikely(PageMovable(page)) &&
> > > > + !PageIsolated(page)) {
> > > > + if (locked) {
> > > > + spin_unlock_irqrestore(&zone->lru_lock,
> > > > + flags);
> > > > + locked = false;
> > > > + }
> > > > +
> > > > + if (isolate_movable_page(page, isolate_mode))
> > > > + goto isolate_success;
> > > > + }
> > > > }
> > > >
> > > > /*
> > > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > > index b65c84267ce0..fc2842a15807 100644
> > > > --- a/mm/migrate.c
> > > > +++ b/mm/migrate.c
> > > > @@ -73,6 +73,75 @@ int migrate_prep_local(void)
> > > > return 0;
> > > > }
> > > >
> > > > +bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> > > > +{
> > > > + bool ret = false;
> > > > +
> > > > + /*
> > > > + * Avoid burning cycles with pages that are yet under __free_pages(),
> > > > + * or just got freed under us.
> > > > + *
> > > > + * In case we 'win' a race for a movable page being freed under us and
> > > > + * raise its refcount preventing __free_pages() from doing its job
> > > > + * the put_page() at the end of this block will take care of
> > > > + * release this page, thus avoiding a nasty leakage.
> > > > + */
> > > > + if (unlikely(!get_page_unless_zero(page)))
> > > > + goto out;
> > >
> > > After getting the ref counter, we need to re-check PageMovable()
> > > to ensure that we indeed handle PageMovable() type page. Without it,
> > > the page we handle can be freed and re-allocated to someone else
> > > that isn't related to PageMovable() before grabbing the page. Trying
> > > trylock_page() in this case could cause a problem.
> >
> > I don't get it. Why do you think trylock_page could cause a problem?
> > Could you elaborate it more?
>
> Okay. Consider following sequence.
>
> CPU-A                                   CPU-B
> check PageMovable() in compaction
> ...                                     free the page
> ...                                     allocate the page for other usecase
> ...                                     (maybe for file cache or slub)
> get unless 0 in isolate_movable_page()
> trylock (success)
>                                         (try) lock! failed!
>
> In this case, someone can see failure even if they are owner of the
> page. IIUC, this also can happen in zsmalloc. See init_zspage() in
> 15th patch. It assume that allocated page can be locked
> unconditionally.
Oops. As you pointed out, other places already assume that the page
owner's trylock will always succeed. :(
Thanks for pointing it out.
I will add a PageMovable check right before trylock.
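A rough userspace sketch of the ordering I have in mind, not the actual
patch. struct model_page and all model_* helpers are hypothetical names
made up for this example, and C11 atomics stand in for the real page
refcount and PG_locked. The point is only the order: take the
reference, recheck movable, and only then try the lock. As I understand
it, the reference we hold keeps the page from being freed and reused
after the recheck.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; not kernel code. */
struct model_page {
	atomic_int refcount;	/* models the page refcount */
	atomic_bool locked;	/* models PG_locked */
	atomic_bool movable;	/* models PageMovable() */
	bool isolated;		/* models PG_isolated, changed only under lock */
};

static void model_page_init(struct model_page *p, int refs, bool movable)
{
	atomic_init(&p->refcount, refs);
	atomic_init(&p->locked, false);
	atomic_init(&p->movable, movable);
	p->isolated = false;
}

/* Models get_page_unless_zero(): take a reference only if one exists. */
static bool model_get_unless_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static bool model_trylock(struct model_page *p)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&p->locked, &expected, true);
}

static void model_unlock(struct model_page *p)
{
	atomic_store(&p->locked, false);
}

static void model_put(struct model_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

static bool model_isolate(struct model_page *p)
{
	if (!model_get_unless_zero(p))
		return false;
	/* Recheck before locking: the page may belong to someone else now. */
	if (!atomic_load(&p->movable))
		goto out_put;
	if (!model_trylock(p))
		goto out_put;
	if (!atomic_load(&p->movable) || p->isolated)
		goto out_unlock;
	p->isolated = true;	/* stands in for a_ops->isolate_page() */
	model_unlock(p);
	return true;		/* the extra reference is kept until putback */
out_unlock:
	model_unlock(p);
out_put:
	model_put(p);
	return false;
}

void model_isolate_demo(void)
{
	struct model_page reused, ours;

	/* Reused by a non-movable owner: back off, lock and refcount untouched. */
	model_page_init(&reused, 1, false);
	assert(!model_isolate(&reused));
	assert(atomic_load(&reused.refcount) == 1);
	assert(!atomic_load(&reused.locked));

	/* Still movable: isolation succeeds once, a second attempt is refused. */
	model_page_init(&ours, 1, true);
	assert(model_isolate(&ours));
	assert(ours.isolated && atomic_load(&ours.refcount) == 2);
	assert(!model_isolate(&ours));
	assert(atomic_load(&ours.refcount) == 2);
}
```

With this order, a page that was reallocated to an unrelated owner is
never locked by the isolation path in the model, so that owner's own
trylock is not disturbed.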
>
> > >
> > > > + /*
> > > > + * As movable pages are not isolated from LRU lists, concurrent
> > > > + * compaction threads can race against page migration functions
> > > > + * as well as race against the releasing a page.
> > > > + *
> > > > + * In order to avoid having an already isolated movable page
> > > > + * being (wrongly) re-isolated while it is under migration,
> > > > + * or to avoid attempting to isolate pages being released,
> > > > + * lets be sure we have the page lock
> > > > + * before proceeding with the movable page isolation steps.
> > > > + */
> > > > + if (unlikely(!trylock_page(page)))
> > > > + goto out_putpage;
> > > > +
> > > > + if (!PageMovable(page) || PageIsolated(page))
> > > > + goto out_no_isolated;
> > > > +
> > > > + ret = page->mapping->a_ops->isolate_page(page, mode);
> > > > + if (!ret)
> > > > + goto out_no_isolated;
> > > > +
> > > > + WARN_ON_ONCE(!PageIsolated(page));
> > > > + unlock_page(page);
> > > > + return ret;
> > > > +
> > > > +out_no_isolated:
> > > > + unlock_page(page);
> > > > +out_putpage:
> > > > + put_page(page);
> > > > +out:
> > > > + return ret;
> > > > +}
> > > > +
> > > > +void putback_movable_page(struct page *page)
> > > > +{
> > > > + struct address_space *mapping;
> > > > +
> > > > + /*
> > > > + * 'lock_page()' stabilizes the page and prevents races against
> > > > + * concurrent isolation threads attempting to re-isolate it.
> > > > + */
> > > > + lock_page(page);
> > > > + mapping = page_mapping(page);
> > > > + if (mapping) {
> > > > + mapping->a_ops->putback_page(page);
> > > > + WARN_ON_ONCE(PageIsolated(page));
> > > > + }
> > > > + unlock_page(page);
> > > > + /* drop the extra ref count taken for movable page isolation */
> > > > + put_page(page);
> > > > +}
> > >
> > > This is complicated part for me. mapping can disappear? In this case,
> > > who clear PageIsolated()?
> >
> > The page's owner, for example, zsmalloc or virtio-balloon.
> > They can free the page whenever they want once they hold PG_lock.
> > They should clear mapping and PG_movable with PG_lock held.
> >
> > >
> > > > +
> > > > +
> > > > /*
> > > > * Put previously isolated pages back onto the appropriate lists
> > > > * from where they were once taken off for compaction/migration.
> > > > @@ -96,6 +165,8 @@ void putback_movable_pages(struct list_head *l)
> > > > page_is_file_cache(page));
> > > > if (unlikely(isolated_balloon_page(page)))
> > > > balloon_page_putback(page);
> > > > + else if (unlikely(PageIsolated(page)))
> > > > + putback_movable_page(page);
> > > > else
> > > > putback_lru_page(page);
> > > > }
> > >
> > > I think that this will not work. You use PG_owner_priv_1 as
> > > PG_isolated and it is possible that some lru pages have this flag.
> > > I guess you need to add PageMovable() check but it seems that mapping
> > > and this flag can be cleared by others.
> >
> > Hmm, a PageMovable check may work because if the PageMovable check
> > fails, it means the page's owner freed the page, so we can simply put
> > the page to release the refcount here.
> > I will check it.
>
> Hmmm... But, in failure case, is it safe to call putback_lru_page() for them?
At the moment, it's safe. The page is just added to the lru pagevec to
be released although it was not an LRU page. Yes, it's error-prone, so
someday it will hurt someone. I will think more about it.
> And, PageIsolated() would be left. Is it okay? It's not symmetric that
The page owner should clear PageIsolated once he has freed the page.
> isolated page can be freed by decreasing ref count without calling
> putback function. This should be clarified and documented.
Yes, I will add it to the documentation.
Thanks!
Hello Andrew,
On Mon, Mar 21, 2016 at 03:30:49PM +0900, Minchan Kim wrote:
> Recently, I got many reports about perfermance degradation
> in embedded system(Android mobile phone, webOS TV and so on)
> and failed to fork easily.
>
> The problem was fragmentation caused by zram and GPU driver
> pages. Their pages cannot be migrated so compaction cannot
> work well, either so reclaimer ends up shrinking all of working
> set pages. It made system very slow and even to fail to fork
> easily.
>
> Other pain point is that they cannot work with CMA.
> Most of CMA memory space could be idle(ie, it could be used
> for movable pages unless driver is using) but if driver(i.e.,
> zram) cannot migrate his page, that memory space could be
> wasted. In our product which has big CMA memory, it reclaims
> zones too exccessively although there are lots of free space
> in CMA so system was very slow easily.
>
> To solve these problem, this patch try to add facility to
> migrate non-lru pages via introducing new friend functions
> of migratepage in address_space_operation and new page flags.
>
> (isolate_page, putback_page)
> (PG_movable, PG_isolated)
>
> For details, please read description in
> "mm/compaction: support non-lru movable page migration".
>
> Originally, Gioh Kim tried to support this feature but he moved
> so I took over the work. But I took many code from his work and
> changed a little bit.
> Thanks, Gioh!
>
> And I should mention Konstantin Khlebnikov. He really heped Gioh
> at that time so he should deserve to have many credit, too.
> Thanks, Konstantin!
>
> This patchset consists of five parts
>
> 1. clean up migration
> mm: use put_page to free page instead of putback_lru_page
>
> 2. zsmalloc clean-up for preparing page migration
> zsmalloc: use first_page rather than page
> zsmalloc: clean up many BUG_ON
> zsmalloc: reordering function parameter
> zsmalloc: remove unused pool param in obj_free
> zsmalloc: keep max_object in size_class
> zsmalloc: squeeze inuse into page->mapping
> zsmalloc: squeeze freelist into page->mapping
> zsmalloc: move struct zs_meta from mapping to freelist
> zsmalloc: factor page chain functionality out
> zsmalloc: separate free_zspage from putback_zspage
> zsmalloc: zs_compact refactoring
In this series, patches [2-5] are cleanups that are independent of the
goal of the patchset, so they could be merged separately.
I want to reduce the patchset size in the next post.
If nobody objects, could you merge the cleanup patches?
zsmalloc: use first_page rather than page
zsmalloc: clean up many BUG_ON
zsmalloc: reordering function parameter
zsmalloc: remove unused pool param in obj_free
Thanks.