From: Gioh Kim <[email protected]>
Hello,
This series tries to enable migration of non-LRU pages, such as drivers' pages.
My ARM-based platform suffered severe fragmentation problems after long-term
(several days) tests. Sometimes even order-3 page allocations failed. The platform
has 512MB ~ 1024MB of memory; 30% ~ 40% of it is consumed by graphics processing
and 20% ~ 30% is reserved for zram.
I found that many pages of the GPU driver and zram are non-movable. So I
reported this to Minchan Kim, the maintainer of zram, and he implemented
internal compaction logic for zram. And I implemented internal compaction for
the GPU driver.
They reduced some fragmentation, but they are not effective enough.
They are triggered through their own /sys interfaces, so they do not cooperate
with kernel compaction. When there is too much fragmentation and the kernel
starts compaction, zram and the GPU driver cannot work together with it.
So I thought an interface was needed to combine driver and kernel compaction.
This patch set adds generic isolate/migrate/putback callbacks to the page's
address space and a new interface to create an anonymous inode that manages
the address_space_operations. zram, the GPU driver, and any other module can
create an anon inode and register its own migration method in it. Kernel
compaction can then call the registered migration method when it compacts memory.
My GPU driver is not an in-kernel driver, so I applied the interface to the
balloon driver instead. The balloon driver is already wired into kernel
compaction as a corner case. This patch set has the balloon driver's migration
called through the generic interface.
This patch set consists of 4 patches:
1. patch 1/4: get inode from anon_inodes
This patch adds a new interface to create an inode from anon_inodes.
2. patch 2/4: framework to isolate/migrate/putback page
Add isolatepage and putbackpage to address_space_operations,
plus wrapper functions to call them.
3. patch 3/4: apply the framework to the balloon driver
The balloon driver is converted to the framework. It gets an inode
from anon_inodes and registers its operations in the inode.
Kernel compaction calls the generic interfaces, not the balloon
driver's interfaces.
Any other driver can register operations via an inode like this
to migrate its pages.
4. patch 4/4: remove direct calling of migration of driver pages
Non-LRU pages are also migrated along with LRU pages by move_to_new_page().
This patch set was tested as follows:
- boot Ubuntu 14.04 with 1GB of memory on qemu
- do a kernel build
- after several seconds, check with the free command that more than 512MB is used
- run the command "balloon 512" in the qemu monitor
- check that hundreds of MB of pages are migrated
My thanks to Konstantin Khlebnikov for his reviews of the RFC patch set.
Most of the changes were based on his feedback.
This patch set is based on v4.1.
Gioh Kim (4):
fs/anon_inodes: new interface to create new inode
mm/compaction: enable mobile-page migration
mm/balloon: apply mobile page migratable into balloon
mm: remove direct calling of migration
drivers/virtio/virtio_balloon.c | 3 ++
fs/anon_inodes.c | 6 +++
fs/proc/page.c | 3 ++
include/linux/anon_inodes.h | 1 +
include/linux/balloon_compaction.h | 15 +++++--
include/linux/compaction.h | 80 ++++++++++++++++++++++++++++++++++
include/linux/fs.h | 2 +
include/linux/page-flags.h | 19 ++++++++
include/uapi/linux/kernel-page-flags.h | 1 +
mm/balloon_compaction.c | 72 ++++++++++--------------------
mm/compaction.c | 8 ++--
mm/migrate.c | 24 +++-------
12 files changed, 160 insertions(+), 74 deletions(-)
--
2.1.4
From: Gioh Kim <[email protected]>
anon_inodes already has complete interfaces to create and manage
many anonymous inodes, but it has no interface to get a
new inode. With this new interface, other sub-modules can create
anonymous inodes without creating and mounting their own pseudo filesystems.
Signed-off-by: Gioh Kim <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
---
fs/anon_inodes.c | 6 ++++++
include/linux/anon_inodes.h | 1 +
2 files changed, 7 insertions(+)
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 80ef38c..1d51f96 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -162,6 +162,12 @@ err_put_unused_fd:
}
EXPORT_SYMBOL_GPL(anon_inode_getfd);
+struct inode *anon_inode_new(void)
+{
+ return alloc_anon_inode(anon_inode_mnt->mnt_sb);
+}
+EXPORT_SYMBOL_GPL(anon_inode_new);
+
static int __init anon_inode_init(void)
{
anon_inode_mnt = kern_mount(&anon_inode_fs_type);
diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h
index 8013a45..ddbd67f 100644
--- a/include/linux/anon_inodes.h
+++ b/include/linux/anon_inodes.h
@@ -15,6 +15,7 @@ struct file *anon_inode_getfile(const char *name,
void *priv, int flags);
int anon_inode_getfd(const char *name, const struct file_operations *fops,
void *priv, int flags);
+struct inode *anon_inode_new(void);
#endif /* _LINUX_ANON_INODES_H */
--
2.1.4
From: Gioh Kim <[email protected]>
Add a framework to register callback functions and check page mobility.
There are several modes of page isolation, so the isolate interface
takes the page address and an isolation mode as arguments, while the
putback interface takes only the page address.
Signed-off-by: Gioh Kim <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
---
fs/proc/page.c | 3 ++
include/linux/compaction.h | 80 ++++++++++++++++++++++++++++++++++
include/linux/fs.h | 2 +
include/linux/page-flags.h | 19 ++++++++
include/uapi/linux/kernel-page-flags.h | 1 +
5 files changed, 105 insertions(+)
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 7eee2d8..a4f5a00 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -146,6 +146,9 @@ u64 stable_page_flags(struct page *page)
if (PageBalloon(page))
u |= 1 << KPF_BALLOON;
+ if (PageMobile(page))
+ u |= 1 << KPF_MOBILE;
+
u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index aa8f61c..f693072 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -1,6 +1,9 @@
#ifndef _LINUX_COMPACTION_H
#define _LINUX_COMPACTION_H
+#include <linux/page-flags.h>
+#include <linux/pagemap.h>
+
/* Return values for compact_zone() and try_to_compact_pages() */
/* compaction didn't start as it was deferred due to past failures */
#define COMPACT_DEFERRED 0
@@ -51,6 +54,70 @@ extern void compaction_defer_reset(struct zone *zone, int order,
bool alloc_success);
extern bool compaction_restarting(struct zone *zone, int order);
+static inline bool mobile_page(struct page *page)
+{
+ return page->mapping && (PageMobile(page) || PageBalloon(page));
+}
+
+static inline bool isolate_mobilepage(struct page *page, isolate_mode_t mode)
+{
+ bool ret = false;
+
+ /*
+ * Avoid burning cycles with pages that are yet under __free_pages(),
+ * or just got freed under us.
+ *
+ * In case we 'win' a race for a mobile page being freed under us and
+ * raise its refcount preventing __free_pages() from doing its job
+ * the put_page() at the end of this block will take care of
+ * release this page, thus avoiding a nasty leakage.
+ */
+ if (unlikely(!get_page_unless_zero(page)))
+ goto out;
+
+ /*
+ * As mobile pages are not isolated from LRU lists, concurrent
+ * compaction threads can race against page migration functions
+ * as well as race against the releasing a page.
+ *
+ * In order to avoid having an already isolated mobile page
+ * being (wrongly) re-isolated while it is under migration,
+ * or to avoid attempting to isolate pages being released,
+ * lets be sure we have the page lock
+ * before proceeding with the mobile page isolation steps.
+ */
+ if (unlikely(!trylock_page(page)))
+ goto out_putpage;
+
+ if (!(mobile_page(page) && page->mapping->a_ops->isolatepage))
+ goto out_not_isolated;
+ ret = page->mapping->a_ops->isolatepage(page, mode);
+ if (!ret)
+ goto out_not_isolated;
+ unlock_page(page);
+ return ret;
+
+out_not_isolated:
+ unlock_page(page);
+out_putpage:
+ put_page(page);
+out:
+ return ret;
+}
+
+static inline void putback_mobilepage(struct page *page)
+{
+ /*
+ * 'lock_page()' stabilizes the page and prevents races against
+ * concurrent isolation threads attempting to re-isolate it.
+ */
+ lock_page(page);
+ if (page->mapping && page->mapping->a_ops->putbackpage)
+ page->mapping->a_ops->putbackpage(page);
+ unlock_page(page);
+ /* drop the extra ref count taken for mobile page isolation */
+ put_page(page);
+}
#else
static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
unsigned int order, int alloc_flags,
@@ -83,6 +150,19 @@ static inline bool compaction_deferred(struct zone *zone, int order)
return true;
}
+static inline bool mobile_page(struct page *page)
+{
+ return false;
+}
+
+static inline bool isolate_mobilepage(struct page *page, isolate_mode_t mode)
+{
+ return false;
+}
+
+static inline void putback_mobilepage(struct page *page)
+{
+}
#endif /* CONFIG_COMPACTION */
#if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a0653e5..2cc4b24 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -396,6 +396,8 @@ struct address_space_operations {
*/
int (*migratepage) (struct address_space *,
struct page *, struct page *, enum migrate_mode);
+ bool (*isolatepage) (struct page *, isolate_mode_t);
+ void (*putbackpage) (struct page *);
int (*launder_page) (struct page *);
int (*is_partially_uptodate) (struct page *, unsigned long,
unsigned long);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f34e040..abef145 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -582,6 +582,25 @@ static inline void __ClearPageBalloon(struct page *page)
atomic_set(&page->_mapcount, -1);
}
+#define PAGE_MOBILE_MAPCOUNT_VALUE (-255)
+
+static inline int PageMobile(struct page *page)
+{
+ return atomic_read(&page->_mapcount) == PAGE_MOBILE_MAPCOUNT_VALUE;
+}
+
+static inline void __SetPageMobile(struct page *page)
+{
+ VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
+ atomic_set(&page->_mapcount, PAGE_MOBILE_MAPCOUNT_VALUE);
+}
+
+static inline void __ClearPageMobile(struct page *page)
+{
+ VM_BUG_ON_PAGE(!PageMobile(page), page);
+ atomic_set(&page->_mapcount, -1);
+}
+
/*
* If network-based swap is enabled, sl*b must keep track of whether pages
* were allocated from pfmemalloc reserves.
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index a6c4962..d50d9e8 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -33,6 +33,7 @@
#define KPF_THP 22
#define KPF_BALLOON 23
#define KPF_ZERO_PAGE 24
+#define KPF_MOBILE 25
#endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
--
2.1.4
From: Gioh Kim <[email protected]>
Apply mobile page migration to the balloon driver.
The balloon driver gets an anonymous inode that manages the
address_space_operations for page migration.
Compaction calls the mobile page migration interfaces
instead of calling balloon migration directly.
Signed-off-by: Gioh Kim <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
---
drivers/virtio/virtio_balloon.c | 3 ++
include/linux/balloon_compaction.h | 15 ++++++--
mm/balloon_compaction.c | 72 ++++++++++++--------------------------
mm/compaction.c | 8 ++---
mm/migrate.c | 21 ++++++-----
5 files changed, 54 insertions(+), 65 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 82e80e0..ef5b9b5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -30,6 +30,7 @@
#include <linux/balloon_compaction.h>
#include <linux/oom.h>
#include <linux/wait.h>
+#include <linux/anon_inodes.h>
/*
* Balloon device works in 4K page units. So each page is pointed to by
@@ -505,6 +506,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
balloon_devinfo_init(&vb->vb_dev_info);
#ifdef CONFIG_BALLOON_COMPACTION
vb->vb_dev_info.migratepage = virtballoon_migratepage;
+ vb->vb_dev_info.inode = anon_inode_new();
+ vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
#endif
err = init_vqs(vb);
diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index 9b0a15d..a9e0bde 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -48,6 +48,7 @@
#include <linux/migrate.h>
#include <linux/gfp.h>
#include <linux/err.h>
+#include <linux/fs.h>
/*
* Balloon device information descriptor.
@@ -62,6 +63,7 @@ struct balloon_dev_info {
struct list_head pages; /* Pages enqueued & handled to Host */
int (*migratepage)(struct balloon_dev_info *, struct page *newpage,
struct page *page, enum migrate_mode mode);
+ struct inode *inode;
};
extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
@@ -73,12 +75,16 @@ static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
spin_lock_init(&balloon->pages_lock);
INIT_LIST_HEAD(&balloon->pages);
balloon->migratepage = NULL;
+ balloon->inode = NULL;
}
#ifdef CONFIG_BALLOON_COMPACTION
-extern bool balloon_page_isolate(struct page *page);
+extern const struct address_space_operations balloon_aops;
+extern bool balloon_page_isolate(struct page *page,
+ isolate_mode_t mode);
extern void balloon_page_putback(struct page *page);
-extern int balloon_page_migrate(struct page *newpage,
+extern int balloon_page_migrate(struct address_space *mapping,
+ struct page *newpage,
struct page *page, enum migrate_mode mode);
/*
@@ -124,6 +130,7 @@ static inline void balloon_page_insert(struct balloon_dev_info *balloon,
struct page *page)
{
__SetPageBalloon(page);
+ page->mapping = balloon->inode->i_mapping;
SetPagePrivate(page);
set_page_private(page, (unsigned long)balloon);
list_add(&page->lru, &balloon->pages);
@@ -140,6 +147,7 @@ static inline void balloon_page_insert(struct balloon_dev_info *balloon,
static inline void balloon_page_delete(struct page *page)
{
__ClearPageBalloon(page);
+ page->mapping = NULL;
set_page_private(page, 0);
if (PagePrivate(page)) {
ClearPagePrivate(page);
@@ -191,7 +199,8 @@ static inline bool isolated_balloon_page(struct page *page)
return false;
}
-static inline bool balloon_page_isolate(struct page *page)
+static inline bool balloon_page_isolate(struct page *page,
+ isolate_mode_t mode)
{
return false;
}
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index fcad832..8fbcf9c 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -131,43 +131,16 @@ static inline void __putback_balloon_page(struct page *page)
}
/* __isolate_lru_page() counterpart for a ballooned page */
-bool balloon_page_isolate(struct page *page)
+bool balloon_page_isolate(struct page *page, isolate_mode_t mode)
{
/*
- * Avoid burning cycles with pages that are yet under __free_pages(),
- * or just got freed under us.
- *
- * In case we 'win' a race for a balloon page being freed under us and
- * raise its refcount preventing __free_pages() from doing its job
- * the put_page() at the end of this block will take care of
- * release this page, thus avoiding a nasty leakage.
+ * A ballooned page, by default, has PagePrivate set.
+ * Prevent concurrent compaction threads from isolating
+ * an already isolated balloon page by clearing it.
*/
- if (likely(get_page_unless_zero(page))) {
- /*
- * As balloon pages are not isolated from LRU lists, concurrent
- * compaction threads can race against page migration functions
- * as well as race against the balloon driver releasing a page.
- *
- * In order to avoid having an already isolated balloon page
- * being (wrongly) re-isolated while it is under migration,
- * or to avoid attempting to isolate pages being released by
- * the balloon driver, lets be sure we have the page lock
- * before proceeding with the balloon page isolation steps.
- */
- if (likely(trylock_page(page))) {
- /*
- * A ballooned page, by default, has PagePrivate set.
- * Prevent concurrent compaction threads from isolating
- * an already isolated balloon page by clearing it.
- */
- if (balloon_page_movable(page)) {
- __isolate_balloon_page(page);
- unlock_page(page);
- return true;
- }
- unlock_page(page);
- }
- put_page(page);
+ if (balloon_page_movable(page)) {
+ __isolate_balloon_page(page);
+ return true;
}
return false;
}
@@ -175,37 +148,31 @@ bool balloon_page_isolate(struct page *page)
/* putback_lru_page() counterpart for a ballooned page */
void balloon_page_putback(struct page *page)
{
- /*
- * 'lock_page()' stabilizes the page and prevents races against
- * concurrent isolation threads attempting to re-isolate it.
- */
- lock_page(page);
+ if (!isolated_balloon_page(page))
+ return;
if (__is_movable_balloon_page(page)) {
__putback_balloon_page(page);
- /* drop the extra ref count taken for page isolation */
- put_page(page);
} else {
WARN_ON(1);
dump_page(page, "not movable balloon page");
}
- unlock_page(page);
}
/* move_to_new_page() counterpart for a ballooned page */
-int balloon_page_migrate(struct page *newpage,
+int balloon_page_migrate(struct address_space *mapping,
+ struct page *newpage,
struct page *page, enum migrate_mode mode)
{
struct balloon_dev_info *balloon = balloon_page_device(page);
int rc = -EAGAIN;
+ if (!isolated_balloon_page(page))
+ return rc;
+
/*
- * Block others from accessing the 'newpage' when we get around to
- * establishing additional references. We should be the only one
- * holding a reference to the 'newpage' at this point.
+ * newpage and page should be already locked.
*/
- BUG_ON(!trylock_page(newpage));
-
if (WARN_ON(!__is_movable_balloon_page(page))) {
dump_page(page, "not movable balloon page");
unlock_page(newpage);
@@ -215,7 +182,14 @@ int balloon_page_migrate(struct page *newpage,
if (balloon && balloon->migratepage)
rc = balloon->migratepage(balloon, newpage, page, mode);
- unlock_page(newpage);
return rc;
}
+
+/* define the balloon_mapping->a_ops callback to allow balloon page migration */
+const struct address_space_operations balloon_aops = {
+ .migratepage = balloon_page_migrate,
+ .isolatepage = balloon_page_isolate,
+ .putbackpage = balloon_page_putback,
+};
+EXPORT_SYMBOL_GPL(balloon_aops);
#endif /* CONFIG_BALLOON_COMPACTION */
diff --git a/mm/compaction.c b/mm/compaction.c
index 018f08d..60e4cbb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,7 +14,7 @@
#include <linux/backing-dev.h>
#include <linux/sysctl.h>
#include <linux/sysfs.h>
-#include <linux/balloon_compaction.h>
+#include <linux/compaction.h>
#include <linux/page-isolation.h>
#include <linux/kasan.h>
#include "internal.h"
@@ -714,12 +714,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
/*
* Check may be lockless but that's ok as we recheck later.
- * It's possible to migrate LRU pages and balloon pages
+ * It's possible to migrate LRU pages and mobile pages
* Skip any other type of page
*/
if (!PageLRU(page)) {
- if (unlikely(balloon_page_movable(page))) {
- if (balloon_page_isolate(page)) {
+ if (unlikely(mobile_page(page))) {
+ if (isolate_mobilepage(page, isolate_mode)) {
/* Successfully isolated */
goto isolate_success;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index ee401e4..53f0081d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -35,7 +35,7 @@
#include <linux/hugetlb.h>
#include <linux/hugetlb_cgroup.h>
#include <linux/gfp.h>
-#include <linux/balloon_compaction.h>
+#include <linux/compaction.h>
#include <linux/mmu_notifier.h>
#include <asm/tlbflush.h>
@@ -76,7 +76,7 @@ int migrate_prep_local(void)
* from where they were once taken off for compaction/migration.
*
* This function shall be used whenever the isolated pageset has been
- * built from lru, balloon, hugetlbfs page. See isolate_migratepages_range()
+ * built from lru, mobile, hugetlbfs page. See isolate_migratepages_range()
* and isolate_huge_page().
*/
void putback_movable_pages(struct list_head *l)
@@ -92,8 +92,8 @@ void putback_movable_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- if (unlikely(isolated_balloon_page(page)))
- balloon_page_putback(page);
+ if (unlikely(mobile_page(page)))
+ putback_mobilepage(page);
else
putback_lru_page(page);
}
@@ -844,15 +844,18 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
}
}
- if (unlikely(isolated_balloon_page(page))) {
+ if (unlikely(mobile_page(page))) {
/*
- * A ballooned page does not need any special attention from
+ * A mobile page does not need any special attention from
* physical to virtual reverse mapping procedures.
* Skip any attempt to unmap PTEs or to remap swap cache,
* in order to avoid burning cycles at rmap level, and perform
* the page migration right away (proteced by page lock).
*/
- rc = balloon_page_migrate(newpage, page, mode);
+ lock_page(newpage);
+ rc = page->mapping->a_ops->migratepage(page->mapping,
+ newpage, page, mode);
+ unlock_page(newpage);
goto out_unlock;
}
@@ -962,8 +965,8 @@ out:
if (rc != MIGRATEPAGE_SUCCESS && put_new_page) {
ClearPageSwapBacked(newpage);
put_new_page(newpage, private);
- } else if (unlikely(__is_movable_balloon_page(newpage))) {
- /* drop our reference, page already in the balloon */
+ } else if (unlikely(mobile_page(newpage))) {
+ /* drop our reference */
put_page(newpage);
} else
putback_lru_page(newpage);
--
2.1.4
From: Gioh Kim <[email protected]>
Migration is now completely generalized, so mobile pages are
migrated along with LRU pages in move_to_new_page().
Signed-off-by: Gioh Kim <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
---
mm/migrate.c | 15 ---------------
1 file changed, 15 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 53f0081d..e6644ac 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -844,21 +844,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
}
}
- if (unlikely(mobile_page(page))) {
- /*
- * A mobile page does not need any special attention from
- * physical to virtual reverse mapping procedures.
- * Skip any attempt to unmap PTEs or to remap swap cache,
- * in order to avoid burning cycles at rmap level, and perform
- * the page migration right away (proteced by page lock).
- */
- lock_page(newpage);
- rc = page->mapping->a_ops->migratepage(page->mapping,
- newpage, page, mode);
- unlock_page(newpage);
- goto out_unlock;
- }
-
/*
* Corner case handling:
* 1. When a new swap-cache page is read into, it is added to the LRU
--
2.1.4
On Mon, Jul 13, 2015 at 11:35 AM, Gioh Kim <[email protected]> wrote:
> [...]
The whole patchset looks good.
Reviewed-by: Konstantin Khlebnikov <[email protected]>
>
> This patch set is tested:
> - turn on Ubuntu 14.04 with 1G memory on qemu.
> - do kernel building
> - after several seconds check more than 512MB is used with free command
> - command "balloon 512" in qemu monitor
> - check hundreds MB of pages are migrated
Another simple test is running several instances of
tools/testing/selftests/vm/transhuge-stress.c
in parallel with balloon inflating/deflating.
(transparent huge pages must be enabled, of course)
That caught a lot of races in the ballooning code.
On 2015-07-13 6:24 PM, Konstantin Khlebnikov wrote:
> On Mon, Jul 13, 2015 at 11:35 AM, Gioh Kim <[email protected]> wrote:
>> [...]
>
> The whole patchset looks good.
>
> Reviewed-by: Konstantin Khlebnikov <[email protected]>
>
>> [...]
>
> Another simple test is running several instances of
> tools/testing/selftests/vm/transhuge-stress.c
> in parallel with balloon inflating/deflating.
> (transparent huge pages must be enabled, of course)
> That caught a lot of races in the ballooning code.
>
Great!
I'll run it and let you know the results this week.
On Mon, Jul 13, 2015 at 05:35:15PM +0900, Gioh Kim wrote:
> [...]
Acked-by: Rafael Aquini <[email protected]>
On 07/13/2015 10:35 AM, Gioh Kim wrote:
> From: Gioh Kim <[email protected]>
>
> Add framework to register callback functions and check page mobility.
> There are some modes for page isolation so that isolate interface
> has arguments of page address and isolation mode while putback
> interface has only page address as argument.
Note that unlike what the subject suggests, this doesn't really enable
mobile-page migration inside compaction, since that only happens with
patch 3. This might theoretically affect some cherry-pick backports that
don't care about balloon pages; I can imagine that easily happening in
the world of mobile devices.
It would thus be somewhat cleaner if this patch were complete in that sense.
> Signed-off-by: Gioh Kim <[email protected]>
> Acked-by: Rafael Aquini <[email protected]>
> ---
> fs/proc/page.c | 3 ++
> include/linux/compaction.h | 80 ++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 2 +
> include/linux/page-flags.h | 19 ++++++++
> include/uapi/linux/kernel-page-flags.h | 1 +
> 5 files changed, 105 insertions(+)
>
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 7eee2d8..a4f5a00 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -146,6 +146,9 @@ u64 stable_page_flags(struct page *page)
> if (PageBalloon(page))
> u |= 1 << KPF_BALLOON;
>
> + if (PageMobile(page))
> + u |= 1 << KPF_MOBILE;
PageMovable() would probably be as good a name and correspond to
MIGRATE_MOVABLE somewhat, unlike a completely new term. Whatever driver
starts using this should probably change its allocation flags to allocate
MIGRATE_MOVABLE, so that it works fine with what fragmentation avoidance
expects. I guess I should have said that earlier, but can you still
reconsider?
> +
> u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
>
> u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index aa8f61c..f693072 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -1,6 +1,9 @@
> #ifndef _LINUX_COMPACTION_H
> #define _LINUX_COMPACTION_H
>
> +#include <linux/page-flags.h>
> +#include <linux/pagemap.h>
> +
> /* Return values for compact_zone() and try_to_compact_pages() */
> /* compaction didn't start as it was deferred due to past failures */
> #define COMPACT_DEFERRED 0
> @@ -51,6 +54,70 @@ extern void compaction_defer_reset(struct zone *zone, int order,
> bool alloc_success);
> extern bool compaction_restarting(struct zone *zone, int order);
>
> +static inline bool mobile_page(struct page *page)
> +{
> + return page->mapping && (PageMobile(page) || PageBalloon(page));
> +}
I would put this definition in linux/page-flags.h and rename it to
page_mobile() (or better, page_movable()), which is the more common ordering.
> +
> +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t mode)
Does this have to be in compaction.h? The only user is compaction.c so
probably move it there, and if there ever is another module using this
in the future, we can move it to a more appropriate place and declare it
in e.g. mm/internal.h.
> +{
> + bool ret = false;
> +
> + /*
> + * Avoid burning cycles with pages that are yet under __free_pages(),
> + * or just got freed under us.
> + *
> + * In case we 'win' a race for a mobile page being freed under us and
> + * raise its refcount preventing __free_pages() from doing its job
> + * the put_page() at the end of this block will take care of
> + * release this page, thus avoiding a nasty leakage.
> + */
> + if (unlikely(!get_page_unless_zero(page)))
> + goto out;
> +
> + /*
> + * As mobile pages are not isolated from LRU lists, concurrent
> + * compaction threads can race against page migration functions
> + * as well as race against the releasing a page.
> + *
> + * In order to avoid having an already isolated mobile page
> + * being (wrongly) re-isolated while it is under migration,
> + * or to avoid attempting to isolate pages being released,
> + * lets be sure we have the page lock
> + * before proceeding with the mobile page isolation steps.
> + */
> + if (unlikely(!trylock_page(page)))
> + goto out_putpage;
> +
> + if (!(mobile_page(page) && page->mapping->a_ops->isolatepage))
> + goto out_not_isolated;
> + ret = page->mapping->a_ops->isolatepage(page, mode);
> + if (!ret)
> + goto out_not_isolated;
> + unlock_page(page);
> + return ret;
> +
> +out_not_isolated:
> + unlock_page(page);
> +out_putpage:
> + put_page(page);
> +out:
> + return ret;
> +}
> +
> +static inline void putback_mobilepage(struct page *page)
Likewise, this could go to migrate.c. Or maybe together with
isolate_mobilepage() if you don't want to split them.
> +{
> + /*
> + * 'lock_page()' stabilizes the page and prevents races against
> + * concurrent isolation threads attempting to re-isolate it.
> + */
> + lock_page(page);
> + if (page->mapping && page->mapping->a_ops->putbackpage)
> + page->mapping->a_ops->putbackpage(page);
> + unlock_page(page);
> + /* drop the extra ref count taken for mobile page isolation */
> + put_page(page);
> +}
> #else
> static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
> unsigned int order, int alloc_flags,
> @@ -83,6 +150,19 @@ static inline bool compaction_deferred(struct zone *zone, int order)
> return true;
> }
>
> +static inline bool mobile_page(struct page *page)
> +{
> + return false;
> +}
> +
> +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t mode)
> +{
> + return false;
> +}
> +
> +static inline void putback_mobilepage(struct page *page)
> +{
> +}
> #endif /* CONFIG_COMPACTION */
>
> #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a0653e5..2cc4b24 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -396,6 +396,8 @@ struct address_space_operations {
> */
> int (*migratepage) (struct address_space *,
> struct page *, struct page *, enum migrate_mode);
> + bool (*isolatepage) (struct page *, isolate_mode_t);
> + void (*putbackpage) (struct page *);
> int (*launder_page) (struct page *);
> int (*is_partially_uptodate) (struct page *, unsigned long,
> unsigned long);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f34e040..abef145 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -582,6 +582,25 @@ static inline void __ClearPageBalloon(struct page *page)
> atomic_set(&page->_mapcount, -1);
> }
>
> +#define PAGE_MOBILE_MAPCOUNT_VALUE (-255)
> +
> +static inline int PageMobile(struct page *page)
> +{
> + return atomic_read(&page->_mapcount) == PAGE_MOBILE_MAPCOUNT_VALUE;
> +}
> +
> +static inline void __SetPageMobile(struct page *page)
> +{
> + VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> + atomic_set(&page->_mapcount, PAGE_MOBILE_MAPCOUNT_VALUE);
> +}
> +
> +static inline void __ClearPageMobile(struct page *page)
> +{
> + VM_BUG_ON_PAGE(!PageMobile(page), page);
> + atomic_set(&page->_mapcount, -1);
> +}
> +
> /*
> * If network-based swap is enabled, sl*b must keep track of whether pages
> * were allocated from pfmemalloc reserves.
> diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> index a6c4962..d50d9e8 100644
> --- a/include/uapi/linux/kernel-page-flags.h
> +++ b/include/uapi/linux/kernel-page-flags.h
> @@ -33,6 +33,7 @@
> #define KPF_THP 22
> #define KPF_BALLOON 23
> #define KPF_ZERO_PAGE 24
> +#define KPF_MOBILE 25
>
>
> #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
>
On 07/13/2015 10:35 AM, Gioh Kim wrote:
> From: Gioh Kim <[email protected]>
>
> Migration is completely generalized so that migrating mobile page
> is processed with lru-pages in move_to_new_page.
>
> Signed-off-by: Gioh Kim <[email protected]>
> Acked-by: Rafael Aquini <[email protected]>
Why not just fold this into patch 3? You already modify this hunk there,
and prior to patch 3 the hunk was balloon-pages specific. You made it
look generic only to remove it, which is unneeded code churn, and I don't
think it adds anything wrt e.g. bisectability.
> ---
> mm/migrate.c | 15 ---------------
> 1 file changed, 15 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 53f0081d..e6644ac 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -844,21 +844,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> }
> }
>
> - if (unlikely(mobile_page(page))) {
> - /*
> - * A mobile page does not need any special attention from
> - * physical to virtual reverse mapping procedures.
> - * Skip any attempt to unmap PTEs or to remap swap cache,
> - * in order to avoid burning cycles at rmap level, and perform
> - * the page migration right away (proteced by page lock).
> - */
> - lock_page(newpage);
> - rc = page->mapping->a_ops->migratepage(page->mapping,
> - newpage, page, mode);
> - unlock_page(newpage);
> - goto out_unlock;
> - }
> -
> /*
> * Corner case handling:
> * 1. When a new swap-cache page is read into, it is added to the LRU
>
On Mon, Jul 27, 2015 at 4:55 PM, Vlastimil Babka <[email protected]> wrote:
> On 07/13/2015 10:35 AM, Gioh Kim wrote:
>>
>> From: Gioh Kim <[email protected]>
>>
>> Add framework to register callback functions and check page mobility.
>> There are some modes for page isolation so that isolate interface
>> has arguments of page address and isolation mode while putback
>> interface has only page address as argument.
>
>
> Note that unlike what subject suggest, this doesn't really enable
> mobile-page migration inside compaction, since that only happens with patch
> 3. This might theoretically affect some cherry-pick backports that don't
> care about balloon pages. I can imagine that can easily happen in the world
> of mobile devices?
> It would thus be somewhat cleaner if this patch was complete in that sense.
>
>> Signed-off-by: Gioh Kim <[email protected]>
>> Acked-by: Rafael Aquini <[email protected]>
>> ---
>> fs/proc/page.c | 3 ++
>> include/linux/compaction.h | 80
>> ++++++++++++++++++++++++++++++++++
>> include/linux/fs.h | 2 +
>> include/linux/page-flags.h | 19 ++++++++
>> include/uapi/linux/kernel-page-flags.h | 1 +
>> 5 files changed, 105 insertions(+)
>>
>> diff --git a/fs/proc/page.c b/fs/proc/page.c
>> index 7eee2d8..a4f5a00 100644
>> --- a/fs/proc/page.c
>> +++ b/fs/proc/page.c
>> @@ -146,6 +146,9 @@ u64 stable_page_flags(struct page *page)
>> if (PageBalloon(page))
>> u |= 1 << KPF_BALLOON;
>>
>> + if (PageMobile(page))
>> + u |= 1 << KPF_MOBILE;
>
>
> PageMovable() would probably be as good a name and correspond to
> MIGRATE_MOVABLE somewhat, unlike a completely new term. Whatever driver
> starts to using this should probably change allocation flags to allocate
> MIGRATE_MOVABLE, so that it works fine with what fragmentation avoidance
> expects. Guess I should have said that earlier, but can you still
> reconsider?
Well, I suggested naming it "mobile" because there are already a lot of things
called "movable". Mobile pages are a special subset of movable pages: they
are non-lru pages and define their own rules of moving via address
space operations.
Also there's a little pun: I guess the main user will be zram, which is used
mostly in embedded/mobile devices.
>
>> +
>> u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
>>
>> u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
>> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
>> index aa8f61c..f693072 100644
>> --- a/include/linux/compaction.h
>> +++ b/include/linux/compaction.h
>> @@ -1,6 +1,9 @@
>> #ifndef _LINUX_COMPACTION_H
>> #define _LINUX_COMPACTION_H
>>
>> +#include <linux/page-flags.h>
>> +#include <linux/pagemap.h>
>> +
>> /* Return values for compact_zone() and try_to_compact_pages() */
>> /* compaction didn't start as it was deferred due to past failures */
>> #define COMPACT_DEFERRED 0
>> @@ -51,6 +54,70 @@ extern void compaction_defer_reset(struct zone *zone,
>> int order,
>> bool alloc_success);
>> extern bool compaction_restarting(struct zone *zone, int order);
>>
>> +static inline bool mobile_page(struct page *page)
>> +{
>> + return page->mapping && (PageMobile(page) || PageBalloon(page));
>> +}
>
>
> I would put this definition to linux/page-flags.h and rename it to
> page_mobile (or better page_movable()), which is more common ordering.
>
>> +
>> +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t
>> mode)
>
>
> Does this have to be in compaction.h? The only user is compaction.c so
> probably move it there, and if there ever is another module using this in
> the future, we can move it to a more appropriate place and declare it in
> e.g. mm/internal.h.
>
>
>> +{
>> + bool ret = false;
>> +
>> + /*
>> + * Avoid burning cycles with pages that are yet under
>> __free_pages(),
>> + * or just got freed under us.
>> + *
>> + * In case we 'win' a race for a mobile page being freed under us
>> and
>> + * raise its refcount preventing __free_pages() from doing its job
>> + * the put_page() at the end of this block will take care of
>> + * release this page, thus avoiding a nasty leakage.
>> + */
>> + if (unlikely(!get_page_unless_zero(page)))
>> + goto out;
>> +
>> + /*
>> + * As mobile pages are not isolated from LRU lists, concurrent
>> + * compaction threads can race against page migration functions
>> + * as well as race against the releasing a page.
>> + *
>> + * In order to avoid having an already isolated mobile page
>> + * being (wrongly) re-isolated while it is under migration,
>> + * or to avoid attempting to isolate pages being released,
>> + * lets be sure we have the page lock
>> + * before proceeding with the mobile page isolation steps.
>> + */
>> + if (unlikely(!trylock_page(page)))
>> + goto out_putpage;
>> +
>> + if (!(mobile_page(page) && page->mapping->a_ops->isolatepage))
>> + goto out_not_isolated;
>> + ret = page->mapping->a_ops->isolatepage(page, mode);
>> + if (!ret)
>> + goto out_not_isolated;
>> + unlock_page(page);
>> + return ret;
>> +
>> +out_not_isolated:
>> + unlock_page(page);
>> +out_putpage:
>> + put_page(page);
>> +out:
>> + return ret;
>> +}
>> +
>> +static inline void putback_mobilepage(struct page *page)
>
>
> Likewise, this could go to migrate.c. Or maybe together with
> isolate_mobilepage() if you don't want to split them.
>
>
>> +{
>> + /*
>> + * 'lock_page()' stabilizes the page and prevents races against
>> + * concurrent isolation threads attempting to re-isolate it.
>> + */
>> + lock_page(page);
>> + if (page->mapping && page->mapping->a_ops->putbackpage)
>> + page->mapping->a_ops->putbackpage(page);
>> + unlock_page(page);
>> + /* drop the extra ref count taken for mobile page isolation */
>> + put_page(page);
>> +}
>> #else
>> static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
>> unsigned int order, int alloc_flags,
>> @@ -83,6 +150,19 @@ static inline bool compaction_deferred(struct zone
>> *zone, int order)
>> return true;
>> }
>>
>> +static inline bool mobile_page(struct page *page)
>> +{
>> + return false;
>> +}
>> +
>> +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t
>> mode)
>> +{
>> + return false;
>> +}
>> +
>> +static inline void putback_mobilepage(struct page *page)
>> +{
>> +}
>> #endif /* CONFIG_COMPACTION */
>>
>> #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) &&
>> defined(CONFIG_NUMA)
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index a0653e5..2cc4b24 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -396,6 +396,8 @@ struct address_space_operations {
>> */
>> int (*migratepage) (struct address_space *,
>> struct page *, struct page *, enum migrate_mode);
>> + bool (*isolatepage) (struct page *, isolate_mode_t);
>> + void (*putbackpage) (struct page *);
>> int (*launder_page) (struct page *);
>> int (*is_partially_uptodate) (struct page *, unsigned long,
>> unsigned long);
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index f34e040..abef145 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -582,6 +582,25 @@ static inline void __ClearPageBalloon(struct page
>> *page)
>> atomic_set(&page->_mapcount, -1);
>> }
>>
>> +#define PAGE_MOBILE_MAPCOUNT_VALUE (-255)
>> +
>> +static inline int PageMobile(struct page *page)
>> +{
>> + return atomic_read(&page->_mapcount) ==
>> PAGE_MOBILE_MAPCOUNT_VALUE;
>> +}
>> +
>> +static inline void __SetPageMobile(struct page *page)
>> +{
>> + VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
>> + atomic_set(&page->_mapcount, PAGE_MOBILE_MAPCOUNT_VALUE);
>> +}
>> +
>> +static inline void __ClearPageMobile(struct page *page)
>> +{
>> + VM_BUG_ON_PAGE(!PageMobile(page), page);
>> + atomic_set(&page->_mapcount, -1);
>> +}
>> +
>> /*
>> * If network-based swap is enabled, sl*b must keep track of whether
>> pages
>> * were allocated from pfmemalloc reserves.
>> diff --git a/include/uapi/linux/kernel-page-flags.h
>> b/include/uapi/linux/kernel-page-flags.h
>> index a6c4962..d50d9e8 100644
>> --- a/include/uapi/linux/kernel-page-flags.h
>> +++ b/include/uapi/linux/kernel-page-flags.h
>> @@ -33,6 +33,7 @@
>> #define KPF_THP 22
>> #define KPF_BALLOON 23
>> #define KPF_ZERO_PAGE 24
>> +#define KPF_MOBILE 25
>>
>>
>> #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
>>
>
> On Mon, Jul 27, 2015 at 4:55 PM, Vlastimil Babka <[email protected]> wrote:
>> On 07/13/2015 10:35 AM, Gioh Kim wrote:
>>>
>>> From: Gioh Kim <[email protected]>
>>>
>>> Add framework to register callback functions and check page mobility.
>>> There are some modes for page isolation so that isolate interface
>>> has arguments of page address and isolation mode while putback
>>> interface has only page address as argument.
>>
>>
>> Note that unlike what subject suggest, this doesn't really enable
>> mobile-page migration inside compaction, since that only happens with patch
>> 3. This might theoretically affect some cherry-pick backports that don't
>> care about balloon pages. I can imagine that can easily happen in the world
>> of mobile devices?
>> It would thus be somewhat cleaner if this patch was complete in that sense.
You have a point.
The current 2/4 patch lacks the calls to migrate/isolate,
so it is not complete without 3/4.
I'll add the migrate/isolate() calls in the next spin.
>>
>>> Signed-off-by: Gioh Kim <[email protected]>
>>> Acked-by: Rafael Aquini <[email protected]>
>>> ---
>>> fs/proc/page.c | 3 ++
>>> include/linux/compaction.h | 80
>>> ++++++++++++++++++++++++++++++++++
>>> include/linux/fs.h | 2 +
>>> include/linux/page-flags.h | 19 ++++++++
>>> include/uapi/linux/kernel-page-flags.h | 1 +
>>> 5 files changed, 105 insertions(+)
>>>
>>> diff --git a/fs/proc/page.c b/fs/proc/page.c
>>> index 7eee2d8..a4f5a00 100644
>>> --- a/fs/proc/page.c
>>> +++ b/fs/proc/page.c
>>> @@ -146,6 +146,9 @@ u64 stable_page_flags(struct page *page)
>>> if (PageBalloon(page))
>>> u |= 1 << KPF_BALLOON;
>>>
>>> + if (PageMobile(page))
>>> + u |= 1 << KPF_MOBILE;
>>
>>
>> PageMovable() would probably be as good a name and correspond to
>> MIGRATE_MOVABLE somewhat, unlike a completely new term. Whatever driver
>> starts to using this should probably change allocation flags to allocate
>> MIGRATE_MOVABLE, so that it works fine with what fragmentation avoidance
>> expects. Guess I should have said that earlier, but can you still
>> reconsider?
>
> Well, I've suggested to name it "mobile" because there's already a lot of things
> called "movable". Mobile pages are special subset of movable pages: they
> are non-lru pages and define their own rules of moving in address
> space operations.
>
> Also there's a little pun: I guess main user will zram which is used
> mostly in embedded/mobile devices.
I like "mobile".
"movable" is for pages allocated with GFP_MOVABLE.
I think "movable" is a little ambiguous in this situation.
>
>>
>>> +
>>> u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
>>>
>>> u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
>>> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
>>> index aa8f61c..f693072 100644
>>> --- a/include/linux/compaction.h
>>> +++ b/include/linux/compaction.h
>>> @@ -1,6 +1,9 @@
>>> #ifndef _LINUX_COMPACTION_H
>>> #define _LINUX_COMPACTION_H
>>>
>>> +#include <linux/page-flags.h>
>>> +#include <linux/pagemap.h>
>>> +
>>> /* Return values for compact_zone() and try_to_compact_pages() */
>>> /* compaction didn't start as it was deferred due to past failures */
>>> #define COMPACT_DEFERRED 0
>>> @@ -51,6 +54,70 @@ extern void compaction_defer_reset(struct zone *zone,
>>> int order,
>>> bool alloc_success);
>>> extern bool compaction_restarting(struct zone *zone, int order);
>>>
>>> +static inline bool mobile_page(struct page *page)
>>> +{
>>> + return page->mapping && (PageMobile(page) || PageBalloon(page));
>>> +}
>>
>>
>> I would put this definition to linux/page-flags.h and rename it to
>> page_mobile (or better page_movable()), which is more common ordering.
>>
>>> +
>>> +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t
>>> mode)
>>
>>
>> Does this have to be in compaction.h? The only user is compaction.c so
>> probably move it there, and if there ever is another module using this in
>> the future, we can move it to a more appropriate place and declare it in
>> e.g. mm/internal.h.
I think compaction.c is suitable.
>>
>>
>>> +{
>>> + bool ret = false;
>>> +
>>> + /*
>>> + * Avoid burning cycles with pages that are yet under
>>> __free_pages(),
>>> + * or just got freed under us.
>>> + *
>>> + * In case we 'win' a race for a mobile page being freed under us
>>> and
>>> + * raise its refcount preventing __free_pages() from doing its job
>>> + * the put_page() at the end of this block will take care of
>>> + * release this page, thus avoiding a nasty leakage.
>>> + */
>>> + if (unlikely(!get_page_unless_zero(page)))
>>> + goto out;
>>> +
>>> + /*
>>> + * As mobile pages are not isolated from LRU lists, concurrent
>>> + * compaction threads can race against page migration functions
>>> + * as well as race against the releasing a page.
>>> + *
>>> + * In order to avoid having an already isolated mobile page
>>> + * being (wrongly) re-isolated while it is under migration,
>>> + * or to avoid attempting to isolate pages being released,
>>> + * lets be sure we have the page lock
>>> + * before proceeding with the mobile page isolation steps.
>>> + */
>>> + if (unlikely(!trylock_page(page)))
>>> + goto out_putpage;
>>> +
>>> + if (!(mobile_page(page) && page->mapping->a_ops->isolatepage))
>>> + goto out_not_isolated;
>>> + ret = page->mapping->a_ops->isolatepage(page, mode);
>>> + if (!ret)
>>> + goto out_not_isolated;
>>> + unlock_page(page);
>>> + return ret;
>>> +
>>> +out_not_isolated:
>>> + unlock_page(page);
>>> +out_putpage:
>>> + put_page(page);
>>> +out:
>>> + return ret;
>>> +}
>>> +
>>> +static inline void putback_mobilepage(struct page *page)
>>
>>
>> Likewise, this could go to migrate.c. Or maybe together with
>> isolate_mobilepage() if you don't want to split them.
I got it.
>>
>>
>>> +{
>>> + /*
>>> + * 'lock_page()' stabilizes the page and prevents races against
>>> + * concurrent isolation threads attempting to re-isolate it.
>>> + */
>>> + lock_page(page);
>>> + if (page->mapping && page->mapping->a_ops->putbackpage)
>>> + page->mapping->a_ops->putbackpage(page);
>>> + unlock_page(page);
>>> + /* drop the extra ref count taken for mobile page isolation */
>>> + put_page(page);
>>> +}
>>> #else
>>> static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
>>> unsigned int order, int alloc_flags,
>>> @@ -83,6 +150,19 @@ static inline bool compaction_deferred(struct zone
>>> *zone, int order)
>>> return true;
>>> }
>>>
>>> +static inline bool mobile_page(struct page *page)
>>> +{
>>> + return false;
>>> +}
>>> +
>>> +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t
>>> mode)
>>> +{
>>> + return false;
>>> +}
>>> +
>>> +static inline void putback_mobilepage(struct page *page)
>>> +{
>>> +}
>>> #endif /* CONFIG_COMPACTION */
>>>
>>> #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) &&
>>> defined(CONFIG_NUMA)
>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>> index a0653e5..2cc4b24 100644
>>> --- a/include/linux/fs.h
>>> +++ b/include/linux/fs.h
>>> @@ -396,6 +396,8 @@ struct address_space_operations {
>>> */
>>> int (*migratepage) (struct address_space *,
>>> struct page *, struct page *, enum migrate_mode);
>>> + bool (*isolatepage) (struct page *, isolate_mode_t);
>>> + void (*putbackpage) (struct page *);
>>> int (*launder_page) (struct page *);
>>> int (*is_partially_uptodate) (struct page *, unsigned long,
>>> unsigned long);
>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>>> index f34e040..abef145 100644
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -582,6 +582,25 @@ static inline void __ClearPageBalloon(struct page
>>> *page)
>>> atomic_set(&page->_mapcount, -1);
>>> }
>>>
>>> +#define PAGE_MOBILE_MAPCOUNT_VALUE (-255)
>>> +
>>> +static inline int PageMobile(struct page *page)
>>> +{
>>> + return atomic_read(&page->_mapcount) ==
>>> PAGE_MOBILE_MAPCOUNT_VALUE;
>>> +}
>>> +
>>> +static inline void __SetPageMobile(struct page *page)
>>> +{
>>> + VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
>>> + atomic_set(&page->_mapcount, PAGE_MOBILE_MAPCOUNT_VALUE);
>>> +}
>>> +
>>> +static inline void __ClearPageMobile(struct page *page)
>>> +{
>>> + VM_BUG_ON_PAGE(!PageMobile(page), page);
>>> + atomic_set(&page->_mapcount, -1);
>>> +}
>>> +
>>> /*
>>> * If network-based swap is enabled, sl*b must keep track of whether
>>> pages
>>> * were allocated from pfmemalloc reserves.
>>> diff --git a/include/uapi/linux/kernel-page-flags.h
>>> b/include/uapi/linux/kernel-page-flags.h
>>> index a6c4962..d50d9e8 100644
>>> --- a/include/uapi/linux/kernel-page-flags.h
>>> +++ b/include/uapi/linux/kernel-page-flags.h
>>> @@ -33,6 +33,7 @@
>>> #define KPF_THP 22
>>> #define KPF_BALLOON 23
>>> #define KPF_ZERO_PAGE 24
>>> +#define KPF_MOBILE 25
>>>
>>>
>>> #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
>>>
>>
>
On 2015-07-27 10:58 PM, Vlastimil Babka wrote:
> On 07/13/2015 10:35 AM, Gioh Kim wrote:
>> From: Gioh Kim <[email protected]>
>>
>> Migration is completely generalized so that migrating mobile page
>> is processed with lru-pages in move_to_new_page.
>>
>> Signed-off-by: Gioh Kim <[email protected]>
>> Acked-by: Rafael Aquini <[email protected]>
>
> Why not just fold this to Patch 3? You already modify this hunk there, and prior to patch 3, the hunk was balloon-pages specific. You made it look generic only to remove it, which is unneeded code churn and I don't think it adds anything wrt e.g. bisectability.
Yes, you have a point.
It'll be merged into 3/4 in the next spin.
I wanted this patch to show how migration is generalized,
but on second thought it is not necessary.
>
>> ---
>> mm/migrate.c | 15 ---------------
>> 1 file changed, 15 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 53f0081d..e6644ac 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -844,21 +844,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>> }
>> }
>>
>> - if (unlikely(mobile_page(page))) {
>> - /*
>> - * A mobile page does not need any special attention from
>> - * physical to virtual reverse mapping procedures.
>> - * Skip any attempt to unmap PTEs or to remap swap cache,
>> - * in order to avoid burning cycles at rmap level, and perform
>> - * the page migration right away (proteced by page lock).
>> - */
>> - lock_page(newpage);
>> - rc = page->mapping->a_ops->migratepage(page->mapping,
>> - newpage, page, mode);
>> - unlock_page(newpage);
>> - goto out_unlock;
>> - }
>> -
>> /*
>> * Corner case handling:
>> * 1. When a new swap-cache page is read into, it is added to the LRU
>>
>
>
On Mon, Jul 13, 2015 at 05:35:15PM +0900, Gioh Kim wrote:
> My ARM-based platform occured severe fragmentation problem after long-term
> (several days) test. Sometimes even order-3 page allocation failed. It has
> memory size 512MB ~ 1024MB. 30% ~ 40% memory is consumed for graphic processing
> and 20~30 memory is reserved for zram.
>
The primary motivation of this series is to reduce fragmentation by allowing
more kernel pages to be moved. Conceptually that is a worthwhile goal, but
there should be at least one major in-kernel user, and while balloon
pages were a good starting point, I think we really need to see what the
zram changes look like at the same time.
> I found that many pages of GPU driver and zram are non-movable pages. So I
> reported Minchan Kim, the maintainer of zram, and he made the internal
> compaction logic of zram. And I made the internal compaction of GPU driver.
>
I am not familiar with the internals of zram but I took a look at what
it merged. At a glance, the compaction it implements and what you need
are different in important respects. The core ability to move a zsmalloc
object is useful, but the motivation of zram compaction appears to be
reducing the memory footprint. You need to reduce fragmentation, which is
not the same. You could be faced with a situation where a full page is in an
awkward place. Then there are three choices I can think of quickly, and
probably more exist:
1. You can move the whole page to another whole page and update all the
   references. This would play nicely with how compaction's migrate and
   free scanners operate. However, you need free memory to move it.
2. You could try moving the full page into other zsmalloc pages so that
memory usage is also potentially reduced. This would work better with
   what Minchan intended, but then there is the problem of discovery.
   Potentially, though, it means that another address_space callback is
   required to nominate a target migration page.
3. Hybrid approach. First trigger the zsmalloc compaction as it
   currently exists, then kick off compaction and move whole pages
regardless of their content. The downside here is that it's expensive
and potentially copies data multiple times but it's going to be
easier to implement than 2.
1 would be the logical starting point, 3 is probably most effective even
if it's expensive and 2 is probably the best overall if the search costs
can be controlled.
This is a lot more complex than what balloon requires, which is why I
would like to see it pinned down before new address_space operations are
created. Once they are created and drivers start using them, we lose
a lot of flexibility and fixing the design becomes a lot harder.
With that in mind, I'll still read the rest of the series.
--
Mel Gorman
SUSE Labs
On Mon, Jul 13, 2015 at 05:35:16PM +0900, Gioh Kim wrote:
> From: Gioh Kim <[email protected]>
>
> The anon_inodes has already complete interfaces to create manage
> many anonymous inodes but don't have interface to get
> new inode. Other sub-modules can create anonymous inode
> without creating and mounting it's own pseudo filesystem.
>
> Signed-off-by: Gioh Kim <[email protected]>
> Acked-by: Rafael Aquini <[email protected]>
This is my first run through the series so I'm going to miss details but
this patch confuses me a little. You create an inode to associate with
the balloon dev_info so that page->mapping can be assigned. It's only the
mapping you care about for the aops so why are multiple inodes required? A
driver should be able to share and reference count a single inode. The
motivation to do it that way would be to reduce memory consumption and
this series is motivated by embedded platforms.
anon_inode_getfd has the following
* Creates a new file by hooking it on a single inode. This is useful for files
* that do not need to have a full-fledged inode in order to operate correctly.
* All the files created with anon_inode_getfd() will share a single inode,
* hence saving memory and avoiding code duplication for the file/inode/dentry
* setup. Returns new descriptor or an error code.
If all we care about the inode is the aops then it would follow that
anon_inode_getfd() is ideal. The tradeoff is reference counting overhead.
The changelog needs to explain why anon_inode_getfd() cannot be used.
--
Mel Gorman
SUSE Labs
On Mon, Jul 13, 2015 at 05:35:17PM +0900, Gioh Kim wrote:
> From: Gioh Kim <[email protected]>
>
> Add framework to register callback functions and check page mobility.
> There are some modes for page isolation so that isolate interface
> has arguments of page address and isolation mode while putback
> interface has only page address as argument.
>
> Signed-off-by: Gioh Kim <[email protected]>
> Acked-by: Rafael Aquini <[email protected]>
> ---
> fs/proc/page.c | 3 ++
> include/linux/compaction.h | 80 ++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 2 +
> include/linux/page-flags.h | 19 ++++++++
> include/uapi/linux/kernel-page-flags.h | 1 +
> 5 files changed, 105 insertions(+)
>
An update to the address_space operations in
Documentation/filesystems/Locking and Documentation/filesystems/vfs.txt
is required. I was going to say "recommended" but it really is required.
The responsibilities and locking rules of these interfaces must be extremely
clear as you may be asking multiple driver authors to use this interface.
For example, it must be clear to users of these interfaces that the isolate
must prevent any parallel updates to the data, prevent parallel frees and
halt attempted accesses until migration is complete. It will not always
be obvious how to do this and may not be obvious that it is required if
someone has not experienced the joy that is mm/migrate.c. For example,
mapped LRU pages get unmapped with migration entries so faults that access
the data wait until the migration completes. Balloons, zram, graphics will
need to provide similar guarantees.
As data accesses may now sleep due to migration, drivers will need to
be careful that it is safe to sleep, and I suggest that they do not
attempt to spin.
Depending on how it is implemented, the putback may be responsible for
waking up any tasks waiting to access the page.
There are going to be more hazards here which is why documentation to spell
it out is ideal and that zram gets converted to find all the locking and
access pitfalls.
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 7eee2d8..a4f5a00 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -146,6 +146,9 @@ u64 stable_page_flags(struct page *page)
> if (PageBalloon(page))
> u |= 1 << KPF_BALLOON;
>
> + if (PageMobile(page))
> + u |= 1 << KPF_MOBILE;
> +
> u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
>
> u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index aa8f61c..f693072 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -1,6 +1,9 @@
> #ifndef _LINUX_COMPACTION_H
> #define _LINUX_COMPACTION_H
>
> +#include <linux/page-flags.h>
> +#include <linux/pagemap.h>
> +
> /* Return values for compact_zone() and try_to_compact_pages() */
> /* compaction didn't start as it was deferred due to past failures */
> #define COMPACT_DEFERRED 0
> @@ -51,6 +54,70 @@ extern void compaction_defer_reset(struct zone *zone, int order,
> bool alloc_success);
> extern bool compaction_restarting(struct zone *zone, int order);
>
> +static inline bool mobile_page(struct page *page)
> +{
> + return page->mapping && (PageMobile(page) || PageBalloon(page));
> +}
> +
This creates an oddity because now there is a disconnect between movable
and mobile pages. They are similar but different.
o A Mobile page is a driver-owned page that has the address space
operations that enable migration.
o A Movable page is generally a page mapped by page tables that can be
migrated using the existing mechanisms.
The concepts should be unified.
A Mobile page is a driver-owned page that has the address space
operations that enable migration. Pages that are mapped by userspace are
considered to be mobile with the following properties
a_ops->isolatepage isolates the page from the LRU to prevent
parallel reclaim. It is unmapped from page tables using rmap
with PTEs replaced by migration entries. Any attempt to access
the page will wait in page fault until the migration completes.
a_ops->putbackpage removes the migration entries and wakes up
all waiters in page fault.
A further property is that allocations of this type specify
__GFP_MOVABLE to group them all together. They are the most mobile
page category that are cheapest to move. In theory, all mobile
pages could be allocated __GFP_MOVABLE if it's known in advance
the page->mapping will have the necessary operations in the
future.
?
A complicating factor is that a Movable page as it's currently defined
may not have a page->mapping. You'd have to continue relying on PageLRU to
identify them as a special page that has access to the necessary isolatepage
and putbackpage helpers. However, at least we would have a single view
on what a movable page is.
Additional note: After I wrote the above, I read the other reviews. I
did not read them in advance so I'd have a fresh view. I see
Konstantin Khlebnikov has been active and he suggested the mobility
naming to distinguish between the LRU pages. I simply disagree
even though I see his reasoning. I do not think we should have a
special case of LRU pages and everything else. Instead we should
have a single concept of movability (or mobility) with the special
case being that LRU pages without an aops can directly call the
necessary helpers.
> +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t mode)
> +{
> + bool ret = false;
> +
> + /*
> + * Avoid burning cycles with pages that are yet under __free_pages(),
> + * or just got freed under us.
> + *
> + * In case we 'win' a race for a mobile page being freed under us and
> + * raise its refcount preventing __free_pages() from doing its job
> + * the put_page() at the end of this block will take care of
> + * release this page, thus avoiding a nasty leakage.
> + */
> + if (unlikely(!get_page_unless_zero(page)))
> + goto out;
> +
Ok.
> + /*
> + * As mobile pages are not isolated from LRU lists, concurrent
> + * compaction threads can race against page migration functions
> + * as well as race against the releasing a page.
> + *
> + * In order to avoid having an already isolated mobile page
> + * being (wrongly) re-isolated while it is under migration,
> + * or to avoid attempting to isolate pages being released,
> + * lets be sure we have the page lock
> + * before proceeding with the mobile page isolation steps.
> + */
> + if (unlikely(!trylock_page(page)))
> + goto out_putpage;
> +
There are some big assumptions here. It assumes that any users of this
interface can prevent parallel compaction attempts via the page lock. It
also assumes that the caller does not recursively hold the page lock already.
It would be incompatible with how LRU pages are isolated as they co-ordinate
via the zone->lru_lock.
I suspect you went with the page lock because it happens to be what the
balloon driver needed which is fine, but potentially pastes us into a
corner later.
I don't see a way this could be generically handled for arbitrary subsystems
unless you put responsibility for the locking inside a_ops->isolatepage. That
still works for existing movable pages if you give it a pseudo a_ops for
pages without page->mapping.
Because of this, I really think it would benefit if there was a patch
3 that converted the existing migration of LRU pages to use the aops
interface. This could be done via a fake address_space that only populates
the migration interfaces and is used for LRU pages. Then remove the LRU
special casing in compaction and migration before converting the balloon
driver and zram. This will rattle out any conceivable locking hazard and
unify migration in general. I recognise that it's a lot of heavy lifting
unfortunately but it leaves you with a partial solution to your problem
(zram in the way) and paves the way for drivers to reliably convert.
> + if (!(mobile_page(page) && page->mapping->a_ops->isolatepage))
> + goto out_not_isolated;
> + ret = page->mapping->a_ops->isolatepage(page, mode);
> + if (!ret)
> + goto out_not_isolated;
> + unlock_page(page);
> + return ret;
> +
> +out_not_isolated:
> + unlock_page(page);
> +out_putpage:
> + put_page(page);
> +out:
> + return ret;
> +}
> +
> +static inline void putback_mobilepage(struct page *page)
> +{
> + /*
> + * 'lock_page()' stabilizes the page and prevents races against
> + * concurrent isolation threads attempting to re-isolate it.
> + */
> + lock_page(page);
> + if (page->mapping && page->mapping->a_ops->putbackpage)
> + page->mapping->a_ops->putbackpage(page);
> + unlock_page(page);
> + /* drop the extra ref count taken for mobile page isolation */
> + put_page(page);
> +}
Similar comments about the locking, I think the a_ops handler needs to be
responsible. We should not expand the role of the page->lock in the
general case.
> #else
> static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
> unsigned int order, int alloc_flags,
> @@ -83,6 +150,19 @@ static inline bool compaction_deferred(struct zone *zone, int order)
> return true;
> }
>
> +static inline bool mobile_page(struct page *page)
> +{
> + return false;
> +}
> +
> +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t mode)
> +{
> + return false;
> +}
> +
> +static inline void putback_mobilepage(struct page *page)
> +{
> +}
> #endif /* CONFIG_COMPACTION */
>
> #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a0653e5..2cc4b24 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -396,6 +396,8 @@ struct address_space_operations {
> */
> int (*migratepage) (struct address_space *,
> struct page *, struct page *, enum migrate_mode);
> + bool (*isolatepage) (struct page *, isolate_mode_t);
> + void (*putbackpage) (struct page *);
> int (*launder_page) (struct page *);
> int (*is_partially_uptodate) (struct page *, unsigned long,
> unsigned long);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f34e040..abef145 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -582,6 +582,25 @@ static inline void __ClearPageBalloon(struct page *page)
> atomic_set(&page->_mapcount, -1);
> }
>
> +#define PAGE_MOBILE_MAPCOUNT_VALUE (-255)
> +
> +static inline int PageMobile(struct page *page)
> +{
> + return atomic_read(&page->_mapcount) == PAGE_MOBILE_MAPCOUNT_VALUE;
> +}
> +
> +static inline void __SetPageMobile(struct page *page)
> +{
> + VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> + atomic_set(&page->_mapcount, PAGE_MOBILE_MAPCOUNT_VALUE);
> +}
> +
> +static inline void __ClearPageMobile(struct page *page)
> +{
> + VM_BUG_ON_PAGE(!PageMobile(page), page);
> + atomic_set(&page->_mapcount, -1);
> +}
> +
This definition of Mobility would prevent LRU pages ever being considered
"mobile" in the same way. Why do we not either check that it's an LRU page
(in which case it's inherently mobile) or that it has an aops with the
correct handlers?
> /*
> * If network-based swap is enabled, sl*b must keep track of whether pages
> * were allocated from pfmemalloc reserves.
> diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> index a6c4962..d50d9e8 100644
> --- a/include/uapi/linux/kernel-page-flags.h
> +++ b/include/uapi/linux/kernel-page-flags.h
> @@ -33,6 +33,7 @@
> #define KPF_THP 22
> #define KPF_BALLOON 23
> #define KPF_ZERO_PAGE 24
> +#define KPF_MOBILE 25
>
>
> #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
--
Mel Gorman
SUSE Labs
On Wed, Jul 29, 2015 at 11:49:45AM +0100, Mel Gorman wrote:
> On Mon, Jul 13, 2015 at 05:35:15PM +0900, Gioh Kim wrote:
> > My ARM-based platform occured severe fragmentation problem after long-term
> > (several days) test. Sometimes even order-3 page allocation failed. It has
> > memory size 512MB ~ 1024MB. 30% ~ 40% memory is consumed for graphic processing
> > and 20~30 memory is reserved for zram.
> >
>
> The primary motivation of this series is to reduce fragmentation by allowing
> more kernel pages to be moved. Conceptually that is a worthwhile goal but
> there should be at least one major in-kernel user and while balloon
> pages were a good starting point, I think we really need to see what the
> zram changes look like at the same time.
I think gpu drivers really would be the perfect candidate for compacting
kernel page allocations. And this also seems the primary motivation for
this patch series, so I think that's really what we should use to judge
these patches.
Of course then there's the seemingly eternal chicken/egg problem of
upstream gpu drivers for SoCs :(
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Wed, Jul 29, 2015 at 12:55:54PM +0200, Daniel Vetter wrote:
> On Wed, Jul 29, 2015 at 11:49:45AM +0100, Mel Gorman wrote:
> > On Mon, Jul 13, 2015 at 05:35:15PM +0900, Gioh Kim wrote:
> > > My ARM-based platform occured severe fragmentation problem after long-term
> > > (several days) test. Sometimes even order-3 page allocation failed. It has
> > > memory size 512MB ~ 1024MB. 30% ~ 40% memory is consumed for graphic processing
> > > and 20~30 memory is reserved for zram.
> > >
> >
> > The primary motivation of this series is to reduce fragmentation by allowing
> > more kernel pages to be moved. Conceptually that is a worthwhile goal but
> > there should be at least one major in-kernel user and while balloon
> > pages were a good starting point, I think we really need to see what the
> > zram changes look like at the same time.
>
> I think gpu drivers really would be the perfect candidate for compacting
> kernel page allocations. And this also seems the primary motivation for
> this patch series, so I think that's really what we should use to judge
> these patches.
>
> Of course then there's the seemingly eternal chicken/egg problem of
> upstream gpu drivers for SoCs :(
I recognised that the driver he had modified was not an in-tree user so
it did not really help the review or the design. I did not think it was
very fair to ask that an in-tree GPU driver be converted when it would not
help the embedded platform of interest. Converting zram is both a useful
illustration of the aops requirements and is expected to be beneficial on
the embedded platform. Now, if a GPU driver author was willing to convert
theirs as an example then that would be useful!
--
Mel Gorman
SUSE Labs
On Wed, Jul 29, 2015 at 01:16:14PM +0100, Mel Gorman wrote:
> On Wed, Jul 29, 2015 at 12:55:54PM +0200, Daniel Vetter wrote:
> > On Wed, Jul 29, 2015 at 11:49:45AM +0100, Mel Gorman wrote:
> > > On Mon, Jul 13, 2015 at 05:35:15PM +0900, Gioh Kim wrote:
> > > > My ARM-based platform occured severe fragmentation problem after long-term
> > > > (several days) test. Sometimes even order-3 page allocation failed. It has
> > > > memory size 512MB ~ 1024MB. 30% ~ 40% memory is consumed for graphic processing
> > > > and 20~30 memory is reserved for zram.
> > > >
> > >
> > > The primary motivation of this series is to reduce fragmentation by allowing
> > > more kernel pages to be moved. Conceptually that is a worthwhile goal but
> > > there should be at least one major in-kernel user and while balloon
> > > pages were a good starting point, I think we really need to see what the
> > > zram changes look like at the same time.
> >
> > I think gpu drivers really would be the perfect candidate for compacting
> > kernel page allocations. And this also seems the primary motivation for
> > this patch series, so I think that's really what we should use to judge
> > these patches.
> >
> > Of course then there's the seemingly eternal chicken/egg problem of
> > upstream gpu drivers for SoCs :(
>
> I recognised that the driver he had modified was not an in-tree user so
> it did not really help the review or the design. I did not think it was
> very fair to ask that an in-tree GPU driver be converted when it would not
> help the embedded platform of interest. Converting zram is both a useful
> illustration of the aops requirements and is expected to be beneficial on
> the embedded platform. Now, if a GPU driver author was willing to convert
> theirs as an example then that would be useful!
Well my concern is more with merging infrastructure upstream for
drivers which aren't upstream and have no plan to make that happen anytime
soon. Seems like just offloading a bit to me ... but in the end core mm isn't
my thing so not my decision.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
2015-07-29 오후 9:46에 Daniel Vetter 이(가) 쓴 글:
> On Wed, Jul 29, 2015 at 01:16:14PM +0100, Mel Gorman wrote:
>> On Wed, Jul 29, 2015 at 12:55:54PM +0200, Daniel Vetter wrote:
>>> On Wed, Jul 29, 2015 at 11:49:45AM +0100, Mel Gorman wrote:
>>>> On Mon, Jul 13, 2015 at 05:35:15PM +0900, Gioh Kim wrote:
>>>>> My ARM-based platform occured severe fragmentation problem after long-term
>>>>> (several days) test. Sometimes even order-3 page allocation failed. It has
>>>>> memory size 512MB ~ 1024MB. 30% ~ 40% memory is consumed for graphic processing
>>>>> and 20~30 memory is reserved for zram.
>>>>>
>>>>
>>>> The primary motivation of this series is to reduce fragmentation by allowing
>>>> more kernel pages to be moved. Conceptually that is a worthwhile goal but
>>>> there should be at least one major in-kernel user and while balloon
>>>> pages were a good starting point, I think we really need to see what the
>>>> zram changes look like at the same time.
>>>
>>> I think gpu drivers really would be the perfect candidate for compacting
>>> kernel page allocations. And this also seems the primary motivation for
>>> this patch series, so I think that's really what we should use to judge
>>> these patches.
>>>
>>> Of course then there's the seemingly eternal chicken/egg problem of
>>> upstream gpu drivers for SoCs :(
>>
>> I recognised that the driver he had modified was not an in-tree user so
>> it did not really help the review or the design. I did not think it was
>> very fair to ask that an in-tree GPU driver be converted when it would not
>> help the embedded platform of interest. Converting zram is both a useful
>> illustration of the aops requirements and is expected to be beneficial on
>> the embedded platform. Now, if a GPU driver author was willing to convert
>> theirs as an example then that would be useful!
>
> Well my concern is more with merging infrastructure to upstream for
> drivers which aren't upstream and with no plan to make that happen anytime
> soon. Seems like just offload a bit to me ... but in the end core mm isn't
> my thing so not my decision.
> -Daniel
>
I got the idea from the out-of-tree driver, but this infrastructure will be
useful for zram and balloon; that is agreed by the maintainers of each driver.
I'm currently incorporating feedback from the balloon and zram sides and
trying to make the interface applicable to them.
Of course I hope there will be more users. It'll be more useful
the more users it has.
Hi Gioh,
On Mon, Jul 13, 2015 at 05:35:17PM +0900, Gioh Kim wrote:
> From: Gioh Kim <[email protected]>
>
> Add framework to register callback functions and check page mobility.
> There are some modes for page isolation so that isolate interface
> has arguments of page address and isolation mode while putback
> interface has only page address as argument.
>
> Signed-off-by: Gioh Kim <[email protected]>
> Acked-by: Rafael Aquini <[email protected]>
> ---
> fs/proc/page.c | 3 ++
> include/linux/compaction.h | 80 ++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 2 +
> include/linux/page-flags.h | 19 ++++++++
> include/uapi/linux/kernel-page-flags.h | 1 +
> 5 files changed, 105 insertions(+)
>
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 7eee2d8..a4f5a00 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -146,6 +146,9 @@ u64 stable_page_flags(struct page *page)
> if (PageBalloon(page))
> u |= 1 << KPF_BALLOON;
>
> + if (PageMobile(page))
> + u |= 1 << KPF_MOBILE;
> +
Do we need a new page flag for non-LRU page migration?
I am worried that we don't have an empty slot for a page flag on 32-bit.
Aha, I see SetPageMobile below. You use _mapcount.
That seems to work for non-LRU pages but unfortunately zsmalloc
already uses that field for its own purposes, so there is no room
for it.
> u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
>
> u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index aa8f61c..f693072 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -1,6 +1,9 @@
> #ifndef _LINUX_COMPACTION_H
> #define _LINUX_COMPACTION_H
>
> +#include <linux/page-flags.h>
> +#include <linux/pagemap.h>
> +
> /* Return values for compact_zone() and try_to_compact_pages() */
> /* compaction didn't start as it was deferred due to past failures */
> #define COMPACT_DEFERRED 0
> @@ -51,6 +54,70 @@ extern void compaction_defer_reset(struct zone *zone, int order,
> bool alloc_success);
> extern bool compaction_restarting(struct zone *zone, int order);
>
> +static inline bool mobile_page(struct page *page)
> +{
> + return page->mapping && (PageMobile(page) || PageBalloon(page));
What's the locking rule for testing mobile_page?
Why do we need so many checks?
Why do we need the PageBalloon check?
You are trying to make a generic abstraction for migrating non-LRU pages,
but the PageBalloon check here breaks your goal.
> +}
> +
> +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t mode)
> +{
> + bool ret = false;
> +
> + /*
> + * Avoid burning cycles with pages that are yet under __free_pages(),
> + * or just got freed under us.
> + *
> + * In case we 'win' a race for a mobile page being freed under us and
> + * raise its refcount preventing __free_pages() from doing its job
> + * the put_page() at the end of this block will take care of
> + * release this page, thus avoiding a nasty leakage.
> + */
Good.
> + if (unlikely(!get_page_unless_zero(page)))
> + goto out;
> +
> + /*
> + * As mobile pages are not isolated from LRU lists, concurrent
> + * compaction threads can race against page migration functions
> + * as well as race against the releasing a page.
How can it race with releasing a page?
We already took a refcount above.
Do you mean a page->mapping tearing race?
If so, zsmalloc should be changed to hold a lock.
> + *
> + * In order to avoid having an already isolated mobile page
> + * being (wrongly) re-isolated while it is under migration,
> + * or to avoid attempting to isolate pages being released,
> + * lets be sure we have the page lock
> + * before proceeding with the mobile page isolation steps.
> + */
> + if (unlikely(!trylock_page(page)))
> + goto out_putpage;
> +
> + if (!(mobile_page(page) && page->mapping->a_ops->isolatepage))
> + goto out_not_isolated;
I cannot tell how isolate_mobilepage is called by the upper layer
because this patch doesn't include it.
Anyway, why do we need so many checks to prove it's a mobile page?
mobile_page() {
	page->mapping && (PageMobile(page) || PageBalloon(page));
}
Additionally, you added a page->mapping->a_ops->isolatepage check.
Is it possible that a non-LRU page's address space provides
only page->mapping->migratepage without isolate/putback?
Couldn't we make it simple, like this?
	page->mapping && page->mapping->migratepage
So, couldn't we make mobile_page like this?
PageMobile(struct page *page)
{
	VM_BUG_ON_PAGE(!PageLocked(page), page);
	VM_BUG_ON_PAGE(PageLRU(page), page);

	return page->mapping && page->mapping->migratepage;
}
It will save the _mapcount field of struct page.
> + ret = page->mapping->a_ops->isolatepage(page, mode);
> + if (!ret)
> + goto out_not_isolated;
> + unlock_page(page);
> + return ret;
> +
> +out_not_isolated:
> + unlock_page(page);
> +out_putpage:
> + put_page(page);
> +out:
> + return ret;
> +}
> +
> +static inline void putback_mobilepage(struct page *page)
> +{
> + /*
> + * 'lock_page()' stabilizes the page and prevents races against
What does 'stabilize' mean?
A clear comment is always welcome rather than a vague word.
> + * concurrent isolation threads attempting to re-isolate it.
> + */
> + lock_page(page);
> + if (page->mapping && page->mapping->a_ops->putbackpage)
> + page->mapping->a_ops->putbackpage(page);
> + unlock_page(page);
> + /* drop the extra ref count taken for mobile page isolation */
> + put_page(page);
> +}
> #else
> static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
> unsigned int order, int alloc_flags,
> @@ -83,6 +150,19 @@ static inline bool compaction_deferred(struct zone *zone, int order)
> return true;
> }
>
> +static inline bool mobile_page(struct page *page)
> +{
> + return false;
> +}
> +
> +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t mode)
> +{
> + return false;
> +}
> +
> +static inline void putback_mobilepage(struct page *page)
> +{
> +}
> #endif /* CONFIG_COMPACTION */
>
> #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a0653e5..2cc4b24 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -396,6 +396,8 @@ struct address_space_operations {
> */
> int (*migratepage) (struct address_space *,
> struct page *, struct page *, enum migrate_mode);
> + bool (*isolatepage) (struct page *, isolate_mode_t);
> + void (*putbackpage) (struct page *);
> int (*launder_page) (struct page *);
> int (*is_partially_uptodate) (struct page *, unsigned long,
> unsigned long);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f34e040..abef145 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -582,6 +582,25 @@ static inline void __ClearPageBalloon(struct page *page)
> atomic_set(&page->_mapcount, -1);
> }
>
> +#define PAGE_MOBILE_MAPCOUNT_VALUE (-255)
> +
> +static inline int PageMobile(struct page *page)
> +{
> + return atomic_read(&page->_mapcount) == PAGE_MOBILE_MAPCOUNT_VALUE;
> +}
> +
> +static inline void __SetPageMobile(struct page *page)
> +{
> + VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> + atomic_set(&page->_mapcount, PAGE_MOBILE_MAPCOUNT_VALUE);
> +}
> +
> +static inline void __ClearPageMobile(struct page *page)
> +{
> + VM_BUG_ON_PAGE(!PageMobile(page), page);
> + atomic_set(&page->_mapcount, -1);
> +}
> +
> /*
> * If network-based swap is enabled, sl*b must keep track of whether pages
> * were allocated from pfmemalloc reserves.
> diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> index a6c4962..d50d9e8 100644
> --- a/include/uapi/linux/kernel-page-flags.h
> +++ b/include/uapi/linux/kernel-page-flags.h
> @@ -33,6 +33,7 @@
> #define KPF_THP 22
> #define KPF_BALLOON 23
> #define KPF_ZERO_PAGE 24
> +#define KPF_MOBILE 25
>
>
> #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
> --
> 2.1.4
>
On Fri, Jul 31, 2015 at 07:43:09PM +0900, Minchan Kim wrote:
> Hi Gioh,
>
> On Mon, Jul 13, 2015 at 05:35:17PM +0900, Gioh Kim wrote:
> > From: Gioh Kim <[email protected]>
> >
> > Add framework to register callback functions and check page mobility.
> > There are some modes for page isolation so that isolate interface
> > has arguments of page address and isolation mode while putback
> > interface has only page address as argument.
> >
> > Signed-off-by: Gioh Kim <[email protected]>
> > Acked-by: Rafael Aquini <[email protected]>
> > ---
> > fs/proc/page.c | 3 ++
> > include/linux/compaction.h | 80 ++++++++++++++++++++++++++++++++++
> > include/linux/fs.h | 2 +
> > include/linux/page-flags.h | 19 ++++++++
> > include/uapi/linux/kernel-page-flags.h | 1 +
> > 5 files changed, 105 insertions(+)
> >
> > diff --git a/fs/proc/page.c b/fs/proc/page.c
> > index 7eee2d8..a4f5a00 100644
> > --- a/fs/proc/page.c
> > +++ b/fs/proc/page.c
> > @@ -146,6 +146,9 @@ u64 stable_page_flags(struct page *page)
> > if (PageBalloon(page))
> > u |= 1 << KPF_BALLOON;
> >
> > + if (PageMobile(page))
> > + u |= 1 << KPF_MOBILE;
> > +
>
> Need a new page flag for non-LRU page migration?
> I am worry that we don't have empty slot for page flag on 32-bit.
>
> Aha, see SetPageMobile below. You use _mapcount.
> It seems to be work for non-LRU pages but unfortunately, zsmalloc
> already have used that field as own purpose so there is no room
> for it.
Gioh, I just sent a patch with which zsmalloc doesn't use page->mapping
and _mapcount, so I think zsmalloc could support compaction with your
patchset. Although zsmalloc can support compaction with it, I still
don't think it's good that a driver has to mark pages mobile via
_mapcount.
I hope you can find another way.
Thanks.
>
>
> > u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
> >
> > u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
> > diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> > index aa8f61c..f693072 100644
> > --- a/include/linux/compaction.h
> > +++ b/include/linux/compaction.h
> > @@ -1,6 +1,9 @@
> > #ifndef _LINUX_COMPACTION_H
> > #define _LINUX_COMPACTION_H
> >
> > +#include <linux/page-flags.h>
> > +#include <linux/pagemap.h>
> > +
> > /* Return values for compact_zone() and try_to_compact_pages() */
> > /* compaction didn't start as it was deferred due to past failures */
> > #define COMPACT_DEFERRED 0
> > @@ -51,6 +54,70 @@ extern void compaction_defer_reset(struct zone *zone, int order,
> > bool alloc_success);
> > extern bool compaction_restarting(struct zone *zone, int order);
> >
> > +static inline bool mobile_page(struct page *page)
> > +{
> > + return page->mapping && (PageMobile(page) || PageBalloon(page));
>
>
> What's the locking rule to test mobile_page?
> Why we need such many checking?
>
> Why we need PageBalloon check?
> You are trying to make generic abstraction for non-LRU page to migrate
> but PageBalloon check in here breaks your goal.
>
> > +}
> > +
> > +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t mode)
> > +{
> > + bool ret = false;
> > +
> > + /*
> > + * Avoid burning cycles with pages that are yet under __free_pages(),
> > + * or just got freed under us.
> > + *
> > + * In case we 'win' a race for a mobile page being freed under us and
> > + * raise its refcount preventing __free_pages() from doing its job
> > + * the put_page() at the end of this block will take care of
> > + * release this page, thus avoiding a nasty leakage.
> > + */
>
> Good.
>
> > + if (unlikely(!get_page_unless_zero(page)))
> > + goto out;
> > +
> > + /*
> > + * As mobile pages are not isolated from LRU lists, concurrent
> > + * compaction threads can race against page migration functions
> > + * as well as race against the releasing a page.
>
> How can it race with releasing a page?
> We already took a refcount above.
>
> Do you mean a page->mapping tearing race?
> If so, zsmalloc should be changed to hold a lock.
>
>
> > + *
> > + * In order to avoid having an already isolated mobile page
> > + * being (wrongly) re-isolated while it is under migration,
> > + * or to avoid attempting to isolate pages being released,
> > + * let's be sure we have the page lock
> > + * before proceeding with the mobile page isolation steps.
> > + */
> > + if (unlikely(!trylock_page(page)))
> > + goto out_putpage;
> > +
> > + if (!(mobile_page(page) && page->mapping->a_ops->isolatepage))
> > + goto out_not_isolated;
>
> I cannot tell how isolate_mobilepage is called by the upper layer
> because this patch doesn't include it.
> Anyway, why do we need so many checks to prove it's a mobile page?
>
> mobile_page() {
> 	return page->mapping && (PageMobile(page) || PageBalloon(page));
> }
>
> Additionally, you added a page->mapping->a_ops->isolatepage check.
>
> Is it possible that a non-LRU page's address space provides
> only page->mapping->a_ops->migratepage without isolate/putback?
>
> Couldn't we make it as simple as this?
>
> page->mapping && page->mapping->a_ops->migratepage
>
> So, couldn't we define PageMobile like this?
>
> PageMobile(struct page *page)
> {
> 	VM_BUG_ON_PAGE(!PageLocked(page), page);
> 	VM_BUG_ON_PAGE(PageLRU(page), page);
>
> 	return page->mapping && page->mapping->a_ops->migratepage;
> }
>
> It would also save the _mapcount of struct page.
>
> > + ret = page->mapping->a_ops->isolatepage(page, mode);
> > + if (!ret)
> > + goto out_not_isolated;
> > + unlock_page(page);
> > + return ret;
> > +
> > +out_not_isolated:
> > + unlock_page(page);
> > +out_putpage:
> > + put_page(page);
> > +out:
> > + return ret;
> > +}
> > +
> > +static inline void putback_mobilepage(struct page *page)
> > +{
> > + /*
> > + * 'lock_page()' stabilizes the page and prevents races against
>
> What does 'stabilize' mean?
> A clear comment is always welcome rather than a vague word.
>
> > + * concurrent isolation threads attempting to re-isolate it.
> > + */
> > + lock_page(page);
> > + if (page->mapping && page->mapping->a_ops->putbackpage)
> > + page->mapping->a_ops->putbackpage(page);
> > + unlock_page(page);
> > + /* drop the extra ref count taken for mobile page isolation */
> > + put_page(page);
> > +}
> > #else
> > static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
> > unsigned int order, int alloc_flags,
> > @@ -83,6 +150,19 @@ static inline bool compaction_deferred(struct zone *zone, int order)
> > return true;
> > }
> >
> > +static inline bool mobile_page(struct page *page)
> > +{
> > + return false;
> > +}
> > +
> > +static inline bool isolate_mobilepage(struct page *page, isolate_mode_t mode)
> > +{
> > + return false;
> > +}
> > +
> > +static inline void putback_mobilepage(struct page *page)
> > +{
> > +}
> > #endif /* CONFIG_COMPACTION */
> >
> > #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
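To make the isolation protocol above easier to discuss, here is a user-space sketch of the ref-then-trylock flow of isolate_mobilepage(). It is an illustration only, not kernel code: mock_page is a hypothetical stand-in for struct page, C11 atomics replace the kernel's atomic_t, and an atomic_flag stands in for the page lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal user-space stand-ins; names mirror the kernel helpers above. */
struct mock_page {
	atomic_int refcount;
	atomic_flag locked;	/* stands in for the page lock */
	bool mobile;		/* stands in for mobile_page() */
};

/* Take a reference only if the page is not already being freed. */
static bool get_page_unless_zero(struct mock_page *p)
{
	int c = atomic_load(&p->refcount);
	while (c > 0)
		if (atomic_compare_exchange_weak(&p->refcount, &c, c + 1))
			return true;
	return false;
}

static void put_page(struct mock_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

static bool trylock_page(struct mock_page *p)
{
	return !atomic_flag_test_and_set(&p->locked);
}

static void unlock_page(struct mock_page *p)
{
	atomic_flag_clear(&p->locked);
}

/* Mirrors the isolate_mobilepage() flow: ref, trylock, check, isolate. */
static bool isolate_mobilepage(struct mock_page *p)
{
	bool ret = false;

	if (!get_page_unless_zero(p))
		return false;		/* page was being freed under us */
	if (!trylock_page(p))
		goto out_putpage;	/* already isolated or being released */
	if (!p->mobile)
		goto out_unlock;
	ret = true;			/* the driver's ->isolatepage() would run here */
	unlock_page(p);
	return ret;			/* keep the extra ref until putback */
out_unlock:
	unlock_page(p);
out_putpage:
	put_page(p);
	return ret;
}
```

Note that a successful isolation returns with the extra reference still held; only the failure paths drop it, matching the comment in the patch about avoiding a leak when the race for a dying page is "won".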
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index a0653e5..2cc4b24 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -396,6 +396,8 @@ struct address_space_operations {
> > */
> > int (*migratepage) (struct address_space *,
> > struct page *, struct page *, enum migrate_mode);
> > + bool (*isolatepage) (struct page *, isolate_mode_t);
> > + void (*putbackpage) (struct page *);
> > int (*launder_page) (struct page *);
> > int (*is_partially_uptodate) (struct page *, unsigned long,
> > unsigned long);
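For the two new a_ops hooks, the intended usage pattern can be sketched in user space as follows. Everything here is hypothetical illustration: mock_aops/mock_mapping/mock_page stand in for the kernel structures, and my_driver_* is an imaginary driver that opts in to migration by filling the hooks, the way the balloon driver does in patch 3/4.

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_page;

/* Stand-in for the extended address_space_operations above. */
struct mock_aops {
	bool (*isolatepage)(struct mock_page *, int /* isolate_mode_t */);
	void (*putbackpage)(struct mock_page *);
};

struct mock_mapping {
	const struct mock_aops *a_ops;
};

struct mock_page {
	struct mock_mapping *mapping;
	bool isolated;
};

/* A driver opts in by implementing the two new hooks. */
static bool my_driver_isolate(struct mock_page *p, int mode)
{
	(void)mode;
	if (p->isolated)
		return false;	/* already under migration */
	p->isolated = true;	/* e.g. unlink from the driver's internal lists */
	return true;
}

static void my_driver_putback(struct mock_page *p)
{
	p->isolated = false;	/* e.g. re-insert into the driver's lists */
}

static const struct mock_aops my_driver_aops = {
	.isolatepage = my_driver_isolate,
	.putbackpage = my_driver_putback,
};

/* How compaction dispatches through the mapping, cf. isolate_mobilepage(). */
static bool dispatch_isolate(struct mock_page *p, int mode)
{
	if (p->mapping && p->mapping->a_ops && p->mapping->a_ops->isolatepage)
		return p->mapping->a_ops->isolatepage(p, mode);
	return false;
}
```

The dispatch helper also shows why compaction must tolerate pages whose mapping provides no hooks at all: such pages are simply skipped rather than treated as an error.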
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index f34e040..abef145 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -582,6 +582,25 @@ static inline void __ClearPageBalloon(struct page *page)
> > atomic_set(&page->_mapcount, -1);
> > }
> >
> > +#define PAGE_MOBILE_MAPCOUNT_VALUE (-255)
> > +
> > +static inline int PageMobile(struct page *page)
> > +{
> > + return atomic_read(&page->_mapcount) == PAGE_MOBILE_MAPCOUNT_VALUE;
> > +}
> > +
> > +static inline void __SetPageMobile(struct page *page)
> > +{
> > + VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> > + atomic_set(&page->_mapcount, PAGE_MOBILE_MAPCOUNT_VALUE);
> > +}
> > +
> > +static inline void __ClearPageMobile(struct page *page)
> > +{
> > + VM_BUG_ON_PAGE(!PageMobile(page), page);
> > + atomic_set(&page->_mapcount, -1);
> > +}
> > +
> > /*
> > * If network-based swap is enabled, sl*b must keep track of whether pages
> > * were allocated from pfmemalloc reserves.
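The _mapcount encoding in the hunk above can be exercised stand-alone. The sketch below is a user-space mock (mock_page is a stand-in for struct page, a C11 atomic_int for the kernel's atomic_t); it only demonstrates the encoding: -1 means "no user mapping", and the magic value -255 marks a mobile page without consuming a page flag bit, the same trick PageBalloon already uses with its own magic value.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PAGE_MOBILE_MAPCOUNT_VALUE	(-255)

/* Stand-in for struct page: only the _mapcount field matters here. */
struct mock_page {
	atomic_int _mapcount;
};

static bool PageMobile(struct mock_page *page)
{
	return atomic_load(&page->_mapcount) == PAGE_MOBILE_MAPCOUNT_VALUE;
}

static void __SetPageMobile(struct mock_page *page)
{
	/* only an unmapped page (_mapcount == -1) may be marked mobile */
	atomic_store(&page->_mapcount, PAGE_MOBILE_MAPCOUNT_VALUE);
}

static void __ClearPageMobile(struct mock_page *page)
{
	atomic_store(&page->_mapcount, -1);	/* back to "unmapped" */
}
```

Because mapped pages keep _mapcount >= 0 and unmapped ones sit at -1, a distinct negative magic value can never collide with a real map count, which is what makes the encoding safe.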
> > diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
> > index a6c4962..d50d9e8 100644
> > --- a/include/uapi/linux/kernel-page-flags.h
> > +++ b/include/uapi/linux/kernel-page-flags.h
> > @@ -33,6 +33,7 @@
> > #define KPF_THP 22
> > #define KPF_BALLOON 23
> > #define KPF_ZERO_PAGE 24
> > +#define KPF_MOBILE 25
> >
> >
> > #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
> > --
> > 2.1.4
> >