2012-11-07 03:06:43

by Rafael Aquini

Subject: [PATCH v11 0/7] make balloon pages movable by compaction

Memory fragmentation introduced by ballooning can significantly reduce
the number of 2MB contiguous memory blocks available within a guest,
imposing performance penalties associated with the reduced number of
transparent huge pages the guest workload could otherwise use.

This patch set follows the main idea discussed at the 2012 LSF/MM session
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/ --
introducing the required changes to the virtio_balloon driver, as well as
to the core compaction & migration bits, to make those subsystems aware of
ballooned pages and allow memory balloon pages to become movable within a
guest, thus avoiding the aforementioned fragmentation issue.

The following numbers demonstrate this patch set's benefit in allowing
compaction to be more effective on memory-ballooned guests.

Results for the STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite,
running on a 4GB RAM KVM guest that ballooned 512MB RAM in 64MB chunks
every minute (inflating/deflating) while the test was running:

===BEGIN stress-highalloc

STRESS-HIGHALLOC
                  highalloc-3.7   highalloc-3.7
                      rc4-clean       rc4-patch
Pass 1           55.00 ( 0.00%)  62.00 ( 7.00%)
Pass 2           54.00 ( 0.00%)  62.00 ( 8.00%)
while Rested     75.00 ( 0.00%)  80.00 ( 5.00%)

MMTests Statistics: duration
                      3.7         3.7
                rc4-clean   rc4-patch
User              1207.59     1207.46
System            1300.55     1299.61
Elapsed           2273.72     2157.06

MMTests Statistics: vmstat
                                 3.7        3.7
                           rc4-clean  rc4-patch
Page Ins                     3581516    2374368
Page Outs                   11148692   10410332
Swap Ins                          80         47
Swap Outs                       3641        476
Direct pages scanned           37978      33826
Kswapd pages scanned         1828245    1342869
Kswapd pages reclaimed       1710236    1304099
Direct pages reclaimed         32207      31005
Kswapd efficiency                93%        97%
Kswapd velocity              804.077    622.546
Direct efficiency                84%        91%
Direct velocity               16.703     15.682
Percentage direct scans           2%         2%
Page writes by reclaim         79252       9704
Page writes file               75611       9228
Page writes anon                3641        476
Page reclaim immediate         16764      11014
Page rescued immediate             0          0
Slabs scanned                2171904    2152448
Direct inode steals              385       2261
Kswapd inode steals           659137     609670
Kswapd skipped wait                1         69
THP fault alloc                  546        631
THP collapse alloc               361        339
THP splits                       259        263
THP fault fallback                98         50
THP collapse fail                 20         17
Compaction stalls                747        499
Compaction success               244        145
Compaction failures              503        354
Compaction pages moved        370888     474837
Compaction move failure        77378      65259

===END stress-highalloc

Rafael Aquini (7):
mm: adjust address_space_operations.migratepage() return code
mm: redefine address_space.assoc_mapping
mm: introduce a common interface for balloon pages mobility
mm: introduce compaction and migration for ballooned pages
virtio_balloon: introduce migration primitives to balloon pages
mm: introduce putback_movable_pages()
mm: add vm event counters for balloon pages compaction

drivers/virtio/virtio_balloon.c | 136 +++++++++++++++++--
fs/buffer.c | 12 +-
fs/gfs2/glock.c | 2 +-
fs/hugetlbfs/inode.c | 4 +-
fs/inode.c | 2 +-
fs/nilfs2/page.c | 2 +-
include/linux/balloon_compaction.h | 220 ++++++++++++++++++++++++++++++
include/linux/fs.h | 2 +-
include/linux/migrate.h | 19 +++
include/linux/pagemap.h | 16 +++
include/linux/vm_event_item.h | 8 +-
mm/Kconfig | 15 ++
mm/Makefile | 1 +
mm/balloon_compaction.c | 271 +++++++++++++++++++++++++++++++++++++
mm/compaction.c | 27 +++-
mm/migrate.c | 77 +++++++++--
mm/page_alloc.c | 2 +-
mm/vmstat.c | 10 +-
18 files changed, 782 insertions(+), 44 deletions(-)
create mode 100644 include/linux/balloon_compaction.h
create mode 100644 mm/balloon_compaction.c

Change log:
v11:
* Address AKPM's last review suggestions;
* Extend the balloon compaction common API and simplify its usage at driver;
* Minor nitpicks on code commentary;
v10:
* Adjust leak_balloon() wait_event logic to make a clear locking scheme (MST);
* Drop the RCU protection approach for dereferencing balloon's page->mapping;
* Minor nitpicks on code commentary (MST);
v9:
* Adjust rcu_dereference usage to leverage page lock protection (Paul, Peter);
* Enhance doc on compaction interface introduced to balloon driver (Michael);
* Fix issue with isolated pages breaking leak_balloon() logics (Michael);
v8:
* introduce a common MM interface for balloon driver page compaction (Michael);
* remove the global state preventing multiple balloon device support (Michael);
* introduce RCU protection/synchronization to balloon page->mapping (Michael);
v7:
* fix a potential page leak case at 'putback_balloon_page' (Mel);
* adjust vm-events-counter patch and remove its drop-on-merge message (Rik);
* add 'putback_movable_pages' to avoid hacks on 'putback_lru_pages' (Minchan);
v6:
* rename 'is_balloon_page()' to 'movable_balloon_page()' (Rik);
v5:
* address Andrew Morton's review comments on the patch series;
* address a couple extra nitpick suggestions on PATCH 01 (Minchan);
v4:
* address Rusty Russell's review comments on PATCH 02;
* re-base virtio_balloon patch on 9c378abc5c0c6fc8e3acf5968924d274503819b3;
v3:
* address reviewers' nitpick suggestions on PATCH 01 (Mel, Minchan);
v2:
* address Mel Gorman's review comments on PATCH 01;

--
1.7.11.7


2012-11-07 03:06:45

by Rafael Aquini

Subject: [PATCH v11 1/7] mm: adjust address_space_operations.migratepage() return code

This patch introduces MIGRATEPAGE_SUCCESS as the default return code
for address_space_operations.migratepage() method and documents the
expected return code for the same method in failure cases.

Signed-off-by: Rafael Aquini <[email protected]>
---
fs/hugetlbfs/inode.c | 4 ++--
include/linux/migrate.h | 7 +++++++
mm/migrate.c | 22 +++++++++++-----------
3 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c5bc355..bdeda2c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -608,11 +608,11 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
int rc;

rc = migrate_huge_page_move_mapping(mapping, newpage, page);
- if (rc)
+ if (rc != MIGRATEPAGE_SUCCESS)
return rc;
migrate_page_copy(newpage, page);

- return 0;
+ return MIGRATEPAGE_SUCCESS;
}

static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ce7e667..a4e886d 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -7,6 +7,13 @@

typedef struct page *new_page_t(struct page *, unsigned long private, int **);

+/*
+ * Return values from address_space_operations.migratepage():
+ * - negative errno on page migration failure;
+ * - zero on page migration success;
+ */
+#define MIGRATEPAGE_SUCCESS 0
+
#ifdef CONFIG_MIGRATION

extern void putback_lru_pages(struct list_head *l);
diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..98c7a89 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -286,7 +286,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
/* Anonymous page without mapping */
if (page_count(page) != 1)
return -EAGAIN;
- return 0;
+ return MIGRATEPAGE_SUCCESS;
}

spin_lock_irq(&mapping->tree_lock);
@@ -356,7 +356,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
}
spin_unlock_irq(&mapping->tree_lock);

- return 0;
+ return MIGRATEPAGE_SUCCESS;
}

/*
@@ -372,7 +372,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
if (!mapping) {
if (page_count(page) != 1)
return -EAGAIN;
- return 0;
+ return MIGRATEPAGE_SUCCESS;
}

spin_lock_irq(&mapping->tree_lock);
@@ -399,7 +399,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
page_unfreeze_refs(page, expected_count - 1);

spin_unlock_irq(&mapping->tree_lock);
- return 0;
+ return MIGRATEPAGE_SUCCESS;
}

/*
@@ -486,11 +486,11 @@ int migrate_page(struct address_space *mapping,

rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode);

- if (rc)
+ if (rc != MIGRATEPAGE_SUCCESS)
return rc;

migrate_page_copy(newpage, page);
- return 0;
+ return MIGRATEPAGE_SUCCESS;
}
EXPORT_SYMBOL(migrate_page);

@@ -513,7 +513,7 @@ int buffer_migrate_page(struct address_space *mapping,

rc = migrate_page_move_mapping(mapping, newpage, page, head, mode);

- if (rc)
+ if (rc != MIGRATEPAGE_SUCCESS)
return rc;

/*
@@ -549,7 +549,7 @@ int buffer_migrate_page(struct address_space *mapping,

} while (bh != head);

- return 0;
+ return MIGRATEPAGE_SUCCESS;
}
EXPORT_SYMBOL(buffer_migrate_page);
#endif
@@ -814,7 +814,7 @@ skip_unmap:
put_anon_vma(anon_vma);

uncharge:
- mem_cgroup_end_migration(mem, page, newpage, rc == 0);
+ mem_cgroup_end_migration(mem, page, newpage, rc == MIGRATEPAGE_SUCCESS);
unlock:
unlock_page(page);
out:
@@ -987,7 +987,7 @@ int migrate_pages(struct list_head *from,
case -EAGAIN:
retry++;
break;
- case 0:
+ case MIGRATEPAGE_SUCCESS:
break;
default:
/* Permanent failure */
@@ -1024,7 +1024,7 @@ int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
/* try again */
cond_resched();
break;
- case 0:
+ case MIGRATEPAGE_SUCCESS:
goto out;
default:
rc = -EIO;
--
1.7.11.7

2012-11-07 03:06:50

by Rafael Aquini

Subject: [PATCH v11 4/7] mm: introduce compaction and migration for ballooned pages

Memory fragmentation introduced by ballooning can significantly reduce
the number of 2MB contiguous memory blocks available within a guest,
imposing performance penalties associated with the reduced number of
transparent huge pages the guest workload could otherwise use.

This patch introduces the helper functions as well as the necessary changes
to teach compaction and migration bits how to cope with pages which are
part of a guest memory balloon, in order to make them movable by memory
compaction procedures.

Signed-off-by: Rafael Aquini <[email protected]>
---
mm/compaction.c | 21 +++++++++++++++++++--
mm/migrate.c | 36 ++++++++++++++++++++++++++++++++++--
2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9eef558..76abd84 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,6 +14,7 @@
#include <linux/backing-dev.h>
#include <linux/sysctl.h>
#include <linux/sysfs.h>
+#include <linux/balloon_compaction.h>
#include "internal.h"

#if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -565,9 +566,24 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
goto next_pageblock;
}

- /* Check may be lockless but that's ok as we recheck later */
- if (!PageLRU(page))
+ /*
+ * Check may be lockless but that's ok as we recheck later.
+ * It's possible to migrate LRU pages and balloon pages
+ * Skip any other type of page
+ */
+ if (!PageLRU(page)) {
+ if (unlikely(balloon_page_movable(page))) {
+ if (locked && balloon_page_isolate(page)) {
+ /* Successfully isolated */
+ cc->finished_update_migrate = true;
+ list_add(&page->lru, migratelist);
+ cc->nr_migratepages++;
+ nr_isolated++;
+ goto check_compact_cluster;
+ }
+ }
continue;
+ }

/*
* PageLRU is set. lru_lock normally excludes isolation
@@ -621,6 +637,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
cc->nr_migratepages++;
nr_isolated++;

+check_compact_cluster:
/* Avoid isolating too much */
if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
++low_pfn;
diff --git a/mm/migrate.c b/mm/migrate.c
index 98c7a89..87ffe54 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -35,6 +35,7 @@
#include <linux/hugetlb.h>
#include <linux/hugetlb_cgroup.h>
#include <linux/gfp.h>
+#include <linux/balloon_compaction.h>

#include <asm/tlbflush.h>

@@ -79,7 +80,10 @@ void putback_lru_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- putback_lru_page(page);
+ if (unlikely(balloon_page_movable(page)))
+ balloon_page_putback(page);
+ else
+ putback_lru_page(page);
}
}

@@ -778,6 +782,18 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
}
}

+ if (unlikely(balloon_page_movable(page))) {
+ /*
+ * A ballooned page does not need any special attention from
+ * physical to virtual reverse mapping procedures.
+ * Skip any attempt to unmap PTEs or to remap swap cache,
+ * in order to avoid burning cycles at rmap level, and perform
+ * the page migration right away (protected by page lock).
+ */
+ rc = balloon_page_migrate(newpage, page, mode);
+ goto uncharge;
+ }
+
/*
* Corner case handling:
* 1. When a new swap-cache page is read into, it is added to the LRU
@@ -814,7 +830,9 @@ skip_unmap:
put_anon_vma(anon_vma);

uncharge:
- mem_cgroup_end_migration(mem, page, newpage, rc == MIGRATEPAGE_SUCCESS);
+ mem_cgroup_end_migration(mem, page, newpage,
+ (rc == MIGRATEPAGE_SUCCESS ||
+ rc == MIGRATEPAGE_BALLOON_SUCCESS));
unlock:
unlock_page(page);
out:
@@ -846,6 +864,20 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
goto out;

rc = __unmap_and_move(page, newpage, force, offlining, mode);
+
+ if (unlikely(rc == MIGRATEPAGE_BALLOON_SUCCESS)) {
+ /*
+ * A ballooned page has been migrated already.
+ * Now it is time to remove the old page from the isolated
+ * pageset list, hand it back to the buddy allocator, wrap up
+ * counters and return.
+ */
+ dec_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+ put_page(page);
+ __free_page(page);
+ return 0;
+ }
out:
if (rc != -EAGAIN) {
/*
--
1.7.11.7

2012-11-07 03:06:57

by Rafael Aquini

Subject: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

Memory fragmentation introduced by ballooning can significantly reduce
the number of 2MB contiguous memory blocks available within a guest,
imposing performance penalties associated with the reduced number of
transparent huge pages the guest workload could otherwise use.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme to enhance the
synchronization of accesses to struct virtio_balloon elements, thus
providing protection against concurrent access by parallel memory
migration threads.

- balloon_lock (mutex) : synchronizes the access demand to elements of
struct virtio_balloon and its queue operations;

Signed-off-by: Rafael Aquini <[email protected]>
---
drivers/virtio/virtio_balloon.c | 135 ++++++++++++++++++++++++++++++++++++----
1 file changed, 123 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..69eede7 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
#include <linux/delay.h>
#include <linux/slab.h>
#include <linux/module.h>
+#include <linux/balloon_compaction.h>

/*
* Balloon device works in 4K page units. So each page is pointed to by
@@ -34,6 +35,7 @@
* page units.
*/
#define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
+#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256

struct virtio_balloon
{
@@ -52,15 +54,19 @@ struct virtio_balloon
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
/*
- * The pages we've told the Host we're not using.
+ * The pages we've told the Host we're not using are enqueued
+ * at vb_dev_info->pages list.
* Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
* to num_pages above.
*/
- struct list_head pages;
+ struct balloon_dev_info *vb_dev_info;
+
+ /* Synchronize access/update to this struct virtio_balloon elements */
+ struct mutex balloon_lock;

/* The array of pfns we tell the Host about. */
unsigned int num_pfns;
- u32 pfns[256];
+ u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];

/* Memory statistics */
int need_stats_update;
@@ -122,18 +128,25 @@ static void set_page_pfns(u32 pfns[], struct page *page)

static void fill_balloon(struct virtio_balloon *vb, size_t num)
{
+ struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
+
+ static DEFINE_RATELIMIT_STATE(fill_balloon_rs,
+ DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));

+ mutex_lock(&vb->balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
- struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
- __GFP_NOMEMALLOC | __GFP_NOWARN);
+ struct page *page = balloon_page_enqueue(vb_dev_info);
+
if (!page) {
- if (printk_ratelimit())
+ if (__ratelimit(&fill_balloon_rs))
dev_printk(KERN_INFO, &vb->vdev->dev,
"Out of puff! Can't get %zu pages\n",
- num);
+ VIRTIO_BALLOON_PAGES_PER_PAGE);
/* Sleep for at least 1/5 of a second before retry. */
msleep(200);
break;
@@ -141,7 +154,6 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
- list_add(&page->lru, &vb->pages);
}

/* Didn't get any? Oh well. */
@@ -149,6 +161,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
return;

tell_host(vb, vb->inflate_vq);
+ mutex_unlock(&vb->balloon_lock);
}

static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -165,14 +178,17 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
static void leak_balloon(struct virtio_balloon *vb, size_t num)
{
struct page *page;
+ struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;

/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));

+ mutex_lock(&vb->balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
- page = list_first_entry(&vb->pages, struct page, lru);
- list_del(&page->lru);
+ page = balloon_page_dequeue(vb_dev_info);
+ if (!page)
+ break;
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
@@ -183,6 +199,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
* is true, we *have* to do it in this order
*/
tell_host(vb, vb->deflate_vq);
+ mutex_unlock(&vb->balloon_lock);
release_pages_by_pfn(vb->pfns, vb->num_pfns);
}

@@ -339,9 +356,76 @@ static int init_vqs(struct virtio_balloon *vb)
return 0;
}

+static const struct address_space_operations virtio_balloon_aops;
+#ifdef CONFIG_BALLOON_COMPACTION
+/*
+ * virtballoon_migratepage - perform the balloon page migration on behalf of
+ * a compaction thread (called under page lock).
+ * @mapping: the page->mapping which will be assigned to the new migrated page.
+ * @newpage: page that will replace the isolated page after migration finishes.
+ * @page : the isolated (old) page that is about to be migrated to newpage.
+ * @mode : compaction mode -- not used for balloon page migration.
+ *
+ * After a ballooned page gets isolated by compaction procedures, this is the
+ * function that performs the page migration on behalf of a compaction thread.
+ * The page migration for virtio balloon is done in a simple swap fashion which
+ * follows these two macro steps:
+ * 1) insert newpage into vb->pages list and update the host about it;
+ * 2) update the host about the old page removed from vb->pages list;
+ *
+ * This function performs the balloon page migration task.
+ * Called through balloon_mapping->a_ops->migratepage.
+ */
+int virtballoon_migratepage(struct address_space *mapping,
+ struct page *newpage, struct page *page, enum migrate_mode mode)
+{
+ struct balloon_dev_info *vb_dev_info = balloon_page_device(page);
+ struct virtio_balloon *vb;
+ unsigned long flags;
+
+ BUG_ON(!vb_dev_info);
+
+ vb = vb_dev_info->balloon_device;
+
+ if (!mutex_trylock(&vb->balloon_lock))
+ return -EAGAIN;
+
+ /* balloon's page migration 1st step -- inflate "newpage" */
+ spin_lock_irqsave(&vb_dev_info->pages_lock, flags);
+ balloon_page_insert(newpage, mapping, &vb_dev_info->pages);
+ vb_dev_info->isolated_pages--;
+ spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
+ vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+ set_page_pfns(vb->pfns, newpage);
+ tell_host(vb, vb->inflate_vq);
+
+ /*
+ * balloon's page migration 2nd step -- deflate "page"
+ *
+ * It's safe to delete page->lru here because this page is on
+ * an isolated migration list, and this step is expected to happen here.
+ */
+ balloon_page_delete(page);
+ vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+ set_page_pfns(vb->pfns, page);
+ tell_host(vb, vb->deflate_vq);
+
+ mutex_unlock(&vb->balloon_lock);
+
+ return MIGRATEPAGE_BALLOON_SUCCESS;
+}
+
+/* define the balloon_mapping->a_ops callback to allow balloon page migration */
+static const struct address_space_operations virtio_balloon_aops = {
+ .migratepage = virtballoon_migratepage,
+};
+#endif /* CONFIG_BALLOON_COMPACTION */
+
static int virtballoon_probe(struct virtio_device *vdev)
{
struct virtio_balloon *vb;
+ struct address_space *vb_mapping;
+ struct balloon_dev_info *vb_devinfo;
int err;

vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL);
@@ -350,16 +434,37 @@ static int virtballoon_probe(struct virtio_device *vdev)
goto out;
}

- INIT_LIST_HEAD(&vb->pages);
vb->num_pages = 0;
+ mutex_init(&vb->balloon_lock);
init_waitqueue_head(&vb->config_change);
init_waitqueue_head(&vb->acked);
vb->vdev = vdev;
vb->need_stats_update = 0;

+ vb_devinfo = balloon_devinfo_alloc(vb);
+ if (IS_ERR(vb_devinfo)) {
+ err = PTR_ERR(vb_devinfo);
+ goto out_free_vb;
+ }
+
+ vb_mapping = balloon_mapping_alloc(vb_devinfo,
+ (balloon_compaction_check()) ?
+ &virtio_balloon_aops : NULL);
+ if (IS_ERR(vb_mapping)) {
+ /*
+ * IS_ERR(vb_mapping) && PTR_ERR(vb_mapping) == -EOPNOTSUPP
+ * means !CONFIG_BALLOON_COMPACTION; otherwise we bail out.
+ */
+ err = PTR_ERR(vb_mapping);
+ if (err != -EOPNOTSUPP)
+ goto out_free_vb_devinfo;
+ }
+
+ vb->vb_dev_info = vb_devinfo;
+
err = init_vqs(vb);
if (err)
- goto out_free_vb;
+ goto out_free_vb_mapping;

vb->thread = kthread_run(balloon, vb, "vballoon");
if (IS_ERR(vb->thread)) {
@@ -371,6 +476,10 @@ static int virtballoon_probe(struct virtio_device *vdev)

out_del_vqs:
vdev->config->del_vqs(vdev);
+out_free_vb_mapping:
+ balloon_mapping_free(vb_mapping);
+out_free_vb_devinfo:
+ balloon_devinfo_free(vb_devinfo);
out_free_vb:
kfree(vb);
out:
@@ -396,6 +505,8 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)

kthread_stop(vb->thread);
remove_common(vb);
+ balloon_mapping_free(vb->vb_dev_info->mapping);
+ balloon_devinfo_free(vb->vb_dev_info);
kfree(vb);
}

--
1.7.11.7

2012-11-07 03:07:04

by Rafael Aquini

Subject: [PATCH v11 7/7] mm: add vm event counters for balloon pages compaction

This patch introduces a new set of vm event counters to keep track of
ballooned pages compaction activity.

Signed-off-by: Rafael Aquini <[email protected]>
---
drivers/virtio/virtio_balloon.c | 1 +
include/linux/vm_event_item.h | 8 +++++++-
mm/balloon_compaction.c | 2 ++
mm/migrate.c | 1 +
mm/vmstat.c | 10 +++++++++-
5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 69eede7..3756fc1 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -411,6 +411,7 @@ int virtballoon_migratepage(struct address_space *mapping,
tell_host(vb, vb->deflate_vq);

mutex_unlock(&vb->balloon_lock);
+ balloon_event_count(COMPACTBALLOONMIGRATED);

return MIGRATEPAGE_BALLOON_SUCCESS;
}
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 3d31145..cbd72fc 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,7 +41,13 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
-#endif
+#ifdef CONFIG_BALLOON_COMPACTION
+ COMPACTBALLOONISOLATED, /* isolated from balloon pagelist */
+ COMPACTBALLOONMIGRATED, /* balloon page successfully migrated */
+ COMPACTBALLOONRELEASED, /* old-page released after migration */
+ COMPACTBALLOONRETURNED, /* putback to pagelist, not-migrated */
+#endif /* CONFIG_BALLOON_COMPACTION */
+#endif /* CONFIG_COMPACTION */
#ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
#endif
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 90935aa..32927eb 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -215,6 +215,7 @@ bool balloon_page_isolate(struct page *page)
if (__is_movable_balloon_page(page) &&
page_count(page) == 2) {
__isolate_balloon_page(page);
+ balloon_event_count(COMPACTBALLOONISOLATED);
unlock_page(page);
return true;
}
@@ -237,6 +238,7 @@ void balloon_page_putback(struct page *page)
if (__is_movable_balloon_page(page)) {
__putback_balloon_page(page);
put_page(page);
+ balloon_event_count(COMPACTBALLOONRETURNED);
} else {
__WARN();
dump_page(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index adb3d44..ee3037d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -896,6 +896,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
page_is_file_cache(page));
put_page(page);
__free_page(page);
+ balloon_event_count(COMPACTBALLOONRELEASED);
return 0;
}
out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..1363edc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -781,7 +781,15 @@ const char * const vmstat_text[] = {
"compact_stall",
"compact_fail",
"compact_success",
-#endif
+
+#ifdef CONFIG_BALLOON_COMPACTION
+ "compact_balloon_isolated",
+ "compact_balloon_migrated",
+ "compact_balloon_released",
+ "compact_balloon_returned",
+#endif /* CONFIG_BALLOON_COMPACTION */
+
+#endif /* CONFIG_COMPACTION */

#ifdef CONFIG_HUGETLB_PAGE
"htlb_buddy_alloc_success",
--
1.7.11.7

2012-11-07 03:06:55

by Rafael Aquini

Subject: [PATCH v11 3/7] mm: introduce a common interface for balloon pages mobility

Memory fragmentation introduced by ballooning can significantly reduce
the number of 2MB contiguous memory blocks available within a guest,
imposing performance penalties associated with the reduced number of
transparent huge pages the guest workload could otherwise use.

This patch introduces a common interface to help a balloon driver make
its page set movable by compaction, thus allowing the system to better
leverage compaction efforts on memory defragmentation.

Signed-off-by: Rafael Aquini <[email protected]>
---
include/linux/balloon_compaction.h | 220 ++++++++++++++++++++++++++++++
include/linux/migrate.h | 10 ++
include/linux/pagemap.h | 16 +++
mm/Kconfig | 15 +++
mm/Makefile | 1 +
mm/balloon_compaction.c | 269 +++++++++++++++++++++++++++++++++++++
6 files changed, 531 insertions(+)
create mode 100644 include/linux/balloon_compaction.h
create mode 100644 mm/balloon_compaction.c

diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
new file mode 100644
index 0000000..1865bd5
--- /dev/null
+++ b/include/linux/balloon_compaction.h
@@ -0,0 +1,220 @@
+/*
+ * include/linux/balloon_compaction.h
+ *
+ * Common interface definitions for making balloon pages movable to compaction.
+ *
+ * Copyright (C) 2012, Red Hat, Inc. Rafael Aquini <[email protected]>
+ */
+#ifndef _LINUX_BALLOON_COMPACTION_H
+#define _LINUX_BALLOON_COMPACTION_H
+#include <linux/pagemap.h>
+#include <linux/migrate.h>
+#include <linux/gfp.h>
+#include <linux/err.h>
+
+/*
+ * Balloon device information descriptor.
+ * This struct is used to allow the common balloon compaction interface
+ * procedures to find the proper balloon device holding the memory pages
+ * they'll have to deal with during page compaction/migration, and serves
+ * the balloon driver as a page book-keeper for its registered balloon
+ * devices.
+ */
+struct balloon_dev_info {
+ void *balloon_device; /* balloon device descriptor */
+ struct address_space *mapping; /* balloon special page->mapping */
+ unsigned long isolated_pages; /* # of isolated pages for migration */
+ spinlock_t pages_lock; /* Protection to pages list */
+ struct list_head pages; /* Pages enqueued & handled to Host */
+};
+
+extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
+extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
+extern struct balloon_dev_info *balloon_devinfo_alloc(
+ void *balloon_dev_descriptor);
+
+static inline void balloon_devinfo_free(struct balloon_dev_info *b_dev_info)
+{
+ kfree(b_dev_info);
+}
+
+#ifdef CONFIG_BALLOON_COMPACTION
+extern bool balloon_page_isolate(struct page *page);
+extern void balloon_page_putback(struct page *page);
+extern int balloon_page_migrate(struct page *newpage,
+ struct page *page, enum migrate_mode mode);
+extern struct address_space
+*balloon_mapping_alloc(struct balloon_dev_info *b_dev_info,
+ const struct address_space_operations *a_ops);
+
+static inline void balloon_mapping_free(struct address_space *balloon_mapping)
+{
+ kfree(balloon_mapping);
+}
+
+/*
+ * balloon_page_insert - insert a page into the balloon's page list and make
+ * the page->mapping assignment accordingly.
+ * @page : page to be assigned as a 'balloon page'
+ * @mapping : allocated special 'balloon_mapping'
+ * @head : balloon's device page list head
+ */
+static inline void balloon_page_insert(struct page *page,
+ struct address_space *mapping,
+ struct list_head *head)
+{
+ list_add(&page->lru, head);
+ /*
+ * Make sure the page is already inserted on balloon's page list
+ * before assigning its ->mapping.
+ */
+ smp_wmb();
+ page->mapping = mapping;
+}
+
+/*
+ * balloon_page_delete - clear the page->mapping and delete the page from
+ * balloon's page list accordingly.
+ * @page : page to be released from balloon's page list
+ */
+static inline void balloon_page_delete(struct page *page)
+{
+ page->mapping = NULL;
+ /*
+ * Make sure page->mapping is cleared before we proceed with
+ * balloon's page list deletion.
+ */
+ smp_wmb();
+ list_del(&page->lru);
+}
+
+/*
+ * __is_movable_balloon_page - helper to perform @page mapping->flags tests
+ */
+static inline bool __is_movable_balloon_page(struct page *page)
+{
+ /*
+ * We might attempt to read ->mapping concurrently with other
+ * threads trying to write to it.
+ */
+ struct address_space *mapping = ACCESS_ONCE(page->mapping);
+ smp_read_barrier_depends();
+ return mapping_balloon(mapping);
+}
+
+/*
+ * balloon_page_movable - test page->mapping->flags to identify balloon pages
+ * that can be moved by compaction/migration.
+ *
+ * This function is used at core compaction's page isolation scheme, therefore
+ * most pages exposed to it are not enlisted as balloon pages and so, to avoid
+ * undesired side effects like racing against __free_pages(), we cannot
+ * afford to hold the page locked while testing page->mapping->flags here.
+ *
+ * As we might return false positives in the case of a balloon page being just
+ * released under us, the page->mapping->flags need to be re-tested later,
+ * under the proper page lock, at the functions that will be coping with the
+ * balloon page case.
+ */
+static inline bool balloon_page_movable(struct page *page)
+{
+ /*
+ * Before dereferencing and testing mapping->flags, let's make sure
+ * this is not a page that uses ->mapping in a different way
+ */
+ if (!PageSlab(page) && !PageSwapCache(page) && !PageAnon(page) &&
+ !page_mapped(page))
+ return __is_movable_balloon_page(page);
+
+ return false;
+}
+
+/*
+ * balloon_page_device - get the b_dev_info descriptor for the balloon device
+ * that enqueues the given page.
+ */
+static inline struct balloon_dev_info *balloon_page_device(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ if (likely(mapping))
+ return mapping->private_data;
+
+ return NULL;
+}
+
+static inline gfp_t balloon_mapping_gfp_mask(void)
+{
+ return GFP_HIGHUSER_MOVABLE;
+}
+
+static inline void balloon_event_count(enum vm_event_item item)
+{
+ count_vm_event(item);
+}
+
+static inline bool balloon_compaction_check(void)
+{
+ return true;
+}
+
+#else /* !CONFIG_BALLOON_COMPACTION */
+
+static inline void *balloon_mapping_alloc(void *balloon_device,
+ const struct address_space_operations *a_ops)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+
+static inline void balloon_mapping_free(struct address_space *balloon_mapping)
+{
+ return;
+}
+
+static inline void balloon_page_insert(struct page *page,
+ struct address_space *mapping,
+ struct list_head *head)
+{
+ list_add(&page->lru, head);
+}
+
+static inline void balloon_page_delete(struct page *page)
+{
+ list_del(&page->lru);
+}
+
+static inline bool balloon_page_movable(struct page *page)
+{
+ return false;
+}
+
+static inline bool balloon_page_isolate(struct page *page)
+{
+ return false;
+}
+
+static inline void balloon_page_putback(struct page *page)
+{
+ return;
+}
+
+static inline int balloon_page_migrate(struct page *newpage,
+ struct page *page, enum migrate_mode mode)
+{
+ return 0;
+}
+
+static inline gfp_t balloon_mapping_gfp_mask(void)
+{
+ return GFP_HIGHUSER;
+}
+
+static inline void balloon_event_count(enum vm_event_item item)
+{
+ return;
+}
+
+static inline bool balloon_compaction_check(void)
+{
+ return false;
+}
+#endif /* CONFIG_BALLOON_COMPACTION */
+#endif /* _LINUX_BALLOON_COMPACTION_H */
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index a4e886d..e570c3c 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -11,8 +11,18 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
* Return values from address_space_operations.migratepage():
* - negative errno on page migration failure;
* - zero on page migration success;
+ *
+ * The balloon page migration introduces this special case where a 'distinct'
+ * return code is used to flag a successful page migration to unmap_and_move().
+ * This approach is necessary because page migration can race against the
+ * balloon deflation procedure, and in such a case we could introduce a nasty
+ * page leak
+ * if a successfully migrated balloon page gets released concurrently with
+ * migration's unmap_and_move() wrap-up steps.
*/
#define MIGRATEPAGE_SUCCESS 0
+#define MIGRATEPAGE_BALLOON_SUCCESS 1 /* special ret code for balloon page
+ * successful migration case.
+ */

#ifdef CONFIG_MIGRATION

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e42c762..6da609d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -24,6 +24,7 @@ enum mapping_flags {
AS_ENOSPC = __GFP_BITS_SHIFT + 1, /* ENOSPC on async write */
AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks() */
AS_UNEVICTABLE = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
+ AS_BALLOON_MAP = __GFP_BITS_SHIFT + 4, /* balloon page special map */
};

static inline void mapping_set_error(struct address_space *mapping, int error)
@@ -53,6 +54,21 @@ static inline int mapping_unevictable(struct address_space *mapping)
return !!mapping;
}

+static inline void mapping_set_balloon(struct address_space *mapping)
+{
+ set_bit(AS_BALLOON_MAP, &mapping->flags);
+}
+
+static inline void mapping_clear_balloon(struct address_space *mapping)
+{
+ clear_bit(AS_BALLOON_MAP, &mapping->flags);
+}
+
+static inline int mapping_balloon(struct address_space *mapping)
+{
+ return mapping && test_bit(AS_BALLOON_MAP, &mapping->flags);
+}
+
static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
{
return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
diff --git a/mm/Kconfig b/mm/Kconfig
index a3f8ddd..b119172 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -188,6 +188,21 @@ config SPLIT_PTLOCK_CPUS
default "4"

#
+# support for memory balloon compaction
+config BALLOON_COMPACTION
+ bool "Allow for balloon memory compaction/migration"
+ select COMPACTION
+ depends on VIRTIO_BALLOON
+ help
+ Memory fragmentation introduced by ballooning might significantly
+ reduce the number of 2MB contiguous memory blocks that can be used
+ within a guest, thus imposing performance penalties associated with
+ the reduced number of transparent huge pages that could be used by
+ the guest workload. Allowing the compaction & migration of memory
+ pages enlisted as being part of memory balloon devices avoids the
+ aforementioned scenario and helps improve memory defragmentation.
+
+#
# support for memory compaction
config COMPACTION
bool "Allow for memory compaction"
diff --git a/mm/Makefile b/mm/Makefile
index 6b025f8..21e10ee 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -57,3 +57,4 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
obj-$(CONFIG_CLEANCACHE) += cleancache.o
obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
+obj-$(CONFIG_BALLOON_COMPACTION) += balloon_compaction.o
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
new file mode 100644
index 0000000..90935aa
--- /dev/null
+++ b/mm/balloon_compaction.c
@@ -0,0 +1,269 @@
+/*
+ * mm/balloon_compaction.c
+ *
+ * Common interface for making balloon pages movable to compaction.
+ *
+ * Copyright (C) 2012, Red Hat, Inc. Rafael Aquini <[email protected]>
+ */
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/export.h>
+#include <linux/balloon_compaction.h>
+
+/*
+ * balloon_devinfo_alloc - allocates a balloon device information descriptor.
+ * @balloon_dev_descriptor: pointer to reference the balloon device which
+ * this struct balloon_dev_info will be servicing.
+ *
+ * Driver must call it to properly allocate and initialize an instance of
+ * struct balloon_dev_info which will be used to reference a balloon device
+ * as well as to keep track of the balloon device page list.
+ */
+struct balloon_dev_info *balloon_devinfo_alloc(void *balloon_dev_descriptor)
+{
+ struct balloon_dev_info *b_dev_info;
+ b_dev_info = kmalloc(sizeof(*b_dev_info), GFP_KERNEL);
+ if (!b_dev_info)
+ return ERR_PTR(-ENOMEM);
+
+ b_dev_info->balloon_device = balloon_dev_descriptor;
+ b_dev_info->mapping = NULL;
+ b_dev_info->isolated_pages = 0;
+ spin_lock_init(&b_dev_info->pages_lock);
+ INIT_LIST_HEAD(&b_dev_info->pages);
+
+ return b_dev_info;
+}
+EXPORT_SYMBOL_GPL(balloon_devinfo_alloc);
+
+/*
+ * balloon_page_enqueue - allocates a new page and inserts it into the balloon
+ * page list.
+ * @b_dev_info: balloon device descriptor where we will insert a new page
+ *
+ * Driver must call it to properly allocate a new enlisted balloon page
+ * before definitively removing it from the guest system.
+ * This function returns the page address for the recently enqueued page or
+ * NULL in the case we fail to allocate a new page this turn.
+ */
+struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
+{
+ unsigned long flags;
+ struct page *page = alloc_page(balloon_mapping_gfp_mask() |
+ __GFP_NOMEMALLOC | __GFP_NORETRY);
+ if (!page)
+ return NULL;
+
+ BUG_ON(!trylock_page(page));
+ spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+ balloon_page_insert(page, b_dev_info->mapping, &b_dev_info->pages);
+ spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+ unlock_page(page);
+ return page;
+}
+EXPORT_SYMBOL_GPL(balloon_page_enqueue);
+
+/*
+ * balloon_page_dequeue - removes a page from balloon's page list and returns
+ * its address to allow the driver to release the page.
+ * @b_dev_info: balloon device descriptor from which we will grab a page.
+ *
+ * Driver must call it to properly de-allocate a previously enlisted balloon
+ * page before definitively releasing it back to the guest system.
+ * This function returns the page address for the recently dequeued page or
+ * NULL in the case we find balloon's page list temporarily empty due to
+ * compaction's isolated pages.
+ */
+struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
+{
+ struct page *page, *tmp;
+ unsigned long flags;
+ bool dequeued_page;
+
+ dequeued_page = false;
+ list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
+ if (trylock_page(page)) {
+ spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+ balloon_page_delete(page);
+ spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+ unlock_page(page);
+ dequeued_page = true;
+ break;
+ }
+ }
+
+ if (!dequeued_page) {
+ spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+ if (unlikely(list_empty(&b_dev_info->pages) &&
+ !b_dev_info->isolated_pages))
+ BUG();
+ spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+ page = NULL;
+ }
+ return page;
+}
+EXPORT_SYMBOL_GPL(balloon_page_dequeue);
+
+#ifdef CONFIG_BALLOON_COMPACTION
+/*
+ * balloon_mapping_alloc - allocates a special ->mapping for ballooned pages.
+ * @b_dev_info: holds the balloon device information descriptor.
+ * @a_ops: balloon_mapping address_space_operations descriptor.
+ *
+ * Driver must call it to properly allocate and initialize an instance of
+ * struct address_space which will be used as the special page->mapping for
+ * balloon device enlisted page instances.
+ */
+struct address_space *balloon_mapping_alloc(struct balloon_dev_info *b_dev_info,
+ const struct address_space_operations *a_ops)
+{
+ struct address_space *mapping;
+
+ mapping = kmalloc(sizeof(*mapping), GFP_KERNEL);
+ if (!mapping)
+ return ERR_PTR(-ENOMEM);
+
+ /*
+ * Give a clean 'zeroed' status to all elements of this special
+ * balloon page->mapping struct address_space instance.
+ */
+ address_space_init_once(mapping);
+
+ /*
+ * Set mapping->flags appropriately, to allow balloon pages
+ * ->mapping identification.
+ */
+ mapping_set_balloon(mapping);
+ mapping_set_gfp_mask(mapping, balloon_mapping_gfp_mask());
+
+ /* balloon's page->mapping->a_ops callback descriptor */
+ mapping->a_ops = a_ops;
+
+ /*
+ * Establish a pointer reference back to the balloon device descriptor
+ * this particular page->mapping will be servicing.
+ * This is used by compaction / migration procedures to identify and
+ * access the balloon device pageset while isolating / migrating pages.
+ *
+ * As some balloon drivers can register multiple balloon devices
+ * for a single guest, this also helps compaction / migration to
+ * properly deal with multiple balloon pagesets, when required.
+ */
+ mapping->private_data = b_dev_info;
+ b_dev_info->mapping = mapping;
+
+ return mapping;
+}
+EXPORT_SYMBOL_GPL(balloon_mapping_alloc);
+
+static inline void __isolate_balloon_page(struct page *page)
+{
+ struct balloon_dev_info *b_dev_info = page->mapping->private_data;
+ unsigned long flags;
+ spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+ list_del(&page->lru);
+ b_dev_info->isolated_pages++;
+ spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+}
+
+static inline void __putback_balloon_page(struct page *page)
+{
+ struct balloon_dev_info *b_dev_info = page->mapping->private_data;
+ unsigned long flags;
+ spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+ list_add(&page->lru, &b_dev_info->pages);
+ b_dev_info->isolated_pages--;
+ spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+}
+
+static inline int __migrate_balloon_page(struct address_space *mapping,
+ struct page *newpage, struct page *page, enum migrate_mode mode)
+{
+ return page->mapping->a_ops->migratepage(mapping, newpage, page, mode);
+}
+
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool balloon_page_isolate(struct page *page)
+{
+ /*
+ * Avoid burning cycles with pages that are yet under __free_pages(),
+ * or just got freed under us.
+ *
+ * In case we 'win' a race for a balloon page being freed under us and
+ * raise its refcount, preventing __free_pages() from doing its job,
+ * the put_page() at the end of this block will take care of
+ * releasing this page, thus avoiding a nasty leak.
+ */
+ if (likely(get_page_unless_zero(page))) {
+ /*
+ * As balloon pages are not isolated from LRU lists, concurrent
+ * compaction threads can race against page migration functions
+ * as well as race against the balloon driver releasing a page.
+ *
+ * In order to avoid having an already isolated balloon page
+ * being (wrongly) re-isolated while it is under migration,
+ * or to avoid attempting to isolate pages being released by
+ * the balloon driver, let's be sure we have the page lock
+ * before proceeding with the balloon page isolation steps.
+ */
+ if (likely(trylock_page(page))) {
+ /*
+ * A ballooned page, by default, has just one refcount.
+ * Prevent concurrent compaction threads from isolating
+ * an already isolated balloon page by refcount check.
+ */
+ if (__is_movable_balloon_page(page) &&
+ page_count(page) == 2) {
+ __isolate_balloon_page(page);
+ unlock_page(page);
+ return true;
+ }
+ unlock_page(page);
+ }
+ put_page(page);
+ }
+ return false;
+}
+
+/* putback_lru_page() counterpart for a ballooned page */
+void balloon_page_putback(struct page *page)
+{
+ /*
+ * 'lock_page()' stabilizes the page and prevents races against
+ * concurrent isolation threads attempting to re-isolate it.
+ */
+ lock_page(page);
+
+ if (__is_movable_balloon_page(page)) {
+ __putback_balloon_page(page);
+ put_page(page);
+ } else {
+ __WARN();
+ dump_page(page);
+ }
+ unlock_page(page);
+}
+
+/* move_to_new_page() counterpart for a ballooned page */
+int balloon_page_migrate(struct page *newpage,
+ struct page *page, enum migrate_mode mode)
+{
+ struct address_space *mapping;
+ int rc = -EAGAIN;
+
+ BUG_ON(!trylock_page(newpage));
+
+ if (WARN_ON(!__is_movable_balloon_page(page))) {
+ dump_page(page);
+ unlock_page(newpage);
+ return rc;
+ }
+
+ mapping = page->mapping;
+ if (mapping)
+ rc = __migrate_balloon_page(mapping, newpage, page, mode);
+
+ unlock_page(newpage);
+ return rc;
+}
+#endif /* CONFIG_BALLOON_COMPACTION */
--
1.7.11.7
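
[Editor's note: the enqueue/dequeue pair above maintains an invariant: a page is either on the device's page list or accounted for in isolated_pages, never lost. A minimal userspace sketch of that bookkeeping (hypothetical names; this models only the list/counter logic, not struct page, locking, or the page lock protocol):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of struct balloon_dev_info's page accounting. */
struct balloon_model {
	size_t on_list;		/* pages currently on b_dev_info->pages   */
	size_t isolated_pages;	/* pages taken off the list by compaction */
};

/* Models balloon_page_enqueue(): a new page always lands on the list. */
static void model_enqueue(struct balloon_model *b)
{
	b->on_list++;
}

/* Models compaction isolating a page: off the list, into isolated_pages. */
static void model_isolate(struct balloon_model *b)
{
	assert(b->on_list > 0);
	b->on_list--;
	b->isolated_pages++;
}

/*
 * Models balloon_page_dequeue(): returns 0 (no page) when the list is
 * empty, which is legal only while compaction still holds isolated
 * pages -- mirroring the BUG() check in the kernel code above.
 */
static int model_dequeue(struct balloon_model *b)
{
	if (b->on_list == 0) {
		assert(b->isolated_pages > 0); /* kernel BUG()s otherwise */
		return 0;
	}
	b->on_list--;
	return 1;
}
```

The point of the model: a dequeue that finds the list transiently empty during compaction reports "try again" rather than treating it as corruption.]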

2012-11-07 03:07:51

by Rafael Aquini

Subject: [PATCH v11 6/7] mm: introduce putback_movable_pages()

The patch "mm: introduce compaction and migration for virtio ballooned pages"
hacks around putback_lru_pages() in order to allow ballooned pages to be
re-inserted into the balloon page list as if a ballooned page were an LRU page.

As ballooned pages are not legitimate LRU pages, this patch introduces
putback_movable_pages() to properly cope with cases where the isolated
pageset contains both ballooned pages and LRU pages, thus fixing the
mentioned inelegant hack around putback_lru_pages().

Signed-off-by: Rafael Aquini <[email protected]>
---
include/linux/migrate.h | 2 ++
mm/compaction.c | 6 +++---
mm/migrate.c | 20 ++++++++++++++++++++
mm/page_alloc.c | 2 +-
4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e570c3c..ff074a4 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -27,6 +27,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
#ifdef CONFIG_MIGRATION

extern void putback_lru_pages(struct list_head *l);
+extern void putback_movable_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
extern int migrate_pages(struct list_head *l, new_page_t x,
@@ -50,6 +51,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
#else

static inline void putback_lru_pages(struct list_head *l) {}
+static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 76abd84..f268bd8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -995,7 +995,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
switch (isolate_migratepages(zone, cc)) {
case ISOLATE_ABORT:
ret = COMPACT_PARTIAL;
- putback_lru_pages(&cc->migratepages);
+ putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
goto out;
case ISOLATE_NONE:
@@ -1018,9 +1018,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
nr_remaining);

- /* Release LRU pages not migrated */
+ /* Release isolated pages not migrated */
if (err) {
- putback_lru_pages(&cc->migratepages);
+ putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
if (err == -ENOMEM) {
ret = COMPACT_PARTIAL;
diff --git a/mm/migrate.c b/mm/migrate.c
index 87ffe54..adb3d44 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -80,6 +80,26 @@ void putback_lru_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
+ putback_lru_page(page);
+ }
+}
+
+/*
+ * Put previously isolated pages back onto the appropriate lists
+ * from where they were once taken off for compaction/migration.
+ *
+ * This function shall be used instead of putback_lru_pages() whenever the
+ * isolated pageset has been built by isolate_migratepages_range().
+ */
+void putback_movable_pages(struct list_head *l)
+{
+ struct page *page;
+ struct page *page2;
+
+ list_for_each_entry_safe(page, page2, l, lru) {
+ list_del(&page->lru);
+ dec_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
if (unlikely(balloon_page_movable(page)))
balloon_page_putback(page);
else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b74de6..1cb0f93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5710,7 +5710,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
0, false, MIGRATE_SYNC);
}

- putback_lru_pages(&cc->migratepages);
+ putback_movable_pages(&cc->migratepages);
return ret > 0 ? 0 : ret;
}

--
1.7.11.7
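
[Editor's note: the only behavioral difference between putback_movable_pages() and putback_lru_pages() is the per-page dispatch added at the end of the loop. A hedged userspace sketch of that routing decision (hypothetical types; in the kernel the test is balloon_page_movable() on a real struct page):

```c
#include <assert.h>

/* Userspace stand-in for one isolated page awaiting putback. */
struct fake_page {
	int is_balloon;		/* models balloon_page_movable(page) */
	int putback_dest;	/* records where the page was sent   */
};

enum { DEST_NONE, DEST_BALLOON, DEST_LRU };

/* Models the body of the putback_movable_pages() loop for one page. */
static void putback_one(struct fake_page *page)
{
	if (page->is_balloon)
		page->putback_dest = DEST_BALLOON;	/* balloon_page_putback() */
	else
		page->putback_dest = DEST_LRU;		/* putback_lru_page()     */
}
```

Callers that isolated via isolate_migratepages_range() may hold a mixed pageset, so they must use the dispatching variant; pure-LRU callers can keep using putback_lru_pages().]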

2012-11-07 03:06:43

by Rafael Aquini

Subject: [PATCH v11 2/7] mm: redefine address_space.assoc_mapping

This patch overhauls struct address_space.assoc_mapping, renaming it to
address_space.private_data and redefining its type to void *. This approach
consistently names the .private_* elements of struct address_space and allows
address_space to be associated with other data structures through
->private_data.

Also, all users of the old ->assoc_mapping element are converted to reflect
its new name and type change.

Signed-off-by: Rafael Aquini <[email protected]>
---
fs/buffer.c | 12 ++++++------
fs/gfs2/glock.c | 2 +-
fs/inode.c | 2 +-
fs/nilfs2/page.c | 2 +-
include/linux/fs.h | 2 +-
5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index b5f0442..e0bad95 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -555,7 +555,7 @@ void emergency_thaw_all(void)
*/
int sync_mapping_buffers(struct address_space *mapping)
{
- struct address_space *buffer_mapping = mapping->assoc_mapping;
+ struct address_space *buffer_mapping = mapping->private_data;

if (buffer_mapping == NULL || list_empty(&mapping->private_list))
return 0;
@@ -588,10 +588,10 @@ void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
struct address_space *buffer_mapping = bh->b_page->mapping;

mark_buffer_dirty(bh);
- if (!mapping->assoc_mapping) {
- mapping->assoc_mapping = buffer_mapping;
+ if (!mapping->private_data) {
+ mapping->private_data = buffer_mapping;
} else {
- BUG_ON(mapping->assoc_mapping != buffer_mapping);
+ BUG_ON(mapping->private_data != buffer_mapping);
}
if (!bh->b_assoc_map) {
spin_lock(&buffer_mapping->private_lock);
@@ -788,7 +788,7 @@ void invalidate_inode_buffers(struct inode *inode)
if (inode_has_buffers(inode)) {
struct address_space *mapping = &inode->i_data;
struct list_head *list = &mapping->private_list;
- struct address_space *buffer_mapping = mapping->assoc_mapping;
+ struct address_space *buffer_mapping = mapping->private_data;

spin_lock(&buffer_mapping->private_lock);
while (!list_empty(list))
@@ -811,7 +811,7 @@ int remove_inode_buffers(struct inode *inode)
if (inode_has_buffers(inode)) {
struct address_space *mapping = &inode->i_data;
struct list_head *list = &mapping->private_list;
- struct address_space *buffer_mapping = mapping->assoc_mapping;
+ struct address_space *buffer_mapping = mapping->private_data;

spin_lock(&buffer_mapping->private_lock);
while (!list_empty(list)) {
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e6c2fd5..0f22d09 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -768,7 +768,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
mapping->host = s->s_bdev->bd_inode;
mapping->flags = 0;
mapping_set_gfp_mask(mapping, GFP_NOFS);
- mapping->assoc_mapping = NULL;
+ mapping->private_data = NULL;
mapping->backing_dev_info = s->s_bdi;
mapping->writeback_index = 0;
}
diff --git a/fs/inode.c b/fs/inode.c
index b03c719..4cac8e1 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -165,7 +165,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
mapping->host = inode;
mapping->flags = 0;
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
- mapping->assoc_mapping = NULL;
+ mapping->private_data = NULL;
mapping->backing_dev_info = &default_backing_dev_info;
mapping->writeback_index = 0;

diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 3e7b2a0..07f76db 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -431,7 +431,7 @@ void nilfs_mapping_init(struct address_space *mapping, struct inode *inode,
mapping->host = inode;
mapping->flags = 0;
mapping_set_gfp_mask(mapping, GFP_NOFS);
- mapping->assoc_mapping = NULL;
+ mapping->private_data = NULL;
mapping->backing_dev_info = bdi;
mapping->a_ops = &empty_aops;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b33cfc9..0982565 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -418,7 +418,7 @@ struct address_space {
struct backing_dev_info *backing_dev_info; /* device readahead, etc */
spinlock_t private_lock; /* for use by the address_space */
struct list_head private_list; /* ditto */
- struct address_space *assoc_mapping; /* ditto */
+ void *private_data; /* ditto */
} __attribute__((aligned(sizeof(long))));
/*
* On most architectures that alignment is already the case; but
--
1.7.11.7
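
[Editor's note: the payoff of the void * type change is that one field can back two unrelated associations. A sketch (userspace, hypothetical struct names) of the two users this series gives ->private_data — fs/buffer.c storing a buffer address_space, and the balloon code storing its device descriptor:

```c
#include <assert.h>

/* Minimal stand-ins for the two structures that now share the field. */
struct fake_address_space   { void *private_data; };
struct fake_buffer_mapping  { int id; };	/* models fs/buffer.c's user  */
struct fake_balloon_dev_info { int id; };	/* models the balloon's user  */

/* Each subsystem stores its own pointer type and casts it back out. */
static int buffer_mapping_id(struct fake_address_space *m)
{
	return ((struct fake_buffer_mapping *)m->private_data)->id;
}

static int balloon_device_id(struct fake_address_space *m)
{
	return ((struct fake_balloon_dev_info *)m->private_data)->id;
}
```

The old typed field (struct address_space *assoc_mapping) could only express the first association; the void * makes the second one possible without adding a field.]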

2012-11-07 19:56:13

by Andrew Morton

Subject: Re: [PATCH v11 1/7] mm: adjust address_space_operations.migratepage() return code

On Wed, 7 Nov 2012 01:05:48 -0200
Rafael Aquini <[email protected]> wrote:

> This patch introduces MIGRATEPAGE_SUCCESS as the default return code
> for address_space_operations.migratepage() method and documents the
> expected return code for the same method in failure cases.

I hit a large number of rejects applying this against linux-next. Due
to the increasingly irritating sched/numa code in there.

I attempted to fix it up and also converted some (but not all) of the
implicit tests of `rc' against zero.

Please check the result very carefully - more changes will be needed.

All those

- if (rc)
+ if (rc != MIGRATEPAGE_SUCCESS)

changes are a pain. Perhaps we shouldn't bother.

2012-11-07 19:58:13

by Andrew Morton

Subject: Re: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

On Wed, 7 Nov 2012 01:05:52 -0200
Rafael Aquini <[email protected]> wrote:

> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> Besides making balloon pages movable at allocation time and introducing
> the necessary primitives to perform balloon page migration/compaction,
> this patch also introduces the following locking scheme, in order to
> enhance the syncronization methods for accessing elements of struct
> virtio_balloon, thus providing protection against concurrent access
> introduced by parallel memory migration threads.
>
> ...
>
> @@ -122,18 +128,25 @@ static void set_page_pfns(u32 pfns[], struct page *page)
>
> static void fill_balloon(struct virtio_balloon *vb, size_t num)
> {
> + struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
> +
> + static DEFINE_RATELIMIT_STATE(fill_balloon_rs,
> + DEFAULT_RATELIMIT_INTERVAL,
> + DEFAULT_RATELIMIT_BURST);
> +
> /* We can only do one array worth at a time. */
> num = min(num, ARRAY_SIZE(vb->pfns));
>
> + mutex_lock(&vb->balloon_lock);
> for (vb->num_pfns = 0; vb->num_pfns < num;
> vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
> - __GFP_NOMEMALLOC | __GFP_NOWARN);
> + struct page *page = balloon_page_enqueue(vb_dev_info);
> +
> if (!page) {
> - if (printk_ratelimit())
> + if (__ratelimit(&fill_balloon_rs))
> dev_printk(KERN_INFO, &vb->vdev->dev,
> "Out of puff! Can't get %zu pages\n",
> - num);
> + VIRTIO_BALLOON_PAGES_PER_PAGE);
> /* Sleep for at least 1/5 of a second before retry. */
> msleep(200);
> break;

linux-next's fill_balloon() has already been converted to
dev_info_ratelimited(). I fixed everything up. Please check the result.

2012-11-07 21:02:11

by Andrew Morton

Subject: Re: [PATCH v11 3/7] mm: introduce a common interface for balloon pages mobility

On Wed, 7 Nov 2012 01:05:50 -0200
Rafael Aquini <[email protected]> wrote:

> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> This patch introduces a common interface to help a balloon driver on
> making its page set movable to compaction, and thus allowing the system
> to better leverage the compation efforts on memory defragmentation.


mm/migrate.c: In function 'unmap_and_move':
mm/migrate.c:899: error: 'COMPACTBALLOONRELEASED' undeclared (first use in this function)
mm/migrate.c:899: error: (Each undeclared identifier is reported only once
mm/migrate.c:899: error: for each function it appears in.)

You've been bad - you didn't test with your feature disabled.
Please do that. And not just compilation testing.


We can fix this one with a sucky macro. I think that's better than
unconditionally defining the enums.

--- a/include/linux/balloon_compaction.h~mm-introduce-a-common-interface-for-balloon-pages-mobility-fix
+++ a/include/linux/balloon_compaction.h
@@ -207,10 +207,8 @@ static inline gfp_t balloon_mapping_gfp_
return GFP_HIGHUSER;
}

-static inline void balloon_event_count(enum vm_event_item item)
-{
- return;
-}
+/* A macro, to avoid generating references to the undefined COMPACTBALLOON* */
+#define balloon_event_count(item) do { } while (0)

static inline bool balloon_compaction_check(void)
{
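
[Editor's note: the reason the macro works where the static inline cannot: the empty do { } while (0) body never expands its argument, so an identifier that is undeclared when CONFIG_BALLOON_COMPACTION is off simply vanishes during preprocessing and the compiler never sees it. A compilable userspace demonstration of that property:

```c
#include <assert.h>

/* The argument is swallowed unexpanded -- it never reaches the compiler. */
#define balloon_event_count(item) do { } while (0)

static int demo(void)
{
	/*
	 * COMPACTBALLOON_NOT_DECLARED_ANYWHERE is not declared in this
	 * translation unit, yet this compiles: the macro body discards
	 * the token before compilation proper. A static inline taking
	 * an enum vm_event_item argument would fail right here with
	 * "undeclared identifier", which is exactly the build error
	 * reported above.
	 */
	balloon_event_count(COMPACTBALLOON_NOT_DECLARED_ANYWHERE);
	return 1;
}
```
]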

2012-11-07 21:35:09

by Rafael Aquini

Subject: Re: [PATCH v11 3/7] mm: introduce a common interface for balloon pages mobility

On Wed, Nov 07, 2012 at 01:02:07PM -0800, Andrew Morton wrote:
> On Wed, 7 Nov 2012 01:05:50 -0200
> Rafael Aquini <[email protected]> wrote:
>
> > Memory fragmentation introduced by ballooning might reduce significantly
> > the number of 2MB contiguous memory blocks that can be used within a guest,
> > thus imposing performance penalties associated with the reduced number of
> > transparent huge pages that could be used by the guest workload.
> >
> > This patch introduces a common interface to help a balloon driver on
> > making its page set movable to compaction, and thus allowing the system
> > to better leverage the compation efforts on memory defragmentation.
>
>
> mm/migrate.c: In function 'unmap_and_move':
> mm/migrate.c:899: error: 'COMPACTBALLOONRELEASED' undeclared (first use in this function)
> mm/migrate.c:899: error: (Each undeclared identifier is reported only once
> mm/migrate.c:899: error: for each function it appears in.)
>
> You've been bad - you didn't test with your feature disabled.
> Please do that. And not just compilation testing.
>

Gasp... Shame on me, indeed. balloon_event_count() was a macro and I had it
tested a couple of review rounds earlier. You requested me to get rid of the
preprocessor pyrotechnics, and I did, but I miserably failed to re-test it.

I'm pretty sure Santa is not going to visit me this year.

Do you want me to resubmit this?

>
> We can fix this one with a sucky macro. I think that's better than
> unconditionally defining the enums.
>
> --- a/include/linux/balloon_compaction.h~mm-introduce-a-common-interface-for-balloon-pages-mobility-fix
> +++ a/include/linux/balloon_compaction.h
> @@ -207,10 +207,8 @@ static inline gfp_t balloon_mapping_gfp_
> return GFP_HIGHUSER;
> }
>
> -static inline void balloon_event_count(enum vm_event_item item)
> -{
> - return;
> -}
> +/* A macro, to avoid generating references to the undefined COMPACTBALLOON* */
> +#define balloon_event_count(item) do { } while (0)
>
> static inline bool balloon_compaction_check(void)
> {
>

2012-11-07 21:39:57

by Rafael Aquini

Subject: Re: [PATCH v11 1/7] mm: adjust address_space_operations.migratepage() return code

On Wed, Nov 07, 2012 at 11:56:10AM -0800, Andrew Morton wrote:
> On Wed, 7 Nov 2012 01:05:48 -0200
> Rafael Aquini <[email protected]> wrote:
>
> > This patch introduces MIGRATEPAGE_SUCCESS as the default return code
> > for address_space_operations.migratepage() method and documents the
> > expected return code for the same method in failure cases.
>
> I hit a large number of rejects applying this against linux-next. Due
> to the increasingly irritating sched/numa code in there.
>
> I attempted to fix it up and also converted some (but not all) of the
> implicit tests of `rc' against zero.
>
> Please check the result very carefully - more changes will be needed.
>
> All those
>
> - if (rc)
> + if (rc != MIGRATEPAGE_SUCCESS)
>
> changes are a pain. Perhaps we shouldn't bother.

Thanks for doing that.

This hunk at migrate_pages(), however, is not necessary:

@@ -1001,7 +1001,7 @@ out:
if (!swapwrite)
current->flags &= ~PF_SWAPWRITE;

- if (rc)
+ if (rc != MIGRATEPAGE_SUCCESS)
return rc;

Here, migrate_pages() is not testing rc for migration success; it is just
deciding whether it has to return -ENOMEM.

I guess a change to make that snippet clearer could be:

diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..6562aee 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -987,7 +987,7 @@ int migrate_pages(struct list_head *from,
case -EAGAIN:
retry++;
break;
- case 0:
+ case MIGRATEPAGE_SUCCESS:
break;
default:
/* Permanent failure */
@@ -996,15 +996,12 @@ int migrate_pages(struct list_head *from,
}
}
}
- rc = 0;
+ rc = nr_failed + retry;
out:
if (!swapwrite)
current->flags &= ~PF_SWAPWRITE;

- if (rc)
- return rc;
-
- return nr_failed + retry;
+ return rc;
}


I can rebase this patch and resubmit, if you prefer.

2012-11-07 22:03:30

by Rafael Aquini

Subject: Re: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

On Wed, Nov 07, 2012 at 11:58:10AM -0800, Andrew Morton wrote:
> On Wed, 7 Nov 2012 01:05:52 -0200
> Rafael Aquini <[email protected]> wrote:
>
> > Memory fragmentation introduced by ballooning might reduce significantly
> > the number of 2MB contiguous memory blocks that can be used within a guest,
> > thus imposing performance penalties associated with the reduced number of
> > transparent huge pages that could be used by the guest workload.
> >
> > Besides making balloon pages movable at allocation time and introducing
> > the necessary primitives to perform balloon page migration/compaction,
> > this patch also introduces the following locking scheme, in order to
> > enhance the syncronization methods for accessing elements of struct
> > virtio_balloon, thus providing protection against concurrent access
> > introduced by parallel memory migration threads.
> >
> > ...
> >
> > @@ -122,18 +128,25 @@ static void set_page_pfns(u32 pfns[], struct page *page)
> >
> > static void fill_balloon(struct virtio_balloon *vb, size_t num)
> > {
> > + struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
> > +
> > + static DEFINE_RATELIMIT_STATE(fill_balloon_rs,
> > + DEFAULT_RATELIMIT_INTERVAL,
> > + DEFAULT_RATELIMIT_BURST);
> > +
> > /* We can only do one array worth at a time. */
> > num = min(num, ARRAY_SIZE(vb->pfns));
> >
> > + mutex_lock(&vb->balloon_lock);
> > for (vb->num_pfns = 0; vb->num_pfns < num;
> > vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> > - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
> > - __GFP_NOMEMALLOC | __GFP_NOWARN);
> > + struct page *page = balloon_page_enqueue(vb_dev_info);
> > +
> > if (!page) {
> > - if (printk_ratelimit())
> > + if (__ratelimit(&fill_balloon_rs))
> > dev_printk(KERN_INFO, &vb->vdev->dev,
> > "Out of puff! Can't get %zu pages\n",
> > - num);
> > + VIRTIO_BALLOON_PAGES_PER_PAGE);
> > /* Sleep for at least 1/5 of a second before retry. */
> > msleep(200);
> > break;
>
> linux-next's fill_balloon() has already been converted to
> dev_info_ratelimited(). I fixed everything up. Please check the result.

Looks great, thanks for doing it

>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: [email protected]

2012-11-08 00:01:26

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

Rafael Aquini <[email protected]> writes:
> + * virtballoon_migratepage - perform the balloon page migration on behalf of
> + * a compation thread. (called under page lock)

> + if (!mutex_trylock(&vb->balloon_lock))
> + return -EAGAIN;

Erk, OK...

> + /* balloon's page migration 1st step -- inflate "newpage" */
> + spin_lock_irqsave(&vb_dev_info->pages_lock, flags);
> + balloon_page_insert(newpage, mapping, &vb_dev_info->pages);
> + vb_dev_info->isolated_pages--;
> + spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
> + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> + set_page_pfns(vb->pfns, newpage);
> + tell_host(vb, vb->inflate_vq);

tell_host does wait_event(), so you can't call it under the page_lock.
Right?

You probably get away with it because current qemu will service you
immediately. You could spin here in this case for the moment.

There's a second call to tell_host():

> + /*
> + * balloon's page migration 2nd step -- deflate "page"
> + *
> + * It's safe to delete page->lru here because this page is at
> + * an isolated migration list, and this step is expected to happen here
> + */
> + balloon_page_delete(page);
> + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> + set_page_pfns(vb->pfns, page);
> + tell_host(vb, vb->deflate_vq);

The first one can be delayed, the second one can be delayed if the host
didn't ask for VIRTIO_BALLOON_F_MUST_TELL_HOST (qemu doesn't).

We could implement a proper request queue for these, and return -EAGAIN
if the queue fills. Though in practice, it's not important (it might
help performance).
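Rusty's suggested request queue could look something like the sketch below: migration steps are queued instead of telling the host synchronously, and the caller gets -EAGAIN once the queue fills so compaction retries the page later. All names and the fixed depth are hypothetical — the driver in this thread has no such queue:

```c
/* Sketch of a deferred tell_host() request queue (assumed design,
 * not the merged driver code). */
#include <assert.h>
#include <errno.h>

#define TELL_QUEUE_DEPTH 4

struct tell_queue {
	unsigned long pfns[TELL_QUEUE_DEPTH];
	int head, tail, count;
};

/* Record a pfn to report to the host later; fail fast when full. */
static int tell_host_queued(struct tell_queue *q, unsigned long pfn)
{
	if (q->count == TELL_QUEUE_DEPTH)
		return -EAGAIN;		/* queue full: let compaction retry */
	q->pfns[q->tail] = pfn;
	q->tail = (q->tail + 1) % TELL_QUEUE_DEPTH;
	q->count++;
	return 0;
}

/* Drained later from a context that may sleep (e.g. a workqueue),
 * where the real driver would push entries onto the virtqueue. */
static int tell_queue_drain(struct tell_queue *q)
{
	int drained = 0;

	while (q->count) {
		q->head = (q->head + 1) % TELL_QUEUE_DEPTH;
		q->count--;
		drained++;
	}
	return drained;
}
```

The point of the -EAGAIN path is that migration under the page lock never has to wait on the host; the host notification happens from a sleepable context.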

Cheers,
Rusty.

2012-11-08 00:11:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

On Thu, 08 Nov 2012 09:32:18 +1030
Rusty Russell <[email protected]> wrote:

> Rafael Aquini <[email protected]> writes:
> > + * virtballoon_migratepage - perform the balloon page migration on behalf of
> > + * a compation thread. (called under page lock)
>
> > + if (!mutex_trylock(&vb->balloon_lock))
> > + return -EAGAIN;
>
> Erk, OK...

Not really. As is almost always the case with a trylock, it needs a
comment explaining why we couldn't use the far superior mutex_lock().
Data: this reader doesn't know!

> > + /* balloon's page migration 1st step -- inflate "newpage" */
> > + spin_lock_irqsave(&vb_dev_info->pages_lock, flags);
> > + balloon_page_insert(newpage, mapping, &vb_dev_info->pages);
> > + vb_dev_info->isolated_pages--;
> > + spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
> > + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> > + set_page_pfns(vb->pfns, newpage);
> > + tell_host(vb, vb->inflate_vq);
>
> tell_host does wait_event(), so you can't call it under the page_lock.
> Right?

Sleeping inside lock_page() is OK. More problematic is that GFP_KERNEL
allocation. iirc it _should_ be OK. Core VM uses trylock_page() and
the filesystems shouldn't be doing a synchronous lock_page() in the
pageout path. But I suspect it isn't a well-tested area.

> You probably get away with it because current qemu will service you
> immediately. You could spin here in this case for the moment.
>
> There's a second call to tell_host():
>
> > + /*
> > + * balloon's page migration 2nd step -- deflate "page"
> > + *
> > + * It's safe to delete page->lru here because this page is at
> > + * an isolated migration list, and this step is expected to happen here
> > + */
> > + balloon_page_delete(page);
> > + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> > + set_page_pfns(vb->pfns, page);
> > + tell_host(vb, vb->deflate_vq);
>
> The first one can be delayed, the second one can be delayed if the host
> didn't ask for VIRTIO_BALLOON_F_MUST_TELL_HOST (qemu doesn't).
>
> We could implement a proper request queue for these, and return -EAGAIN
> if the queue fills. Though in practice, it's not important (it might
> help performance).

2012-11-08 00:32:32

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

On Wed, Nov 07, 2012 at 04:11:46PM -0800, Andrew Morton wrote:
> On Thu, 08 Nov 2012 09:32:18 +1030
> Rusty Russell <[email protected]> wrote:
>
> > Rafael Aquini <[email protected]> writes:
> > > + * virtballoon_migratepage - perform the balloon page migration on behalf of
> > > + * a compation thread. (called under page lock)
> >
> > > + if (!mutex_trylock(&vb->balloon_lock))
> > > + return -EAGAIN;
> >
> > Erk, OK...
>
> Not really. As is almost always the case with a trylock, it needs a
> comment explaining why we couldn't use the far superior mutex_lock().
> Data: this reader doesn't know!
>


That was just to alleviate balloon_lock contention if we're migrating pages
concurrently with balloon_fill() or balloon_leak(), as it's easier to retry
the page migration later (in the contended case).
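The rationale can be modeled in a few lines of userspace C (an illustration only, using an atomic flag in place of the kernel mutex): when the lock is contended, return -EAGAIN so the migration core retries the page later rather than blocking a compaction thread behind fill/leak.

```c
/* Userspace sketch of the trylock-or-defer pattern described above. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

static atomic_flag balloon_lock = ATOMIC_FLAG_INIT;

static int virtballoon_migratepage_model(void)
{
	/* trylock: if fill/leak holds the lock, defer instead of sleeping */
	if (atomic_flag_test_and_set(&balloon_lock))
		return -EAGAIN;
	/* ... inflate "newpage" / deflate "page" would happen here ... */
	atomic_flag_clear(&balloon_lock);
	return 0;
}
```

-EAGAIN is cheap here because compaction already treats it as "try this page again on a later pass".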


> > > + /* balloon's page migration 1st step -- inflate "newpage" */
> > > + spin_lock_irqsave(&vb_dev_info->pages_lock, flags);
> > > + balloon_page_insert(newpage, mapping, &vb_dev_info->pages);
> > > + vb_dev_info->isolated_pages--;
> > > + spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
> > > + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> > > + set_page_pfns(vb->pfns, newpage);
> > > + tell_host(vb, vb->inflate_vq);
> >
> > tell_host does wait_event(), so you can't call it under the page_lock.
> > Right?
>
> Sleeping inside lock_page() is OK. More problematic is that GFP_KERNEL
> allocation. iirc it _should_ be OK. Core VM uses trylock_page() and
> the filesystems shouldn't be doing a synchronous lock_page() in the
> pageout path. But I suspect it isn't a well-tested area.


The locked page under migration is not contended by any other FS / core VM path,
as it is already isolated by compaction to a private migration page list.
OTOH, there's a chance of another parallel compaction thread hitting this page
while scanning page blocks for isolation, but that path can be considered safe
as it uses trylock_page().



>
> > You probably get away with it because current qemu will service you
> > immediately. You could spin here in this case for the moment.
> >
> > There's a second call to tell_host():
> >
> > > + /*
> > > + * balloon's page migration 2nd step -- deflate "page"
> > > + *
> > > + * It's safe to delete page->lru here because this page is at
> > > + * an isolated migration list, and this step is expected to happen here
> > > + */
> > > + balloon_page_delete(page);
> > > + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> > > + set_page_pfns(vb->pfns, page);
> > > + tell_host(vb, vb->deflate_vq);
> >
> > The first one can be delayed, the second one can be delayed if the host
> > didn't ask for VIRTIO_BALLOON_F_MUST_TELL_HOST (qemu doesn't).
> >
> > We could implement a proper request queue for these, and return -EAGAIN
> > if the queue fills. Though in practice, it's not important (it might
> > help performance).
>

2012-11-08 00:34:30

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

On Thu, Nov 08, 2012 at 09:32:18AM +1030, Rusty Russell wrote:
> The first one can be delayed, the second one can be delayed if the host
> didn't ask for VIRTIO_BALLOON_F_MUST_TELL_HOST (qemu doesn't).
>
> We could implement a proper request queue for these, and return -EAGAIN
> if the queue fills. Though in practice, it's not important (it might
> help performance).

I liked the idea. Give me the directions to accomplish it and I'll give it a try
for sure.

2012-11-09 12:11:54

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v11 3/7] mm: introduce a common interface for balloon pages mobility

On Wed, Nov 07, 2012 at 01:05:50AM -0200, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> This patch introduces a common interface to help a balloon driver on
> making its page set movable to compaction, and thus allowing the system
> to better leverage the compation efforts on memory defragmentation.
>
> Signed-off-by: Rafael Aquini <[email protected]>
> ---
> include/linux/balloon_compaction.h | 220 ++++++++++++++++++++++++++++++
> include/linux/migrate.h | 10 ++
> include/linux/pagemap.h | 16 +++
> mm/Kconfig | 15 +++
> mm/Makefile | 1 +
> mm/balloon_compaction.c | 269 +++++++++++++++++++++++++++++++++++++
> 6 files changed, 531 insertions(+)
> create mode 100644 include/linux/balloon_compaction.h
> create mode 100644 mm/balloon_compaction.c
>
> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> new file mode 100644
> index 0000000..1865bd5
> --- /dev/null
> +++ b/include/linux/balloon_compaction.h
> @@ -0,0 +1,220 @@
> +/*
> + * include/linux/balloon_compaction.h
> + *
> + * Common interface definitions for making balloon pages movable to compaction.
> + *

s/to/by/

And the record for nit-picking goes to....

> + * Copyright (C) 2012, Red Hat, Inc. Rafael Aquini <[email protected]>
> + */
> +#ifndef _LINUX_BALLOON_COMPACTION_H
> +#define _LINUX_BALLOON_COMPACTION_H
> +#include <linux/pagemap.h>
> +#include <linux/migrate.h>
> +#include <linux/gfp.h>
> +#include <linux/err.h>
> +
> +/*
> + * Balloon device information descriptor.
> + * This struct is used to allow the common balloon compaction interface
> + * procedures to find the proper balloon device holding memory pages they'll
> + * have to cope for page compaction / migration, as well as it serves the
> + * balloon driver as a page book-keeper for its registered balloon devices.
> + */
> +struct balloon_dev_info {
> + void *balloon_device; /* balloon device descriptor */
> + struct address_space *mapping; /* balloon special page->mapping */
> + unsigned long isolated_pages; /* # of isolated pages for migration */
> + spinlock_t pages_lock; /* Protection to pages list */
> + struct list_head pages; /* Pages enqueued & handled to Host */
> +};
> +
> +extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
> +extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
> +extern struct balloon_dev_info *balloon_devinfo_alloc(
> + void *balloon_dev_descriptor);
> +
> +static inline void balloon_devinfo_free(struct balloon_dev_info *b_dev_info)
> +{
> + kfree(b_dev_info);
> +}
> +

More stupid nit-picking but in terms of defensive programming it would
be preferred if the container struct of balloon_dev_info was passed in
and then

kfree(something->b_dev_info);
something->b_dev_info = NULL;

This looks like serious overkill, but if the lifetime of the balloon driver
is complex and relatively rarely exercised (which I bet is the case) then we
want use-after-free bugs to blow up quickly instead of corrupting memory.

It is not mandatory to address this before merging but if you do another
revision it might be a nice idea. I'm only bringing this up as I got bitten
by this recently when I screwed up the lifetime of a mm-associated struct
and it took a while to pin down.

The suggestion is irrelevant if the containing structure is freed at the
same time of course.
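A minimal userspace illustration of the pattern being suggested, with hypothetical container names: free through the container and NULL the pointer, so a later stray dereference faults immediately instead of silently reusing freed memory.

```c
/* Sketch of the defensive-free idiom (assumed names, not driver code). */
#include <assert.h>
#include <stdlib.h>

struct balloon_dev_info_model { int pages; };

/* hypothetical container struct holding the descriptor */
struct balloon_model {
	struct balloon_dev_info_model *b_dev_info;
};

static void balloon_devinfo_free_model(struct balloon_model *b)
{
	free(b->b_dev_info);
	b->b_dev_info = NULL;	/* poison: later deref faults, not corrupts */
}
```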

> +#ifdef CONFIG_BALLOON_COMPACTION
> +extern bool balloon_page_isolate(struct page *page);
> +extern void balloon_page_putback(struct page *page);
> +extern int balloon_page_migrate(struct page *newpage,
> + struct page *page, enum migrate_mode mode);
> +extern struct address_space
> +*balloon_mapping_alloc(struct balloon_dev_info *b_dev_info,
> + const struct address_space_operations *a_ops);
> +

It should not be part of your patch, but one day we might want to abstract
even this and put users with different migration requirements behind some
sort of migration_operations callback struct. Balloon would be the first
user of course, but CMA is another, and if placement constraints for
memory usage ever happen, they would be yet another user.

> +static inline void balloon_mapping_free(struct address_space *balloon_mapping)
> +{
> + kfree(balloon_mapping);
> +}
> +
> +/*
> + * balloon_page_insert - insert a page into the balloon's page list and make
> + * the page->mapping assignment accordingly.
> + * @page : page to be assigned as a 'balloon page'
> + * @mapping : allocated special 'balloon_mapping'
> + * @head : balloon's device page list head
> + */
> +static inline void balloon_page_insert(struct page *page,
> + struct address_space *mapping,
> + struct list_head *head)
> +{
> + list_add(&page->lru, head);
> + /*
> + * Make sure the page is already inserted on balloon's page list
> + * before assigning its ->mapping.
> + */
> + smp_wmb();
> + page->mapping = mapping;
> +}
> +

Elsewhere we have;

spin_lock_irqsave(&b_dev_info->pages_lock, flags);
balloon_page_insert(page, b_dev_info->mapping, &b_dev_info->pages);
spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);

So this happens under an irq-safe lock. Why is a smp_wmb necessary?

> +
> +/*
> + * balloon_page_delete - clear the page->mapping and delete the page from
> + * balloon's page list accordingly.
> + * @page : page to be released from balloon's page list
> + */
> +static inline void balloon_page_delete(struct page *page)
> +{
> + page->mapping = NULL;
> + /*
> + * Make sure page->mapping is cleared before we proceed with
> + * balloon's page list deletion.
> + */
> + smp_wmb();
> + list_del(&page->lru);
> +}
> +

Same thing on locking except this also appears to be under the page lock
making the barrier seem even more unnecessary.

> +/*
> + * __is_movable_balloon_page - helper to perform @page mapping->flags tests
> + */
> +static inline bool __is_movable_balloon_page(struct page *page)
> +{
> + /*
> + * we might attempt to read ->mapping concurrently to other
> + * threads trying to write to it.
> + */
> + struct address_space *mapping = ACCESS_ONCE(page->mapping);
> + smp_read_barrier_depends();
> + return mapping_balloon(mapping);
> +}
> +

What happens if this race occurs? I assume it's a racy check before you
isolate the balloon page, in which case the barrier may be overkill.
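The pattern under discussion — a cheap lockless hint, followed by an authoritative re-test under the page lock — can be sketched in userspace C (all names hypothetical, an atomic flag standing in for the page lock):

```c
/* Model of "racy check, then recheck under lock": the lockless test may
 * give a false positive, and only the locked re-test decides. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mapping_model { bool balloon; };

struct page_model {
	_Atomic(struct mapping_model *) mapping;  /* may change under us */
	atomic_flag lock;                         /* stands in for page lock */
};

/* lockless hint: may race with ->mapping being cleared */
static bool is_movable_balloon_page(struct page_model *p)
{
	struct mapping_model *m = atomic_load(&p->mapping);
	return m && m->balloon;
}

static bool isolate_model(struct page_model *p)
{
	bool ok;

	if (!is_movable_balloon_page(p))	/* cheap racy filter */
		return false;
	if (atomic_flag_test_and_set(&p->lock))	/* like trylock_page() */
		return false;
	ok = is_movable_balloon_page(p);	/* authoritative re-test */
	atomic_flag_clear(&p->lock);
	return ok;
}
```

If the racy first test is only ever a hint that the locked re-test confirms, a read barrier on the hint path buys little, which is the heart of Mel's question.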

> +/*
> + * balloon_page_movable - test page->mapping->flags to identify balloon pages
> + * that can be moved by compaction/migration.
> + *
> + * This function is used at core compaction's page isolation scheme, therefore
> + * most pages exposed to it are not enlisted as balloon pages and so, to avoid
> + * undesired side effects like racing against __free_pages(), we cannot afford
> + * holding the page locked while testing page->mapping->flags here.
> + *
> + * As we might return false positives in the case of a balloon page being just
> + * released under us, the page->mapping->flags need to be re-tested later,
> + * under the proper page lock, at the functions that will be coping with the
> + * balloon page case.
> + */
> +static inline bool balloon_page_movable(struct page *page)
> +{
> + /*
> + * Before dereferencing and testing mapping->flags, lets make sure
> + * this is not a page that uses ->mapping in a different way
> + */
> + if (!PageSlab(page) && !PageSwapCache(page) && !PageAnon(page) &&
> + !page_mapped(page))
> + return __is_movable_balloon_page(page);
> +
> + return false;
> +}
> +

Ok.

> +/*
> + * balloon_page_device - get the b_dev_info descriptor for the balloon device
> + * that enqueues the given page.
> + */
> +static inline struct balloon_dev_info *balloon_page_device(struct page *page)
> +{
> + struct address_space *mapping = page->mapping;
> + if (likely(mapping))
> + return mapping->private_data;
> +
> + return NULL;
> +}
> +
> +static inline gfp_t balloon_mapping_gfp_mask(void)
> +{
> + return GFP_HIGHUSER_MOVABLE;
> +}
> +
> +static inline void balloon_event_count(enum vm_event_item item)
> +{
> + count_vm_event(item);
> +}
> +
> +static inline bool balloon_compaction_check(void)
> +{
> + return true;
> +}
> +
> +#else /* !CONFIG_BALLOON_COMPACTION */
> +
> +static inline void *balloon_mapping_alloc(void *balloon_device,
> + const struct address_space_operations *a_ops)
> +{
> + return ERR_PTR(-EOPNOTSUPP);
> +}
> +
> +static inline void balloon_mapping_free(struct address_space *balloon_mapping)
> +{
> + return;
> +}
> +
> +static inline void balloon_page_insert(struct page *page,
> + struct address_space *mapping,
> + struct list_head *head)
> +{
> + list_add(&page->lru, head);
> +}
> +
> +static inline void balloon_page_delete(struct page *page)
> +{
> + list_del(&page->lru);
> +}
> +
> +static inline bool balloon_page_movable(struct page *page)
> +{
> + return false;
> +}
> +
> +static inline bool balloon_page_isolate(struct page *page)
> +{
> + return false;
> +}
> +
> +static inline void balloon_page_putback(struct page *page)
> +{
> + return;
> +}
> +
> +static inline int balloon_page_migrate(struct page *newpage,
> + struct page *page, enum migrate_mode mode)
> +{
> + return 0;
> +}
> +
> +static inline gfp_t balloon_mapping_gfp_mask(void)
> +{
> + return GFP_HIGHUSER;
> +}
> +
> +static inline void balloon_event_count(enum vm_event_item item)
> +{
> + return;
> +}
> +
> +static inline bool balloon_compaction_check(void)
> +{
> + return false;
> +}
> +#endif /* CONFIG_BALLOON_COMPACTION */
> +#endif /* _LINUX_BALLOON_COMPACTION_H */
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index a4e886d..e570c3c 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -11,8 +11,18 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
> * Return values from addresss_space_operations.migratepage():
> * - negative errno on page migration failure;
> * - zero on page migration success;
> + *
> + * The balloon page migration introduces this special case where a 'distinct'
> + * return code is used to flag a successful page migration to unmap_and_move().
> + * This approach is necessary because page migration can race against balloon
> + * deflation procedure, and for such case we could introduce a nasty page leak
> + * if a successfully migrated balloon page gets released concurrently with
> + * migration's unmap_and_move() wrap-up steps.
> */
> #define MIGRATEPAGE_SUCCESS 0
> +#define MIGRATEPAGE_BALLOON_SUCCESS 1 /* special ret code for balloon page
> + * sucessfull migration case.
> + */

s/sucessfull/successful/

>
> #ifdef CONFIG_MIGRATION
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index e42c762..6da609d 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -24,6 +24,7 @@ enum mapping_flags {
> AS_ENOSPC = __GFP_BITS_SHIFT + 1, /* ENOSPC on async write */
> AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks() */
> AS_UNEVICTABLE = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
> + AS_BALLOON_MAP = __GFP_BITS_SHIFT + 4, /* balloon page special map */
> };
>
> static inline void mapping_set_error(struct address_space *mapping, int error)
> @@ -53,6 +54,21 @@ static inline int mapping_unevictable(struct address_space *mapping)
> return !!mapping;
> }
>
> +static inline void mapping_set_balloon(struct address_space *mapping)
> +{
> + set_bit(AS_BALLOON_MAP, &mapping->flags);
> +}
> +
> +static inline void mapping_clear_balloon(struct address_space *mapping)
> +{
> + clear_bit(AS_BALLOON_MAP, &mapping->flags);
> +}
> +
> +static inline int mapping_balloon(struct address_space *mapping)
> +{
> + return mapping && test_bit(AS_BALLOON_MAP, &mapping->flags);
> +}
> +
> static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
> {
> return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
> diff --git a/mm/Kconfig b/mm/Kconfig
> index a3f8ddd..b119172 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -188,6 +188,21 @@ config SPLIT_PTLOCK_CPUS
> default "4"
>
> #
> +# support for memory balloon compaction
> +config BALLOON_COMPACTION
> + bool "Allow for balloon memory compaction/migration"
> + select COMPACTION
> + depends on VIRTIO_BALLOON
> + help
> + Memory fragmentation introduced by ballooning might reduce
> + significantly the number of 2MB contiguous memory blocks that can be
> + used within a guest, thus imposing performance penalties associated
> + with the reduced number of transparent huge pages that could be used
> + by the guest workload. Allowing the compaction & migration for memory
> + pages enlisted as being part of memory balloon devices avoids the
> + scenario aforementioned and helps improving memory defragmentation.
> +

Rather than select COMPACTION, should it depend on it? Similarly, as THP
is the primary motivation, would it make more sense to depend on
TRANSPARENT_HUGEPAGE?

Should it default y? It seems useful, why would someone support
VIRTIO_BALLOON and *not* use this?
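For illustration, one way to express Mel's suggestion in Kconfig terms — a sketch only, not the text that was merged:

```kconfig
# "depends on" instead of "select", defaulting to enabled when the
# balloon driver is built (hypothetical wording of Mel's suggestion)
config BALLOON_COMPACTION
	bool "Allow for balloon memory compaction/migration"
	default y
	depends on COMPACTION && VIRTIO_BALLOON
```

The practical difference: `select` force-enables COMPACTION even if the user disabled it, while `depends on` hides the option until COMPACTION is chosen explicitly.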

> +#
> # support for memory compaction
> config COMPACTION
> bool "Allow for memory compaction"
> diff --git a/mm/Makefile b/mm/Makefile
> index 6b025f8..21e10ee 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -57,3 +57,4 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
> obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
> obj-$(CONFIG_CLEANCACHE) += cleancache.o
> obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
> +obj-$(CONFIG_BALLOON_COMPACTION) += balloon_compaction.o
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> new file mode 100644
> index 0000000..90935aa
> --- /dev/null
> +++ b/mm/balloon_compaction.c
> @@ -0,0 +1,269 @@
> +/*
> + * mm/balloon_compaction.c
> + *
> + * Common interface for making balloon pages movable to compaction.
> + *
> + * Copyright (C) 2012, Red Hat, Inc. Rafael Aquini <[email protected]>
> + */
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/export.h>
> +#include <linux/balloon_compaction.h>
> +
> +/*
> + * balloon_devinfo_alloc - allocates a balloon device information descriptor.
> + * @balloon_dev_descriptor: pointer to reference the balloon device which
> + * this struct balloon_dev_info will be servicing.
> + *
> + * Driver must call it to properly allocate and initialize an instance of
> + * struct balloon_dev_info which will be used to reference a balloon device
> + * as well as to keep track of the balloon device page list.
> + */
> +struct balloon_dev_info *balloon_devinfo_alloc(void *balloon_dev_descriptor)
> +{
> + struct balloon_dev_info *b_dev_info;
> + b_dev_info = kmalloc(sizeof(*b_dev_info), GFP_KERNEL);
> + if (!b_dev_info)
> + return ERR_PTR(-ENOMEM);
> +
> + b_dev_info->balloon_device = balloon_dev_descriptor;
> + b_dev_info->mapping = NULL;
> + b_dev_info->isolated_pages = 0;
> + spin_lock_init(&b_dev_info->pages_lock);
> + INIT_LIST_HEAD(&b_dev_info->pages);
> +
> + return b_dev_info;
> +}
> +EXPORT_SYMBOL_GPL(balloon_devinfo_alloc);
> +
> +/*
> + * balloon_page_enqueue - allocates a new page and inserts it into the balloon
> + * page list.
> + * @b_dev_info: balloon device decriptor where we will insert a new page to
> + *
> + * Driver must call it to properly allocate a new enlisted balloon page
> + * before definetively removing it from the guest system.
> + * This function returns the page address for the recently enqueued page or
> + * NULL in the case we fail to allocate a new page this turn.
> + */
> +struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
> +{
> + unsigned long flags;
> + struct page *page = alloc_page(balloon_mapping_gfp_mask() |
> + __GFP_NOMEMALLOC | __GFP_NORETRY);
> + if (!page)
> + return NULL;
> +
> + BUG_ON(!trylock_page(page));
> + spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> + balloon_page_insert(page, b_dev_info->mapping, &b_dev_info->pages);
> + spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> + unlock_page(page);
> + return page;
> +}
> +EXPORT_SYMBOL_GPL(balloon_page_enqueue);
> +
> +/*
> + * balloon_page_dequeue - removes a page from balloon's page list and returns
> + * the its address to allow the driver release the page.
> + * @b_dev_info: balloon device decriptor where we will grab a page from.
> + *
> + * Driver must call it to properly de-allocate a previous enlisted balloon page
> + * before definetively releasing it back to the guest system.
> + * This function returns the page address for the recently dequeued page or
> + * NULL in the case we find balloon's page list temporarely empty due to
> + * compaction isolated pages.
> + */
> +struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> +{
> + struct page *page, *tmp;
> + unsigned long flags;
> + bool dequeued_page;
> +
> + dequeued_page = false;
> + list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> + if (trylock_page(page)) {
> + spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> + balloon_page_delete(page);
> + spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> + unlock_page(page);
> + dequeued_page = true;
> + break;
> + }
> + }
> +
> + if (!dequeued_page) {
> + spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> + if (unlikely(list_empty(&b_dev_info->pages) &&
> + !b_dev_info->isolated_pages))
> + BUG();
> + spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> + page = NULL;
> + }
> + return page;
> +}
> +EXPORT_SYMBOL_GPL(balloon_page_dequeue);
> +
> +#ifdef CONFIG_BALLOON_COMPACTION
> +/*
> + * balloon_mapping_alloc - allocates a special ->mapping for ballooned pages.
> + * @b_dev_info: holds the balloon device information descriptor.
> + * @a_ops: balloon_mapping address_space_operations descriptor.
> + *
> + * Driver must call it to properly allocate and initialize an instance of
> + * struct address_space which will be used as the special page->mapping for
> + * balloon device enlisted page instances.
> + */
> +struct address_space *balloon_mapping_alloc(struct balloon_dev_info *b_dev_info,
> + const struct address_space_operations *a_ops)
> +{
> + struct address_space *mapping;
> +
> + mapping = kmalloc(sizeof(*mapping), GFP_KERNEL);
> + if (!mapping)
> + return ERR_PTR(-ENOMEM);
> +
> + /*
> + * Give a clean 'zeroed' status to all elements of this special
> + * balloon page->mapping struct address_space instance.
> + */
> + address_space_init_once(mapping);
> +
> + /*
> + * Set mapping->flags appropriately, to allow balloon pages
> + * ->mapping identification.
> + */
> + mapping_set_balloon(mapping);
> + mapping_set_gfp_mask(mapping, balloon_mapping_gfp_mask());
> +
> + /* balloon's page->mapping->a_ops callback descriptor */
> + mapping->a_ops = a_ops;
> +
> + /*
> + * Establish a pointer reference back to the balloon device descriptor
> + * this particular page->mapping will be servicing.
> + * This is used by compaction / migration procedures to identify and
> + * access the balloon device pageset while isolating / migrating pages.
> + *
> + * As some balloon drivers can register multiple balloon devices
> + * for a single guest, this also helps compaction / migration to
> + * properly deal with multiple balloon pagesets, when required.
> + */
> + mapping->private_data = b_dev_info;
> + b_dev_info->mapping = mapping;
> +
> + return mapping;
> +}
> +EXPORT_SYMBOL_GPL(balloon_mapping_alloc);
> +
> +static inline void __isolate_balloon_page(struct page *page)
> +{
> + struct balloon_dev_info *b_dev_info = page->mapping->private_data;
> + unsigned long flags;
> + spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> + list_del(&page->lru);
> + b_dev_info->isolated_pages++;
> + spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +}
> +
> +static inline void __putback_balloon_page(struct page *page)
> +{
> + struct balloon_dev_info *b_dev_info = page->mapping->private_data;
> + unsigned long flags;
> + spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> + list_add(&page->lru, &b_dev_info->pages);
> + b_dev_info->isolated_pages--;
> + spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +}
> +
> +static inline int __migrate_balloon_page(struct address_space *mapping,
> + struct page *newpage, struct page *page, enum migrate_mode mode)
> +{
> + return page->mapping->a_ops->migratepage(mapping, newpage, page, mode);
> +}
> +
> +/* __isolate_lru_page() counterpart for a ballooned page */
> +bool balloon_page_isolate(struct page *page)
> +{
> + /*
> + * Avoid burning cycles with pages that are yet under __free_pages(),
> + * or just got freed under us.
> + *
> + * In case we 'win' a race for a balloon page being freed under us and
> + * raise its refcount preventing __free_pages() from doing its job
> + * the put_page() at the end of this block will take care of
> + * release this page, thus avoiding a nasty leakage.
> + */
> + if (likely(get_page_unless_zero(page))) {
> + /*
> + * As balloon pages are not isolated from LRU lists, concurrent
> + * compaction threads can race against page migration functions
> + * as well as race against the balloon driver releasing a page.
> + *
> + * In order to avoid having an already isolated balloon page
> + * being (wrongly) re-isolated while it is under migration,
> + * or to avoid attempting to isolate pages being released by
> + * the balloon driver, lets be sure we have the page lock
> + * before proceeding with the balloon page isolation steps.
> + */
> + if (likely(trylock_page(page))) {
> + /*
> + * A ballooned page, by default, has just one refcount.
> + * Prevent concurrent compaction threads from isolating
> + * an already isolated balloon page by refcount check.
> + */
> + if (__is_movable_balloon_page(page) &&
> + page_count(page) == 2) {
> + __isolate_balloon_page(page);
> + unlock_page(page);
> + return true;
> + }
> + unlock_page(page);
> + }
> + put_page(page);
> + }
> + return false;
> +}
> +
> +/* putback_lru_page() counterpart for a ballooned page */
> +void balloon_page_putback(struct page *page)
> +{
> + /*
> + * 'lock_page()' stabilizes the page and prevents races against
> + * concurrent isolation threads attempting to re-isolate it.
> + */
> + lock_page(page);
> +
> + if (__is_movable_balloon_page(page)) {
> + __putback_balloon_page(page);
> + put_page(page);
> + } else {
> + __WARN();
> + dump_page(page);
> + }
> + unlock_page(page);
> +}
> +
> +/* move_to_new_page() counterpart for a ballooned page */
> +int balloon_page_migrate(struct page *newpage,
> + struct page *page, enum migrate_mode mode)
> +{
> + struct address_space *mapping;
> + int rc = -EAGAIN;
> +
> + BUG_ON(!trylock_page(newpage));
> +
> + if (WARN_ON(!__is_movable_balloon_page(page))) {
> + dump_page(page);
> + unlock_page(newpage);
> + return rc;
> + }
> +
> + mapping = page->mapping;
> + if (mapping)
> + rc = __migrate_balloon_page(mapping, newpage, page, mode);
> +
> + unlock_page(newpage);
> + return rc;
> +}
> +#endif /* CONFIG_BALLOON_COMPACTION */

Ok, I did not spot any obvious problems in this. The barriers were the
big issue for me really - they seem overkill. I think we've discussed
this already but even though it was recent I cannot remember the
conclusion. In a sense, it doesn't matter because it should have been
described in the code anyway.

If you get the barrier issue sorted out then feel free to add

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2012-11-09 12:16:10

by Mel Gorman

Subject: Re: [PATCH v11 4/7] mm: introduce compaction and migration for ballooned pages

On Wed, Nov 07, 2012 at 01:05:51AM -0200, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> This patch introduces the helper functions as well as the necessary changes
> to teach compaction and migration bits how to cope with pages which are
> part of a guest memory balloon, in order to make them movable by memory
> compaction procedures.
>
> Signed-off-by: Rafael Aquini <[email protected]>
> ---
> mm/compaction.c | 21 +++++++++++++++++++--
> mm/migrate.c | 36 ++++++++++++++++++++++++++++++++++--
> 2 files changed, 53 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9eef558..76abd84 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -14,6 +14,7 @@
> #include <linux/backing-dev.h>
> #include <linux/sysctl.h>
> #include <linux/sysfs.h>
> +#include <linux/balloon_compaction.h>
> #include "internal.h"
>
> #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> @@ -565,9 +566,24 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> goto next_pageblock;
> }
>
> - /* Check may be lockless but that's ok as we recheck later */
> - if (!PageLRU(page))
> + /*
> + * Check may be lockless but that's ok as we recheck later.
> + * It's possible to migrate LRU pages and balloon pages
> + * Skip any other type of page
> + */
> + if (!PageLRU(page)) {
> + if (unlikely(balloon_page_movable(page))) {

Because it's lockless, it really seems that the barrier stuck down there
is unnecessary. At worst you get a temporarily incorrect answer that you
recheck later under page lock in balloon_page_isolate.

> + if (locked && balloon_page_isolate(page)) {
> + /* Successfully isolated */
> + cc->finished_update_migrate = true;
> + list_add(&page->lru, migratelist);
> + cc->nr_migratepages++;
> + nr_isolated++;
> + goto check_compact_cluster;
> + }
> + }
> continue;
> + }
>
> /*
> * PageLRU is set. lru_lock normally excludes isolation
> @@ -621,6 +637,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> cc->nr_migratepages++;
> nr_isolated++;
>
> +check_compact_cluster:
> /* Avoid isolating too much */
> if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
> ++low_pfn;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 98c7a89..87ffe54 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -35,6 +35,7 @@
> #include <linux/hugetlb.h>
> #include <linux/hugetlb_cgroup.h>
> #include <linux/gfp.h>
> +#include <linux/balloon_compaction.h>
>
> #include <asm/tlbflush.h>
>
> @@ -79,7 +80,10 @@ void putback_lru_pages(struct list_head *l)
> list_del(&page->lru);
> dec_zone_page_state(page, NR_ISOLATED_ANON +
> page_is_file_cache(page));
> - putback_lru_page(page);
> + if (unlikely(balloon_page_movable(page)))
> + balloon_page_putback(page);
> + else
> + putback_lru_page(page);
> }
> }
>
> @@ -778,6 +782,18 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> }
> }
>
> + if (unlikely(balloon_page_movable(page))) {
> + /*
> + * A ballooned page does not need any special attention from
> + * physical to virtual reverse mapping procedures.
> + * Skip any attempt to unmap PTEs or to remap swap cache,
> + * in order to avoid burning cycles at rmap level, and perform
> + * the page migration right away (protected by page lock).
> + */
> + rc = balloon_page_migrate(newpage, page, mode);
> + goto uncharge;
> + }
> +
> /*
> * Corner case handling:
> * 1. When a new swap-cache page is read into, it is added to the LRU
> @@ -814,7 +830,9 @@ skip_unmap:
> put_anon_vma(anon_vma);
>
> uncharge:
> - mem_cgroup_end_migration(mem, page, newpage, rc == MIGRATEPAGE_SUCCESS);
> + mem_cgroup_end_migration(mem, page, newpage,
> + (rc == MIGRATEPAGE_SUCCESS ||
> + rc == MIGRATEPAGE_BALLOON_SUCCESS));
> unlock:
> unlock_page(page);
> out:
> @@ -846,6 +864,20 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
> goto out;
>
> rc = __unmap_and_move(page, newpage, force, offlining, mode);
> +
> + if (unlikely(rc == MIGRATEPAGE_BALLOON_SUCCESS)) {
> + /*
> + * A ballooned page has been migrated already.
> + * Now it's time to remove the old page from the isolated
> + * pageset list, hand it back to Buddy, wrap up counters
> + * and return.
> + */
> + dec_zone_page_state(page, NR_ISOLATED_ANON +
> + page_is_file_cache(page));
> + put_page(page);
> + __free_page(page);
> + return 0;
> + }
> out:
> if (rc != -EAGAIN) {
> /*

It may be necessary to make this more generic for migration-related
callbacks but I see nothing incompatible in your patch with doing that.
Doing the abstraction now would be overkill so

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2012-11-09 12:20:38

by Mel Gorman

Subject: Re: [PATCH v11 7/7] mm: add vm event counters for balloon pages compaction

On Wed, Nov 07, 2012 at 01:05:54AM -0200, Rafael Aquini wrote:
> This patch introduces a new set of vm event counters to keep track of
> ballooned pages compaction activity.
>
> Signed-off-by: Rafael Aquini <[email protected]>

Other than confirming the thing actually works, can any meaningful
conclusions be drawn from these counters?

I know I have been inconsistent on this myself in the past but recently
I've been taking the attitude that the counters can be used to fit into
some other metric. I'm looking to change the compaction counters to be
able to build a basic cost model for example. The same idea could be
used for balloons of course but it's a less critical path than
compaction for THP for example.

Assuming it builds and all the defines are correct when the feature is
not configured (I didn't check) then there is nothing wrong with the
patch. However, if it was dropped would it make life very hard or would
you notice?

--
Mel Gorman
SUSE Labs

2012-11-09 14:54:19

by Rafael Aquini

Subject: Re: [PATCH v11 3/7] mm: introduce a common interface for balloon pages mobility

On Fri, Nov 09, 2012 at 12:11:33PM +0000, Mel Gorman wrote:
> > +/*
> > + * balloon_page_insert - insert a page into the balloon's page list and make
> > + * the page->mapping assignment accordingly.
> > + * @page : page to be assigned as a 'balloon page'
> > + * @mapping : allocated special 'balloon_mapping'
> > + * @head : balloon's device page list head
> > + */
> > +static inline void balloon_page_insert(struct page *page,
> > + struct address_space *mapping,
> > + struct list_head *head)
> > +{
> > + list_add(&page->lru, head);
> > + /*
> > + * Make sure the page is already inserted on balloon's page list
> > + * before assigning its ->mapping.
> > + */
> > + smp_wmb();
> > + page->mapping = mapping;
> > +}
> > +
>
> Elsewhere we have;
>
> spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> balloon_page_insert(page, b_dev_info->mapping, &b_dev_info->pages);
> spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>
> So this happens under an irq-safe lock. Why is a smp_wmb necessary?
>
> > +
> > +/*
> > + * balloon_page_delete - clear the page->mapping and delete the page from
> > + * balloon's page list accordingly.
> > + * @page : page to be released from balloon's page list
> > + */
> > +static inline void balloon_page_delete(struct page *page)
> > +{
> > + page->mapping = NULL;
> > + /*
> > + * Make sure page->mapping is cleared before we proceed with
> > + * balloon's page list deletion.
> > + */
> > + smp_wmb();
> > + list_del(&page->lru);
> > +}
> > +
>
> Same thing on locking except this also appears to be under the page lock
> making the barrier seem even more unnecessary.
>
> > +/*
> > + * __is_movable_balloon_page - helper to perform @page mapping->flags tests
> > + */
> > +static inline bool __is_movable_balloon_page(struct page *page)
> > +{
> > + /*
> > + * we might attempt to read ->mapping concurrently to other
> > + * threads trying to write to it.
> > + */
> > + struct address_space *mapping = ACCESS_ONCE(page->mapping);
> > + smp_read_barrier_depends();
> > + return mapping_balloon(mapping);
> > +}
> > +
>
> What happens if this race occurs? I assume it's a racy check before you
> isolate the balloon in which case the barrier may be overkill.
>

You're 100% right. If, by any chance, we stumble across a balloon page
transitioning to a non-balloon page (to be released by the driver) while
scanning pages for isolation, the racy checks at balloon_page_isolate() will
catch that and properly sort the situation out.

> > +/*
> > + * balloon_page_movable - test page->mapping->flags to identify balloon pages
> > + * that can be moved by compaction/migration.
> > + *
> > + * This function is used at core compaction's page isolation scheme, therefore
> > + * most pages exposed to it are not enlisted as balloon pages and so, to avoid
> > + * undesired side effects like racing against __free_pages(), we cannot afford
> > + * holding the page locked while testing page->mapping->flags here.
> > + *
> > + * As we might return false positives in the case of a balloon page being just
> > + * released under us, the page->mapping->flags need to be re-tested later,
> > + * under the proper page lock, at the functions that will be coping with the
> > + * balloon page case.
> > + */
> > +static inline bool balloon_page_movable(struct page *page)
> > +{


> > +# support for memory balloon compaction
> > +config BALLOON_COMPACTION
> > + bool "Allow for balloon memory compaction/migration"
> > + select COMPACTION
> > + depends on VIRTIO_BALLOON
> > + help
> > + Memory fragmentation introduced by ballooning might reduce
> > + significantly the number of 2MB contiguous memory blocks that can be
> > + used within a guest, thus imposing performance penalties associated
> > + with the reduced number of transparent huge pages that could be used
> > + by the guest workload. Allowing the compaction & migration for memory
> > + pages enlisted as being part of memory balloon devices avoids the
> > + scenario aforementioned and helps improving memory defragmentation.
> > +
>
> Rather than select COMPACTION, should it depend on it? Similarly as THP
> is the primary motivation, would it make more sense to depend on
> TRANSPARENT_HUGEPAGE?
>
> Should it default y? It seems useful, why would someone support
> VIRTIO_BALLOON and *not* use this?
>
Good catch, will change it.

>
> Ok, I did not spot any obvious problems in this. The barriers were the
> big issue for me really - they seem overkill. I think we've discussed
> this already but even though it was recent I cannot remember the
> conclusion. In a sense, it doesn't matter because it should have been
> described in the code anyway.
>
> If you get the barrier issue sorted out then feel free to add
>
> Acked-by: Mel Gorman <[email protected]>
>

I believe we can drop the barriers stuff, as the locking scheme is now
providing enough protection against collisions between isolation page scanning
and leak_balloon() page release (the major concern that led to the barriers
originally).

I'll refactor this patch with no barriers, ensure better commentary on the
aforementioned locking scheme, and resubmit, if that's OK with everyone.

-- Rafael

2012-11-09 15:26:19

by Rafael Aquini

Subject: Re: [PATCH v11 7/7] mm: add vm event counters for balloon pages compaction

On Fri, Nov 09, 2012 at 12:20:33PM +0000, Mel Gorman wrote:
> On Wed, Nov 07, 2012 at 01:05:54AM -0200, Rafael Aquini wrote:
> > This patch introduces a new set of vm event counters to keep track of
> > ballooned pages compaction activity.
> >
> > Signed-off-by: Rafael Aquini <[email protected]>
>
> Other than confirming the thing actually works, can any meaningful
> conclusions be drawn from these counters?
>
> I know I have been inconsistent on this myself in the past but recently
> I've been taking the attitude that the counters can be used to fit into
> some other metric. I'm looking to change the compaction counters to be
> able to build a basic cost model for example. The same idea could be
> used for balloons of course but it's a less critical path than
> compaction for THP for example.
>
> Assuming it builds and all the defines are correct when the feature is
> not configured (I didn't check) then there is nothing wrong with the
> patch. However, if it was dropped would it make life very hard or would
> you notice?
>

Originally, I proposed this patch as droppable (and it's still droppable)
because its major purpose was solely to show the thing working consistently.

OTOH, it might make it easier to spot breakages if it remains with the
merged bits, and per a reviewer request I removed its 'DROP BEFORE MERGE'
disclaimer.

https://lkml.org/lkml/2012/8/8/616

-- Rafael

2012-11-09 16:23:39

by Mel Gorman

Subject: Re: [PATCH v11 3/7] mm: introduce a common interface for balloon pages mobility

On Fri, Nov 09, 2012 at 12:53:22PM -0200, Rafael Aquini wrote:
> > <SNIP>
> > If you get the barrier issue sorted out then feel free to add
> >
> > Acked-by: Mel Gorman <[email protected]>
> >
>
> I believe we can drop the barriers stuff, as the locking scheme is now
> providing enough protection against collisions between isolation page
> scanning and leak_balloon() page release (the major concern that led to
> the barriers originally).
>
> I'll refactor this patch with no barriers, ensure better commentary on the
> aforementioned locking scheme, and resubmit, if that's OK with everyone.
>

Sounds good to me. When they are dropped again feel free to stick my ack
on for the compaction and migration parts. The virtio aspects are up to
someone else :)

--
Mel Gorman
SUSE Labs

2012-11-09 17:59:38

by Rafael Aquini

Subject: Re: [PATCH v11 3/7] mm: introduce a common interface for balloon pages mobility

On Fri, Nov 09, 2012 at 04:23:27PM +0000, Mel Gorman wrote:
> On Fri, Nov 09, 2012 at 12:53:22PM -0200, Rafael Aquini wrote:
> > > <SNIP>
> > > If you get the barrier issue sorted out then feel free to add
> > >
> > > Acked-by: Mel Gorman <[email protected]>
> > >
> >
> > I believe we can drop the barriers stuff, as the locking scheme is now
> > providing enough protection against collisions between isolation page
> > scanning and leak_balloon() page release (the major concern that led to
> > the barriers originally).
> >
> > I'll refactor this patch with no barriers, ensure better commentary on the
> > aforementioned locking scheme, and resubmit, if that's OK with everyone.
> >
>
> Sounds good to me. When they are dropped again feel free to stick my ack
> on for the compaction and migration parts. The virtio aspects are up to
> someone else :)
>

Andrew,

If no further objections are raised on dropping those barriers, would you
like me to resubmit the whole series rebased on the latest -next,
or just this (new) refactored patch?

-- Rafael

2012-11-10 15:51:19

by Michael S. Tsirkin

Subject: Re: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

On Wed, Nov 07, 2012 at 04:11:46PM -0800, Andrew Morton wrote:
> On Thu, 08 Nov 2012 09:32:18 +1030
> Rusty Russell <[email protected]> wrote:
>
> > Rafael Aquini <[email protected]> writes:
> > > + * virtballoon_migratepage - perform the balloon page migration on behalf of
> > > + * a compaction thread. (called under page lock)
> >
> > > + if (!mutex_trylock(&vb->balloon_lock))
> > > + return -EAGAIN;
> >
> > Erk, OK...
>
> Not really. As is almost always the case with a trylock, it needs a
> comment explaining why we couldn't use the far superior mutex_lock().
> Data: this reader doesn't know!
>
> > > + /* balloon's page migration 1st step -- inflate "newpage" */
> > > + spin_lock_irqsave(&vb_dev_info->pages_lock, flags);
> > > + balloon_page_insert(newpage, mapping, &vb_dev_info->pages);
> > > + vb_dev_info->isolated_pages--;
> > > + spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
> > > + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> > > + set_page_pfns(vb->pfns, newpage);
> > > + tell_host(vb, vb->inflate_vq);
> >
> > tell_host does wait_event(), so you can't call it under the page_lock.
> > Right?
>
> Sleeping inside lock_page() is OK. More problematic is that GFP_KERNEL
> allocation.

Do you mean this one:
if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
?

In practice it never triggers an allocation; we can pass in
GFP_ATOMIC just as well.

--
MST

2012-11-10 15:51:42

by Michael S. Tsirkin

Subject: Re: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

On Wed, Nov 07, 2012 at 01:05:52AM -0200, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> Besides making balloon pages movable at allocation time and introducing
> the necessary primitives to perform balloon page migration/compaction,
> this patch also introduces the following locking scheme, in order to
> enhance the synchronization methods for accessing elements of struct
> virtio_balloon, thus providing protection against concurrent access
> introduced by parallel memory migration threads.
>
> - balloon_lock (mutex) : synchronizes the access demand to elements of
> struct virtio_balloon and its queue operations;
>
> Signed-off-by: Rafael Aquini <[email protected]>


Acked-by: Michael S. Tsirkin <[email protected]>


> ---
> drivers/virtio/virtio_balloon.c | 135 ++++++++++++++++++++++++++++++++++++----
> 1 file changed, 123 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 0908e60..69eede7 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -27,6 +27,7 @@
> #include <linux/delay.h>
> #include <linux/slab.h>
> #include <linux/module.h>
> +#include <linux/balloon_compaction.h>
>
> /*
> * Balloon device works in 4K page units. So each page is pointed to by
> @@ -34,6 +35,7 @@
> * page units.
> */
> #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
> +#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
>
> struct virtio_balloon
> {
> @@ -52,15 +54,19 @@ struct virtio_balloon
> /* Number of balloon pages we've told the Host we're not using. */
> unsigned int num_pages;
> /*
> - * The pages we've told the Host we're not using.
> + * The pages we've told the Host we're not using are enqueued
> + * at vb_dev_info->pages list.
> * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
> * to num_pages above.
> */
> - struct list_head pages;
> + struct balloon_dev_info *vb_dev_info;
> +
> + /* Synchronize access/update to this struct virtio_balloon elements */
> + struct mutex balloon_lock;
>
> /* The array of pfns we tell the Host about. */
> unsigned int num_pfns;
> - u32 pfns[256];
> + u32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
>
> /* Memory statistics */
> int need_stats_update;
> @@ -122,18 +128,25 @@ static void set_page_pfns(u32 pfns[], struct page *page)
>
> static void fill_balloon(struct virtio_balloon *vb, size_t num)
> {
> + struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
> +
> + static DEFINE_RATELIMIT_STATE(fill_balloon_rs,
> + DEFAULT_RATELIMIT_INTERVAL,
> + DEFAULT_RATELIMIT_BURST);
> +
> /* We can only do one array worth at a time. */
> num = min(num, ARRAY_SIZE(vb->pfns));
>
> + mutex_lock(&vb->balloon_lock);
> for (vb->num_pfns = 0; vb->num_pfns < num;
> vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
> - __GFP_NOMEMALLOC | __GFP_NOWARN);
> + struct page *page = balloon_page_enqueue(vb_dev_info);
> +
> if (!page) {
> - if (printk_ratelimit())
> + if (__ratelimit(&fill_balloon_rs))
> dev_printk(KERN_INFO, &vb->vdev->dev,
> "Out of puff! Can't get %zu pages\n",
> - num);
> + VIRTIO_BALLOON_PAGES_PER_PAGE);
> /* Sleep for at least 1/5 of a second before retry. */
> msleep(200);
> break;
> @@ -141,7 +154,6 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> set_page_pfns(vb->pfns + vb->num_pfns, page);
> vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> totalram_pages--;
> - list_add(&page->lru, &vb->pages);
> }
>
> /* Didn't get any? Oh well. */
> @@ -149,6 +161,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> return;
>
> tell_host(vb, vb->inflate_vq);
> + mutex_unlock(&vb->balloon_lock);
> }
>
> static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> @@ -165,14 +178,17 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> static void leak_balloon(struct virtio_balloon *vb, size_t num)
> {
> struct page *page;
> + struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
>
> /* We can only do one array worth at a time. */
> num = min(num, ARRAY_SIZE(vb->pfns));
>
> + mutex_lock(&vb->balloon_lock);
> for (vb->num_pfns = 0; vb->num_pfns < num;
> vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> - page = list_first_entry(&vb->pages, struct page, lru);
> - list_del(&page->lru);
> + page = balloon_page_dequeue(vb_dev_info);
> + if (!page)
> + break;
> set_page_pfns(vb->pfns + vb->num_pfns, page);
> vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
> }
> @@ -183,6 +199,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
> * is true, we *have* to do it in this order
> */
> tell_host(vb, vb->deflate_vq);
> + mutex_unlock(&vb->balloon_lock);
> release_pages_by_pfn(vb->pfns, vb->num_pfns);
> }
>
> @@ -339,9 +356,76 @@ static int init_vqs(struct virtio_balloon *vb)
> return 0;
> }
>
> +static const struct address_space_operations virtio_balloon_aops;
> +#ifdef CONFIG_BALLOON_COMPACTION
> +/*
> + * virtballoon_migratepage - perform the balloon page migration on behalf of
> + * a compaction thread. (called under page lock)
> + * @mapping: the page->mapping which will be assigned to the new migrated page.
> + * @newpage: page that will replace the isolated page after migration finishes.
> + * @page : the isolated (old) page that is about to be migrated to newpage.
> + * @mode : compaction mode -- not used for balloon page migration.
> + *
> + * After a ballooned page gets isolated by compaction procedures, this is the
> + * function that performs the page migration on behalf of a compaction thread
> + * The page migration for virtio balloon is done in a simple swap fashion which
> + * follows these two macro steps:
> + * 1) insert newpage into vb->pages list and update the host about it;
> + * 2) update the host about the old page removed from vb->pages list;
> + *
> + * This function performs the balloon page migration task.
> + * Called through balloon_mapping->a_ops->migratepage
> + */
> +int virtballoon_migratepage(struct address_space *mapping,
> + struct page *newpage, struct page *page, enum migrate_mode mode)
> +{
> + struct balloon_dev_info *vb_dev_info = balloon_page_device(page);
> + struct virtio_balloon *vb;
> + unsigned long flags;
> +
> + BUG_ON(!vb_dev_info);
> +
> + vb = vb_dev_info->balloon_device;
> +
> + if (!mutex_trylock(&vb->balloon_lock))
> + return -EAGAIN;
> +
> + /* balloon's page migration 1st step -- inflate "newpage" */
> + spin_lock_irqsave(&vb_dev_info->pages_lock, flags);
> + balloon_page_insert(newpage, mapping, &vb_dev_info->pages);
> + vb_dev_info->isolated_pages--;
> + spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
> + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> + set_page_pfns(vb->pfns, newpage);
> + tell_host(vb, vb->inflate_vq);
> +
> + /*
> + * balloon's page migration 2nd step -- deflate "page"
> + *
> + * It's safe to delete page->lru here because this page is at
> + * an isolated migration list, and this step is expected to happen here
> + */
> + balloon_page_delete(page);
> + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> + set_page_pfns(vb->pfns, page);
> + tell_host(vb, vb->deflate_vq);
> +
> + mutex_unlock(&vb->balloon_lock);
> +
> + return MIGRATEPAGE_BALLOON_SUCCESS;
> +}
> +
> +/* define the balloon_mapping->a_ops callback to allow balloon page migration */
> +static const struct address_space_operations virtio_balloon_aops = {
> + .migratepage = virtballoon_migratepage,
> +};
> +#endif /* CONFIG_BALLOON_COMPACTION */
> +
> static int virtballoon_probe(struct virtio_device *vdev)
> {
> struct virtio_balloon *vb;
> + struct address_space *vb_mapping;
> + struct balloon_dev_info *vb_devinfo;
> int err;
>
> vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL);
> @@ -350,16 +434,37 @@ static int virtballoon_probe(struct virtio_device *vdev)
> goto out;
> }
>
> - INIT_LIST_HEAD(&vb->pages);
> vb->num_pages = 0;
> + mutex_init(&vb->balloon_lock);
> init_waitqueue_head(&vb->config_change);
> init_waitqueue_head(&vb->acked);
> vb->vdev = vdev;
> vb->need_stats_update = 0;
>
> + vb_devinfo = balloon_devinfo_alloc(vb);
> + if (IS_ERR(vb_devinfo)) {
> + err = PTR_ERR(vb_devinfo);
> + goto out_free_vb;
> + }
> +
> + vb_mapping = balloon_mapping_alloc(vb_devinfo,
> + (balloon_compaction_check()) ?
> + &virtio_balloon_aops : NULL);
> + if (IS_ERR(vb_mapping)) {
> + /*
> + * IS_ERR(vb_mapping) && PTR_ERR(vb_mapping) == -EOPNOTSUPP
> + * This means !CONFIG_BALLOON_COMPACTION, otherwise we get off.
> + */
> + err = PTR_ERR(vb_mapping);
> + if (err != -EOPNOTSUPP)
> + goto out_free_vb_devinfo;
> + }
> +
> + vb->vb_dev_info = vb_devinfo;
> +
> err = init_vqs(vb);
> if (err)
> - goto out_free_vb;
> + goto out_free_vb_mapping;
>
> vb->thread = kthread_run(balloon, vb, "vballoon");
> if (IS_ERR(vb->thread)) {
> @@ -371,6 +476,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
>
> out_del_vqs:
> vdev->config->del_vqs(vdev);
> +out_free_vb_mapping:
> + balloon_mapping_free(vb_mapping);
> +out_free_vb_devinfo:
> + balloon_devinfo_free(vb_devinfo);
> out_free_vb:
> kfree(vb);
> out:
> @@ -396,6 +505,8 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
>
> kthread_stop(vb->thread);
> remove_common(vb);
> + balloon_mapping_free(vb->vb_dev_info->mapping);
> + balloon_devinfo_free(vb->vb_dev_info);
> kfree(vb);
> }
>
> --
> 1.7.11.7

2012-11-10 15:53:17

by Michael S. Tsirkin

Subject: Re: [PATCH v11 7/7] mm: add vm event counters for balloon pages compaction

On Wed, Nov 07, 2012 at 01:05:54AM -0200, Rafael Aquini wrote:
> This patch introduces a new set of vm event counters to keep track of
> ballooned pages compaction activity.
>
> Signed-off-by: Rafael Aquini <[email protected]>
> ---
> drivers/virtio/virtio_balloon.c | 1 +
> include/linux/vm_event_item.h | 8 +++++++-
> mm/balloon_compaction.c | 2 ++
> mm/migrate.c | 1 +
> mm/vmstat.c | 10 +++++++++-
> 5 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 69eede7..3756fc1 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -411,6 +411,7 @@ int virtballoon_migratepage(struct address_space *mapping,
> tell_host(vb, vb->deflate_vq);
>
> mutex_unlock(&vb->balloon_lock);
> + balloon_event_count(COMPACTBALLOONMIGRATED);
>
> return MIGRATEPAGE_BALLOON_SUCCESS;
> }

Looks like any balloon would need to do this.
Can this chunk go into caller instead?

> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 3d31145..cbd72fc 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -41,7 +41,13 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> #ifdef CONFIG_COMPACTION
> COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
> COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
> -#endif
> +#ifdef CONFIG_BALLOON_COMPACTION
> + COMPACTBALLOONISOLATED, /* isolated from balloon pagelist */
> + COMPACTBALLOONMIGRATED, /* balloon page successfully migrated */
> + COMPACTBALLOONRELEASED, /* old-page released after migration */
> + COMPACTBALLOONRETURNED, /* putback to pagelist, not-migrated */
> +#endif /* CONFIG_BALLOON_COMPACTION */
> +#endif /* CONFIG_COMPACTION */
> #ifdef CONFIG_HUGETLB_PAGE
> HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
> #endif
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index 90935aa..32927eb 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -215,6 +215,7 @@ bool balloon_page_isolate(struct page *page)
> if (__is_movable_balloon_page(page) &&
> page_count(page) == 2) {
> __isolate_balloon_page(page);
> + balloon_event_count(COMPACTBALLOONISOLATED);
> unlock_page(page);
> return true;
> }
> @@ -237,6 +238,7 @@ void balloon_page_putback(struct page *page)
> if (__is_movable_balloon_page(page)) {
> __putback_balloon_page(page);
> put_page(page);
> + balloon_event_count(COMPACTBALLOONRETURNED);
> } else {
> __WARN();
> dump_page(page);
> diff --git a/mm/migrate.c b/mm/migrate.c
> index adb3d44..ee3037d 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -896,6 +896,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
> page_is_file_cache(page));
> put_page(page);
> __free_page(page);
> + balloon_event_count(COMPACTBALLOONRELEASED);
> return 0;
> }
> out:
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index c737057..1363edc 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -781,7 +781,15 @@ const char * const vmstat_text[] = {
> "compact_stall",
> "compact_fail",
> "compact_success",
> -#endif
> +
> +#ifdef CONFIG_BALLOON_COMPACTION
> + "compact_balloon_isolated",
> + "compact_balloon_migrated",
> + "compact_balloon_released",
> + "compact_balloon_returned",
> +#endif /* CONFIG_BALLOON_COMPACTION */
> +
> +#endif /* CONFIG_COMPACTION */
>
> #ifdef CONFIG_HUGETLB_PAGE
> "htlb_buddy_alloc_success",
> --
> 1.7.11.7

2012-11-11 19:22:36

by Rafael Aquini

Subject: Re: [PATCH v11 7/7] mm: add vm event counters for balloon pages compaction

On Sat, Nov 10, 2012 at 05:55:38PM +0200, Michael S. Tsirkin wrote:
> > mutex_unlock(&vb->balloon_lock);
> > + balloon_event_count(COMPACTBALLOONMIGRATED);
> >
> > return MIGRATEPAGE_BALLOON_SUCCESS;
> > }
>
> Looks like any balloon would need to do this.
> Can this chunk go into caller instead?
>

Good catch. It's done already (v12 just hit the wild).

Thanks!
-- Rafael

2012-11-12 07:50:13

by Rusty Russell

Subject: Re: [PATCH v11 5/7] virtio_balloon: introduce migration primitives to balloon pages

Rafael Aquini <[email protected]> writes:

> On Thu, Nov 08, 2012 at 09:32:18AM +1030, Rusty Russell wrote:
>> The first one can be delayed, the second one can be delayed if the host
>> didn't ask for VIRTIO_BALLOON_F_MUST_TELL_HOST (qemu doesn't).
>>
>> We could implement a proper request queue for these, and return -EAGAIN
>> if the queue fills. Though in practice, it's not important (it might
>> help performance).
>
> I liked the idea. Give me the directions to accomplish it and I'll give it a try
> for sure.

OK, let's get this applied first, but here are some pointers:

Here's the current callback function when the host has processed the
buffers we put in the queue:

static void balloon_ack(struct virtqueue *vq)
{
struct virtio_balloon *vb = vq->vdev->priv;

wake_up(&vb->acked);
}

It's almost a noop: here's how we use it to make our queues synchronous:

static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
{
	struct scatterlist sg;
	unsigned int len;

	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);

	/* We should always be able to add one buffer to an empty queue. */
	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
		BUG();
	virtqueue_kick(vq);

	/* When host has read buffer, this completes via balloon_ack */
	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
}

And we set up the callback when we create the virtqueue:

	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
	...
	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);

So off the top of my head it should be as simple as changing tell_host()
to only wait if virtqueue_add_buf() fails (i.e. the queue is full).

Hmm, though you will want to synchronize the inflate and deflate queues:
if we tell the host we're giving a page up we want it to have seen that
before we tell it we're using it again...

Cheers,
Rusty.

2012-11-20 23:29:48

by Andrew Morton

Subject: Re: [PATCH v11 7/7] mm: add vm event counters for balloon pages compaction

On Fri, 9 Nov 2012 12:58:29 -0200
Rafael Aquini <[email protected]> wrote:

> On Fri, Nov 09, 2012 at 12:20:33PM +0000, Mel Gorman wrote:
> > On Wed, Nov 07, 2012 at 01:05:54AM -0200, Rafael Aquini wrote:
> > > This patch introduces a new set of vm event counters to keep track of
> > > ballooned pages compaction activity.
> > >
> > > Signed-off-by: Rafael Aquini <[email protected]>
> >
> > Other than confirming the thing actually works, can any meaningful
> > conclusions be drawn from these counters?
> >
> > I know I have been inconsistent on this myself in the past but recently
> > I've been taking the attitude that the counters can be used to fit into
> > some other metric. I'm looking to change the compaction counters to be
> > able to build a basic cost model for example. The same idea could be
> > used for balloons of course but it's a less critical path than
> > compaction for THP for example.
> >
> > Assuming it builds and all the defines are correct when the feature is
> > not configured (I didn't check) then there is nothing wrong with the
> > patch. However, if it was dropped would it make life very hard or would
> > you notice?
> >
>
> Originally, I proposed this patch as droppable (and it's still droppable)
> because its major purpose was solely to show the thing working consistently.
>
> OTOH, it might make it easier to spot breakages if it remains with the
> merged bits, and per a reviewer request I removed its 'DROP BEFORE MERGE'
> disclaimer.
>
> https://lkml.org/lkml/2012/8/8/616

There's a lot to be said for not merging things.

I think I'll maintain this as a mm-only patch. That way it's
available in linux-next and we can merge it later if a need arises.

2012-11-20 23:33:27

by Andrew Morton

Subject: Re: [PATCH v11 4/7] mm: introduce compaction and migration for ballooned pages

On Fri, 9 Nov 2012 12:16:02 +0000
Mel Gorman <[email protected]> wrote:

> On Wed, Nov 07, 2012 at 01:05:51AM -0200, Rafael Aquini wrote:
> > Memory fragmentation introduced by ballooning might reduce significantly
> > the number of 2MB contiguous memory blocks that can be used within a guest,
> > thus imposing performance penalties associated with the reduced number of
> > transparent huge pages that could be used by the guest workload.
> >
> > This patch introduces the helper functions as well as the necessary changes
> > to teach compaction and migration bits how to cope with pages which are
> > part of a guest memory balloon, in order to make them movable by memory
> > compaction procedures.
> >
>
> ...
>
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -14,6 +14,7 @@
> > #include <linux/backing-dev.h>
> > #include <linux/sysctl.h>
> > #include <linux/sysfs.h>
> > +#include <linux/balloon_compaction.h>
> > #include "internal.h"
> >
> > #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> > @@ -565,9 +566,24 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > goto next_pageblock;
> > }
> >
> > - /* Check may be lockless but that's ok as we recheck later */
> > - if (!PageLRU(page))
> > + /*
> > + * Check may be lockless but that's ok as we recheck later.
> > + * It's possible to migrate LRU pages and balloon pages
> > + * Skip any other type of page
> > + */
> > + if (!PageLRU(page)) {
> > + if (unlikely(balloon_page_movable(page))) {
>
> Because it's lockless, it really seems that the barrier stuck down there
> is unnecessary. At worst you get a temporarily incorrect answer that you
> recheck later under page lock in balloon_page_isolate.

What happened with this?

Also: what barrier?

2012-11-27 11:59:49

by Rafael Aquini

Subject: Re: [PATCH v11 4/7] mm: introduce compaction and migration for ballooned pages

On Tue, Nov 20, 2012 at 03:33:24PM -0800, Andrew Morton wrote:
> On Fri, 9 Nov 2012 12:16:02 +0000
> Mel Gorman <[email protected]> wrote:
>
> > On Wed, Nov 07, 2012 at 01:05:51AM -0200, Rafael Aquini wrote:
> > > Memory fragmentation introduced by ballooning might reduce significantly
> > > the number of 2MB contiguous memory blocks that can be used within a guest,
> > > thus imposing performance penalties associated with the reduced number of
> > > transparent huge pages that could be used by the guest workload.
> > >
> > > This patch introduces the helper functions as well as the necessary changes
> > > to teach compaction and migration bits how to cope with pages which are
> > > part of a guest memory balloon, in order to make them movable by memory
> > > compaction procedures.
> > >
> >
> > ...
> >
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -14,6 +14,7 @@
> > > #include <linux/backing-dev.h>
> > > #include <linux/sysctl.h>
> > > #include <linux/sysfs.h>
> > > +#include <linux/balloon_compaction.h>
> > > #include "internal.h"
> > >
> > > #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> > > @@ -565,9 +566,24 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > > goto next_pageblock;
> > > }
> > >
> > > - /* Check may be lockless but that's ok as we recheck later */
> > > - if (!PageLRU(page))
> > > + /*
> > > + * Check may be lockless but that's ok as we recheck later.
> > > + * It's possible to migrate LRU pages and balloon pages
> > > + * Skip any other type of page
> > > + */
> > > + if (!PageLRU(page)) {
> > > + if (unlikely(balloon_page_movable(page))) {
> >
> > Because it's lockless, it really seems that the barrier stuck down there
> > is unnecessary. At worst you get a temporarily incorrect answer that you
> > recheck later under page lock in balloon_page_isolate.
>

Sorry for the late reply.

> What happened with this?
>
This concern of Mel's was addressed in the last submitted revision (v12).


> Also: what barrier?

Mel was referring to these barriers, in balloon_compaction.h:
---8<---
+/*
+ * balloon_page_insert - insert a page into the balloon's page list and make
+ * the page->mapping assignment accordingly.
+ * @page : page to be assigned as a 'balloon page'
+ * @mapping : allocated special 'balloon_mapping'
+ * @head : balloon's device page list head
+ */
+static inline void balloon_page_insert(struct page *page,
+				       struct address_space *mapping,
+				       struct list_head *head)
+{
+	list_add(&page->lru, head);
+	/*
+	 * Make sure the page is already inserted on balloon's page list
+	 * before assigning its ->mapping.
+	 */
+	smp_wmb();
+	page->mapping = mapping;
+}
+
+/*
+ * balloon_page_delete - clear the page->mapping and delete the page from
+ * balloon's page list accordingly.
+ * @page : page to be released from balloon's page list
+ */
+static inline void balloon_page_delete(struct page *page)
+{
+	page->mapping = NULL;
+	/*
+	 * Make sure page->mapping is cleared before we proceed with
+	 * balloon's page list deletion.
+	 */
+	smp_wmb();
+	list_del(&page->lru);
+}
+
+/*
+ * __is_movable_balloon_page - helper to perform @page mapping->flags tests
+ */
+static inline bool __is_movable_balloon_page(struct page *page)
+{
+	/*
+	 * we might attempt to read ->mapping concurrently to other
+	 * threads trying to write to it.
+	 */
+	struct address_space *mapping = ACCESS_ONCE(page->mapping);
+	smp_read_barrier_depends();
+	return mapping_balloon(mapping);
+}
---8<---

The last revision got rid of them, to stick with Mel's ACK.


Cheers!
--Rafael