2012-08-10 17:55:46

by Rafael Aquini

[permalink] [raw]
Subject: [PATCH v7 0/4] make balloon pages movable by compaction

Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch-set follows the main idea discussed at 2012 LSFMMS session:
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
to introduce the required changes to the virtio_balloon driver, as well as
the changes to the core compaction & migration bits, in order to make those
subsystems aware of ballooned pages and allow memory balloon pages become
movable within a guest, thus avoiding the aforementioned fragmentation issue

Rafael Aquini (4):
mm: introduce compaction and migration for virtio ballooned pages
virtio_balloon: introduce migration primitives to balloon pages
mm: introduce putback_movable_pages()
mm: add vm event counters for balloon pages compaction

drivers/virtio/virtio_balloon.c | 139 +++++++++++++++++++++++++++++++++++++---
include/linux/migrate.h | 2 +
include/linux/mm.h | 17 +++++
include/linux/virtio_balloon.h | 4 ++
include/linux/vm_event_item.h | 8 ++-
mm/compaction.c | 131 +++++++++++++++++++++++++++++++------
mm/migrate.c | 51 ++++++++++++++-
mm/page_alloc.c | 2 +-
mm/vmstat.c | 10 ++-
9 files changed, 331 insertions(+), 33 deletions(-)

Change log:
v7:
* fix a potential page leak case at 'putback_balloon_page' (Mel);
* adjust vm-events-counter patch and remove its drop-on-merge message (Rik);
* add 'putback_movable_pages' to avoid hacks on 'putback_lru_pages' (Minchan);
v6:
* rename 'is_balloon_page()' to 'movable_balloon_page()' (Rik);
v5:
* address Andrew Morton's review comments on the patch series;
* address a couple extra nitpick suggestions on PATCH 01 (Minchan);
v4:
* address Rusty Russel's review comments on PATCH 02;
* re-base virtio_balloon patch on 9c378abc5c0c6fc8e3acf5968924d274503819b3;
V3:
* address reviewers nitpick suggestions on PATCH 01 (Mel, Minchan);
V2:
* address Mel Gorman's review comments on PATCH 01;


Preliminary test results:
(2 VCPU 2048mB RAM KVM guest running 3.6.0_rc1+ -- after a reboot)

* 64mB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 0
compact_balloon_migrated 0
compact_balloon_returned 0
compact_balloon_released 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1] Done echo 1 > /proc/sys/vm/compact_memory
[2] Done echo 1 > /proc/sys/vm/compact_memory
[3] Done echo 1 > /proc/sys/vm/compact_memory
[4] Done echo 1 > /proc/sys/vm/compact_memory
[5]- Done echo 1 > /proc/sys/vm/compact_memory
[6]+ Done echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]#
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 6579
compact_pages_moved 50114
compact_pagemigrate_failed 111
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 18361
compact_balloon_migrated 18306
compact_balloon_returned 55
compact_balloon_released 18306


* 128 mB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 0
compact_balloon_migrated 0
compact_balloon_returned 0
compact_balloon_released 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1] Done echo 1 > /proc/sys/vm/compact_memory
[2] Done echo 1 > /proc/sys/vm/compact_memory
[3] Done echo 1 > /proc/sys/vm/compact_memory
[4] Done echo 1 > /proc/sys/vm/compact_memory
[5]- Done echo 1 > /proc/sys/vm/compact_memory
[6]+ Done echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]#
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 6789
compact_pages_moved 64479
compact_pagemigrate_failed 127
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 33937
compact_balloon_migrated 33869
compact_balloon_returned 68
compact_balloon_released 33869

--
1.7.11.2


2012-08-10 17:55:52

by Rafael Aquini

[permalink] [raw]
Subject: [PATCH v7 4/4] mm: add vm event counters for balloon pages compaction

This patch introduces a new set of vm event counters to keep track of
ballooned pages compaction activity.

Signed-off-by: Rafael Aquini <[email protected]>
---
drivers/virtio/virtio_balloon.c | 1 +
include/linux/vm_event_item.h | 8 +++++++-
mm/compaction.c | 2 ++
mm/migrate.c | 1 +
mm/vmstat.c | 10 +++++++++-
5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7c937a0..b8f7ea5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -414,6 +414,7 @@ int virtballoon_migratepage(struct address_space *mapping,

mutex_unlock(&balloon_lock);

+ count_vm_event(COMPACTBALLOONMIGRATED);
return 0;
}

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 57f7b10..b1841a2 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,7 +41,13 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
-#endif
+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+ COMPACTBALLOONISOLATED, /* isolated from balloon pagelist */
+ COMPACTBALLOONMIGRATED, /* balloon page sucessfully migrated */
+ COMPACTBALLOONRETURNED, /* putback to pagelist, not-migrated */
+ COMPACTBALLOONRELEASED, /* old-page released after migration */
+#endif /* CONFIG_VIRTIO_BALLOON || CONFIG_VIRTIO_BALLOON_MODULE */
+#endif /* CONFIG_COMPACTION */
#ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
#endif
diff --git a/mm/compaction.c b/mm/compaction.c
index 8567bb8..ff0f9ac 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -77,6 +77,7 @@ bool isolate_balloon_page(struct page *page)
(page_count(page) == 2)) {
__isolate_balloon_page(page);
unlock_page(page);
+ count_vm_event(COMPACTBALLOONISOLATED);
return true;
}
unlock_page(page);
@@ -97,6 +98,7 @@ void putback_balloon_page(struct page *page)
__putback_balloon_page(page);
put_page(page);
unlock_page(page);
+ count_vm_event(COMPACTBALLOONRETURNED);
}
#endif /* CONFIG_VIRTIO_BALLOON || CONFIG_VIRTIO_BALLOON_MODULE */

diff --git a/mm/migrate.c b/mm/migrate.c
index 1165134..024566f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -892,6 +892,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
page_is_file_cache(page));
put_page(page);
__free_page(page);
+ count_vm_event(COMPACTBALLOONRELEASED);
return rc;
}
out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index df7a674..ad5c4f1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -768,7 +768,15 @@ const char * const vmstat_text[] = {
"compact_stall",
"compact_fail",
"compact_success",
-#endif
+
+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+ "compact_balloon_isolated",
+ "compact_balloon_migrated",
+ "compact_balloon_returned",
+ "compact_balloon_released",
+#endif /* CONFIG_VIRTIO_BALLOON || CONFIG_VIRTIO_BALLOON_MODULE */
+
+#endif /* CONFIG_COMPACTION */

#ifdef CONFIG_HUGETLB_PAGE
"htlb_buddy_alloc_success",
--
1.7.11.2

2012-08-10 17:56:17

by Rafael Aquini

[permalink] [raw]
Subject: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme to provide the
proper synchronization and protection for struct virtio_balloon elements
against concurrent accesses due to parallel operations introduced by
memory compaction / page migration.
- balloon_lock (mutex) : synchronizes the access demand to elements of
struct virtio_balloon and its queue operations;
- pages_lock (spinlock): special protection to balloon pages list against
concurrent list handling operations;

Signed-off-by: Rafael Aquini <[email protected]>
---
drivers/virtio/virtio_balloon.c | 138 +++++++++++++++++++++++++++++++++++++---
include/linux/virtio_balloon.h | 4 ++
2 files changed, 134 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..7c937a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
#include <linux/delay.h>
#include <linux/slab.h>
#include <linux/module.h>
+#include <linux/fs.h>

/*
* Balloon device works in 4K page units. So each page is pointed to by
@@ -35,6 +36,12 @@
*/
#define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)

+/* Synchronizes accesses/updates to the struct virtio_balloon elements */
+DEFINE_MUTEX(balloon_lock);
+
+/* Protects 'virtio_balloon->pages' list against concurrent handling */
+DEFINE_SPINLOCK(pages_lock);
+
struct virtio_balloon
{
struct virtio_device *vdev;
@@ -51,6 +58,7 @@ struct virtio_balloon

/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
+
/*
* The pages we've told the Host we're not using.
* Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
@@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));

+ mutex_lock(&balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
- struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
- __GFP_NOMEMALLOC | __GFP_NOWARN);
+ struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
+ __GFP_NORETRY | __GFP_NOWARN |
+ __GFP_NOMEMALLOC);
if (!page) {
if (printk_ratelimit())
dev_printk(KERN_INFO, &vb->vdev->dev,
@@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
+ spin_lock(&pages_lock);
list_add(&page->lru, &vb->pages);
+ page->mapping = balloon_mapping;
+ spin_unlock(&pages_lock);
}

/* Didn't get any? Oh well. */
@@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
return;

tell_host(vb, vb->inflate_vq);
+ mutex_unlock(&balloon_lock);
}

static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));

+ mutex_lock(&balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+ /*
+ * We can race against virtballoon_isolatepage() and end up
+ * stumbling across a _temporarily_ empty 'pages' list.
+ */
+ spin_lock(&pages_lock);
+ if (unlikely(list_empty(&vb->pages))) {
+ spin_unlock(&pages_lock);
+ break;
+ }
page = list_first_entry(&vb->pages, struct page, lru);
+ page->mapping = NULL;
list_del(&page->lru);
+ spin_unlock(&pages_lock);
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
@@ -182,8 +208,11 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
* virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
* is true, we *have* to do it in this order
*/
- tell_host(vb, vb->deflate_vq);
- release_pages_by_pfn(vb->pfns, vb->num_pfns);
+ if (vb->num_pfns > 0) {
+ tell_host(vb, vb->deflate_vq);
+ release_pages_by_pfn(vb->pfns, vb->num_pfns);
+ }
+ mutex_unlock(&balloon_lock);
}

static inline void update_stat(struct virtio_balloon *vb, int idx,
@@ -239,6 +268,7 @@ static void stats_handle_request(struct virtio_balloon *vb)
struct scatterlist sg;
unsigned int len;

+ mutex_lock(&balloon_lock);
vb->need_stats_update = 0;
update_balloon_stats(vb);

@@ -249,6 +279,7 @@ static void stats_handle_request(struct virtio_balloon *vb)
if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
BUG();
virtqueue_kick(vq);
+ mutex_unlock(&balloon_lock);
}

static void virtballoon_changed(struct virtio_device *vdev)
@@ -261,22 +292,27 @@ static void virtballoon_changed(struct virtio_device *vdev)
static inline s64 towards_target(struct virtio_balloon *vb)
{
__le32 v;
- s64 target;
+ s64 target, actual;

+ mutex_lock(&balloon_lock);
+ actual = vb->num_pages;
vb->vdev->config->get(vb->vdev,
offsetof(struct virtio_balloon_config, num_pages),
&v, sizeof(v));
target = le32_to_cpu(v);
- return target - vb->num_pages;
+ mutex_unlock(&balloon_lock);
+ return target - actual;
}

static void update_balloon_size(struct virtio_balloon *vb)
{
- __le32 actual = cpu_to_le32(vb->num_pages);
-
+ __le32 actual;
+ mutex_lock(&balloon_lock);
+ actual = cpu_to_le32(vb->num_pages);
vb->vdev->config->set(vb->vdev,
offsetof(struct virtio_balloon_config, actual),
&actual, sizeof(actual));
+ mutex_unlock(&balloon_lock);
}

static int balloon(void *_vballoon)
@@ -339,6 +375,76 @@ static int init_vqs(struct virtio_balloon *vb)
return 0;
}

+/*
+ * '*vb_ptr' allows virtballoon_migratepage() & virtballoon_putbackpage() to
+ * access pertinent elements from struct virtio_balloon
+ */
+struct virtio_balloon *vb_ptr;
+
+/*
+ * Populate balloon_mapping->a_ops->migratepage method to perform the balloon
+ * page migration task.
+ *
+ * After a ballooned page gets isolated by compaction procedures, this is the
+ * function that performs the page migration on behalf of move_to_new_page(),
+ * when the last calls (page)->mapping->a_ops->migratepage.
+ *
+ * Page migration for virtio balloon is done in a simple swap fashion which
+ * follows these two steps:
+ * 1) insert newpage into vb->pages list and update the host about it;
+ * 2) update the host about the removed old page from vb->pages list;
+ */
+int virtballoon_migratepage(struct address_space *mapping,
+ struct page *newpage, struct page *page, enum migrate_mode mode)
+{
+ mutex_lock(&balloon_lock);
+
+ /* balloon's page migration 1st step */
+ vb_ptr->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+ spin_lock(&pages_lock);
+ list_add(&newpage->lru, &vb_ptr->pages);
+ spin_unlock(&pages_lock);
+ set_page_pfns(vb_ptr->pfns, newpage);
+ tell_host(vb_ptr, vb_ptr->inflate_vq);
+
+ /* balloon's page migration 2nd step */
+ vb_ptr->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+ set_page_pfns(vb_ptr->pfns, page);
+ tell_host(vb_ptr, vb_ptr->deflate_vq);
+
+ mutex_unlock(&balloon_lock);
+
+ return 0;
+}
+
+/*
+ * Populate balloon_mapping->a_ops->invalidatepage method to help compaction on
+ * isolating a page from the balloon page list.
+ */
+void virtballoon_isolatepage(struct page *page, unsigned long mode)
+{
+ spin_lock(&pages_lock);
+ list_del(&page->lru);
+ spin_unlock(&pages_lock);
+}
+
+/*
+ * Populate balloon_mapping->a_ops->freepage method to help compaction on
+ * re-inserting an isolated page into the balloon page list.
+ */
+void virtballoon_putbackpage(struct page *page)
+{
+ spin_lock(&pages_lock);
+ list_add(&page->lru, &vb_ptr->pages);
+ spin_unlock(&pages_lock);
+}
+
+static const struct address_space_operations virtio_balloon_aops = {
+ .migratepage = virtballoon_migratepage,
+ .invalidatepage = virtballoon_isolatepage,
+ .freepage = virtballoon_putbackpage,
+};
+
static int virtballoon_probe(struct virtio_device *vdev)
{
struct virtio_balloon *vb;
@@ -351,11 +457,25 @@ static int virtballoon_probe(struct virtio_device *vdev)
}

INIT_LIST_HEAD(&vb->pages);
+
vb->num_pages = 0;
init_waitqueue_head(&vb->config_change);
init_waitqueue_head(&vb->acked);
vb->vdev = vdev;
vb->need_stats_update = 0;
+ vb_ptr = vb;
+
+ /* Init the ballooned page->mapping special balloon_mapping */
+ balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
+ if (!balloon_mapping) {
+ err = -ENOMEM;
+ goto out_free_vb;
+ }
+
+ INIT_RADIX_TREE(&balloon_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);
+ INIT_LIST_HEAD(&balloon_mapping->i_mmap_nonlinear);
+ spin_lock_init(&balloon_mapping->tree_lock);
+ balloon_mapping->a_ops = &virtio_balloon_aops;

err = init_vqs(vb);
if (err)
@@ -373,6 +493,7 @@ out_del_vqs:
vdev->config->del_vqs(vdev);
out_free_vb:
kfree(vb);
+ kfree(balloon_mapping);
out:
return err;
}
@@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
kthread_stop(vb->thread);
remove_common(vb);
kfree(vb);
+ kfree(balloon_mapping);
}

#ifdef CONFIG_PM
diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
index 652dc8b..930f1b7 100644
--- a/include/linux/virtio_balloon.h
+++ b/include/linux/virtio_balloon.h
@@ -56,4 +56,8 @@ struct virtio_balloon_stat {
u64 val;
} __attribute__((packed));

+#if !defined(CONFIG_COMPACTION)
+struct address_space *balloon_mapping;
+#endif
+
#endif /* _LINUX_VIRTIO_BALLOON_H */
--
1.7.11.2

2012-08-10 17:56:22

by Rafael Aquini

[permalink] [raw]
Subject: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces the helper functions as well as the necessary changes
to teach compaction and migration bits how to cope with pages which are
part of a guest memory balloon, in order to make them movable by memory
compaction procedures.

Signed-off-by: Rafael Aquini <[email protected]>
---
include/linux/mm.h | 17 ++++++++
mm/compaction.c | 125 +++++++++++++++++++++++++++++++++++++++++++++--------
mm/migrate.c | 30 ++++++++++++-
3 files changed, 152 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..56cc553 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1662,5 +1662,22 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; }
static inline bool page_is_guard(struct page *page) { return false; }
#endif /* CONFIG_DEBUG_PAGEALLOC */

+#if (defined(CONFIG_VIRTIO_BALLOON) || \
+ defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION)
+extern bool isolate_balloon_page(struct page *);
+extern void putback_balloon_page(struct page *);
+extern struct address_space *balloon_mapping;
+
+static inline bool movable_balloon_page(struct page *page)
+{
+ return (page->mapping && page->mapping == balloon_mapping);
+}
+
+#else
+static inline bool isolate_balloon_page(struct page *page) { return false; }
+static inline void putback_balloon_page(struct page *page) { return false; }
+static inline bool movable_balloon_page(struct page *page) { return false; }
+#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index e78cb96..e4e871b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,6 +14,7 @@
#include <linux/backing-dev.h>
#include <linux/sysctl.h>
#include <linux/sysfs.h>
+#include <linux/export.h>
#include "internal.h"

#if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -21,6 +22,84 @@
#define CREATE_TRACE_POINTS
#include <trace/events/compaction.h>

+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+/*
+ * Balloon pages special page->mapping.
+ * Users must properly allocate and initialize an instance of balloon_mapping,
+ * and set it as the page->mapping for balloon enlisted page instances.
+ * There is no need on utilizing struct address_space locking schemes for
+ * balloon_mapping as, once it gets initialized at balloon driver, it will
+ * remain just like a static reference that helps us on identifying a guest
+ * ballooned page by its mapping, as well as it will keep the 'a_ops' callback
+ * pointers to the functions that will execute the balloon page mobility tasks.
+ *
+ * address_space_operations necessary methods for ballooned pages:
+ * .migratepage - used to perform balloon's page migration (as is)
+ * .invalidatepage - used to isolate a page from balloon's page list
+ * .freepage - used to reinsert an isolated page to balloon's page list
+ */
+struct address_space *balloon_mapping;
+EXPORT_SYMBOL_GPL(balloon_mapping);
+
+static inline void __isolate_balloon_page(struct page *page)
+{
+ page->mapping->a_ops->invalidatepage(page, 0);
+}
+
+static inline void __putback_balloon_page(struct page *page)
+{
+ page->mapping->a_ops->freepage(page);
+}
+
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool isolate_balloon_page(struct page *page)
+{
+ if (WARN_ON(!movable_balloon_page(page)))
+ return false;
+
+ if (likely(get_page_unless_zero(page))) {
+ /*
+ * As balloon pages are not isolated from LRU lists, concurrent
+ * compaction threads can race against page migration functions
+ * move_to_new_page() & __unmap_and_move().
+ * In order to avoid having an already isolated balloon page
+ * being (wrongly) re-isolated while it is under migration,
+ * lets be sure we have the page lock before proceeding with
+ * the balloon page isolation steps.
+ */
+ if (likely(trylock_page(page))) {
+ /*
+ * A ballooned page, by default, has just one refcount.
+ * Prevent concurrent compaction threads from isolating
+ * an already isolated balloon page.
+ */
+ if (movable_balloon_page(page) &&
+ (page_count(page) == 2)) {
+ __isolate_balloon_page(page);
+ unlock_page(page);
+ return true;
+ }
+ unlock_page(page);
+ }
+ /* Drop refcount taken for this already isolated page */
+ put_page(page);
+ }
+ return false;
+}
+
+/* putback_lru_page() counterpart for a ballooned page */
+void putback_balloon_page(struct page *page)
+{
+ if (WARN_ON(!movable_balloon_page(page)))
+ return;
+
+ lock_page(page);
+ __putback_balloon_page(page);
+ put_page(page);
+ unlock_page(page);
+}
+#endif /* CONFIG_VIRTIO_BALLOON || CONFIG_VIRTIO_BALLOON_MODULE */
+
static unsigned long release_freepages(struct list_head *freelist)
{
struct page *page, *next;
@@ -312,32 +391,40 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
continue;
}

- if (!PageLRU(page))
- continue;
-
/*
- * PageLRU is set, and lru_lock excludes isolation,
- * splitting and collapsing (collapsing has already
- * happened if PageLRU is set).
+ * It is possible to migrate LRU pages and balloon pages.
+ * Skip any other type of page.
*/
- if (PageTransHuge(page)) {
- low_pfn += (1 << compound_order(page)) - 1;
- continue;
- }
+ if (PageLRU(page)) {
+ /*
+ * PageLRU is set, and lru_lock excludes isolation,
+ * splitting and collapsing (collapsing has already
+ * happened if PageLRU is set).
+ */
+ if (PageTransHuge(page)) {
+ low_pfn += (1 << compound_order(page)) - 1;
+ continue;
+ }

- if (!cc->sync)
- mode |= ISOLATE_ASYNC_MIGRATE;
+ if (!cc->sync)
+ mode |= ISOLATE_ASYNC_MIGRATE;

- lruvec = mem_cgroup_page_lruvec(page, zone);
+ lruvec = mem_cgroup_page_lruvec(page, zone);

- /* Try isolate the page */
- if (__isolate_lru_page(page, mode) != 0)
- continue;
+ /* Try isolate the page */
+ if (__isolate_lru_page(page, mode) != 0)
+ continue;
+
+ VM_BUG_ON(PageTransCompound(page));

- VM_BUG_ON(PageTransCompound(page));
+ /* Successfully isolated */
+ del_page_from_lru_list(page, lruvec, page_lru(page));
+ } else if (unlikely(movable_balloon_page(page))) {
+ if (!isolate_balloon_page(page))
+ continue;
+ } else
+ continue;

- /* Successfully isolated */
- del_page_from_lru_list(page, lruvec, page_lru(page));
list_add(&page->lru, migratelist);
cc->nr_migratepages++;
nr_isolated++;
diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..80f22bb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -79,7 +79,10 @@ void putback_lru_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- putback_lru_page(page);
+ if (unlikely(movable_balloon_page(page)))
+ putback_balloon_page(page);
+ else
+ putback_lru_page(page);
}
}

@@ -778,6 +781,17 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
}
}

+ if (unlikely(movable_balloon_page(page))) {
+ /*
+ * A ballooned page does not need any special attention from
+ * physical to virtual reverse mapping procedures.
+ * Skip any attempt to unmap PTEs or to remap swap cache,
+ * in order to avoid burning cycles at rmap level.
+ */
+ remap_swapcache = 0;
+ goto skip_unmap;
+ }
+
/*
* Corner case handling:
* 1. When a new swap-cache page is read into, it is added to the LRU
@@ -846,6 +860,20 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
goto out;

rc = __unmap_and_move(page, newpage, force, offlining, mode);
+
+ if (unlikely(movable_balloon_page(newpage))) {
+ /*
+ * A ballooned page has been migrated already. Now, it is the
+ * time to wrap-up counters, handle the old page back to Buddy
+ * and return.
+ */
+ list_del(&page->lru);
+ dec_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+ put_page(page);
+ __free_page(page);
+ return rc;
+ }
out:
if (rc != -EAGAIN) {
/*
--
1.7.11.2

2012-08-10 17:56:14

by Rafael Aquini

[permalink] [raw]
Subject: [PATCH v7 3/4] mm: introduce putback_movable_pages()

The PATCH "mm: introduce compaction and migration for virtio ballooned pages"
hacks around putback_lru_pages() in order to allow ballooned pages to be
re-inserted on balloon page list as if a ballooned page was like a LRU page.

As ballooned pages are not legitimate LRU pages, this patch introduces
putback_movable_pages() to properly cope with cases where the isolated
pageset contains ballooned pages and LRU pages, thus fixing the mentioned
inelegant hack around putback_lru_pages().

Signed-off-by: Rafael Aquini <[email protected]>
---
include/linux/migrate.h | 2 ++
mm/compaction.c | 4 ++--
mm/migrate.c | 20 ++++++++++++++++++++
mm/page_alloc.c | 2 +-
4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ce7e667..ff103a1 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -10,6 +10,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
#ifdef CONFIG_MIGRATION

extern void putback_lru_pages(struct list_head *l);
+extern void putback_movable_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
extern int migrate_pages(struct list_head *l, new_page_t x,
@@ -33,6 +34,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
#else

static inline void putback_lru_pages(struct list_head *l) {}
+static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index e4e871b..8567bb8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -837,9 +837,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
nr_remaining);

- /* Release LRU pages not migrated */
+ /* Release isolated pages not migrated */
if (err) {
- putback_lru_pages(&cc->migratepages);
+ putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
if (err == -ENOMEM) {
ret = COMPACT_PARTIAL;
diff --git a/mm/migrate.c b/mm/migrate.c
index 80f22bb..1165134 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -79,6 +79,26 @@ void putback_lru_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
+ putback_lru_page(page);
+ }
+}
+
+/*
+ * Put previously isolated pages back onto the appropriated lists
+ * from where they were once taken off for compaction/migration.
+ *
+ * This function shall be used instead of putback_lru_pages(),
+ * whenever the isolated pageset has been built by isolate_migratepages_range()
+ */
+void putback_movable_pages(struct list_head *l)
+{
+ struct page *page;
+ struct page *page2;
+
+ list_for_each_entry_safe(page, page2, l, lru) {
+ list_del(&page->lru);
+ dec_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
if (unlikely(movable_balloon_page(page)))
putback_balloon_page(page);
else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 009ac28..78b7663 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5669,7 +5669,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
0, false, MIGRATE_SYNC);
}

- putback_lru_pages(&cc.migratepages);
+ putback_movable_pages(&cc.migratepages);
return ret > 0 ? 0 : ret;
}

--
1.7.11.2

2012-08-12 23:12:28

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

On Fri, Aug 10, 2012 at 02:55:14PM -0300, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> This patch introduces the helper functions as well as the necessary changes
> to teach compaction and migration bits how to cope with pages which are
> part of a guest memory balloon, in order to make them movable by memory
> compaction procedures.
>
> Signed-off-by: Rafael Aquini <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>

--
Kind regards,
Minchan Kim

2012-08-12 23:24:44

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH v7 3/4] mm: introduce putback_movable_pages()

On Fri, Aug 10, 2012 at 02:55:16PM -0300, Rafael Aquini wrote:
> The PATCH "mm: introduce compaction and migration for virtio ballooned pages"
> hacks around putback_lru_pages() in order to allow ballooned pages to be
> re-inserted on balloon page list as if a ballooned page was like a LRU page.
>
> As ballooned pages are not legitimate LRU pages, this patch introduces
> putback_movable_pages() to properly cope with cases where the isolated
> pageset contains ballooned pages and LRU pages, thus fixing the mentioned
> inelegant hack around putback_lru_pages().
>
> Signed-off-by: Rafael Aquini <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>

Thanks for your good work, Rafael.

--
Kind regards,
Minchan Kim

2012-08-13 08:25:39

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

On Fri, Aug 10, 2012 at 02:55:14PM -0300, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> This patch introduces the helper functions as well as the necessary changes
> to teach compaction and migration bits how to cope with pages which are
> part of a guest memory balloon, in order to make them movable by memory
> compaction procedures.
>
> Signed-off-by: Rafael Aquini <[email protected]>
> ---
> include/linux/mm.h | 17 ++++++++
> mm/compaction.c | 125 +++++++++++++++++++++++++++++++++++++++++++++--------
> mm/migrate.c | 30 ++++++++++++-
> 3 files changed, 152 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 311be90..56cc553 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1662,5 +1662,22 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; }
> static inline bool page_is_guard(struct page *page) { return false; }
> #endif /* CONFIG_DEBUG_PAGEALLOC */
>
> +#if (defined(CONFIG_VIRTIO_BALLOON) || \
> + defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION)
> +extern bool isolate_balloon_page(struct page *);
> +extern void putback_balloon_page(struct page *);
> +extern struct address_space *balloon_mapping;
> +
> +static inline bool movable_balloon_page(struct page *page)
> +{
> + return (page->mapping && page->mapping == balloon_mapping);

I am guessing this needs smp_read_barrier_depends, and maybe
ACCESS_ONCE ...

> +}
> +
> +#else
> +static inline bool isolate_balloon_page(struct page *page) { return false; }
> +static inline void putback_balloon_page(struct page *page) { return false; }
> +static inline bool movable_balloon_page(struct page *page) { return false; }
> +#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
> +

This does mean that only one type of balloon is useable at a time.
I wonder whether using a flag in address_space structure instead
is possible ...

> #endif /* __KERNEL__ */
> #endif /* _LINUX_MM_H */
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e78cb96..e4e871b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -14,6 +14,7 @@
> #include <linux/backing-dev.h>
> #include <linux/sysctl.h>
> #include <linux/sysfs.h>
> +#include <linux/export.h>
> #include "internal.h"
>
> #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> @@ -21,6 +22,84 @@
> #define CREATE_TRACE_POINTS
> #include <trace/events/compaction.h>
>
> +#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
> +/*
> + * Balloon pages special page->mapping.
> + * Users must properly allocate and initialize an instance of balloon_mapping,
> + * and set it as the page->mapping for balloon enlisted page instances.
> + * There is no need on utilizing struct address_space locking schemes for
> + * balloon_mapping as, once it gets initialized at balloon driver, it will
> + * remain just like a static reference that helps us on identifying a guest
> + * ballooned page by its mapping, as well as it will keep the 'a_ops' callback
> + * pointers to the functions that will execute the balloon page mobility tasks.
> + *
> + * address_space_operations necessary methods for ballooned pages:
> + * .migratepage - used to perform balloon's page migration (as is)
> + * .invalidatepage - used to isolate a page from balloon's page list
> + * .freepage - used to reinsert an isolated page to balloon's page list
> + */
> +struct address_space *balloon_mapping;
> +EXPORT_SYMBOL_GPL(balloon_mapping);
> +
> +static inline void __isolate_balloon_page(struct page *page)
> +{
> + page->mapping->a_ops->invalidatepage(page, 0);
> +}
> +
> +static inline void __putback_balloon_page(struct page *page)
> +{
> + page->mapping->a_ops->freepage(page);
> +}
> +
> +/* __isolate_lru_page() counterpart for a ballooned page */
> +bool isolate_balloon_page(struct page *page)
> +{
> + if (WARN_ON(!movable_balloon_page(page)))

Looks like this actually can happen if the page is leaked
between previous movable_balloon_page and here.

> + return false;
> +
> + if (likely(get_page_unless_zero(page))) {
> + /*
> + * As balloon pages are not isolated from LRU lists, concurrent
> + * compaction threads can race against page migration functions
> + * move_to_new_page() & __unmap_and_move().
> + * In order to avoid having an already isolated balloon page
> + * being (wrongly) re-isolated while it is under migration,
> + * lets be sure we have the page lock before proceeding with
> + * the balloon page isolation steps.
> + */
> + if (likely(trylock_page(page))) {
> + /*
> + * A ballooned page, by default, has just one refcount.
> + * Prevent concurrent compaction threads from isolating
> + * an already isolated balloon page.
> + */
> + if (movable_balloon_page(page) &&
> + (page_count(page) == 2)) {
> + __isolate_balloon_page(page);
> + unlock_page(page);
> + return true;
> + }
> + unlock_page(page);
> + }
> + /* Drop refcount taken for this already isolated page */
> + put_page(page);
> + }
> + return false;
> +}
> +
> +/* putback_lru_page() counterpart for a ballooned page */
> +void putback_balloon_page(struct page *page)
> +{
> + if (WARN_ON(!movable_balloon_page(page)))
> + return;
> +
> + lock_page(page);
> + __putback_balloon_page(page);
> + put_page(page);
> + unlock_page(page);
> +}
> +#endif /* CONFIG_VIRTIO_BALLOON || CONFIG_VIRTIO_BALLOON_MODULE */
> +
> static unsigned long release_freepages(struct list_head *freelist)
> {
> struct page *page, *next;
> @@ -312,32 +391,40 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> continue;
> }
>
> - if (!PageLRU(page))
> - continue;
> -
> /*
> - * PageLRU is set, and lru_lock excludes isolation,
> - * splitting and collapsing (collapsing has already
> - * happened if PageLRU is set).
> + * It is possible to migrate LRU pages and balloon pages.
> + * Skip any other type of page.
> */
> - if (PageTransHuge(page)) {
> - low_pfn += (1 << compound_order(page)) - 1;
> - continue;
> - }
> + if (PageLRU(page)) {
> + /*
> + * PageLRU is set, and lru_lock excludes isolation,
> + * splitting and collapsing (collapsing has already
> + * happened if PageLRU is set).
> + */
> + if (PageTransHuge(page)) {
> + low_pfn += (1 << compound_order(page)) - 1;
> + continue;
> + }
>
> - if (!cc->sync)
> - mode |= ISOLATE_ASYNC_MIGRATE;
> + if (!cc->sync)
> + mode |= ISOLATE_ASYNC_MIGRATE;
>
> - lruvec = mem_cgroup_page_lruvec(page, zone);
> + lruvec = mem_cgroup_page_lruvec(page, zone);
>
> - /* Try isolate the page */
> - if (__isolate_lru_page(page, mode) != 0)
> - continue;
> + /* Try isolate the page */
> + if (__isolate_lru_page(page, mode) != 0)
> + continue;
> +
> + VM_BUG_ON(PageTransCompound(page));
>
> - VM_BUG_ON(PageTransCompound(page));
> + /* Successfully isolated */
> + del_page_from_lru_list(page, lruvec, page_lru(page));
> + } else if (unlikely(movable_balloon_page(page))) {
> + if (!isolate_balloon_page(page))
> + continue;
> + } else
> + continue;
>
> - /* Successfully isolated */
> - del_page_from_lru_list(page, lruvec, page_lru(page));
> list_add(&page->lru, migratelist);
> cc->nr_migratepages++;
> nr_isolated++;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 77ed2d7..80f22bb 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -79,7 +79,10 @@ void putback_lru_pages(struct list_head *l)
> list_del(&page->lru);
> dec_zone_page_state(page, NR_ISOLATED_ANON +
> page_is_file_cache(page));
> - putback_lru_page(page);
> + if (unlikely(movable_balloon_page(page)))
> + putback_balloon_page(page);
> + else
> + putback_lru_page(page);
> }
> }
>
> @@ -778,6 +781,17 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> }
> }
>
> + if (unlikely(movable_balloon_page(page))) {
> + /*
> + * A ballooned page does not need any special attention from
> + * physical to virtual reverse mapping procedures.
> + * Skip any attempt to unmap PTEs or to remap swap cache,
> + * in order to avoid burning cycles at rmap level.
> + */
> + remap_swapcache = 0;
> + goto skip_unmap;
> + }
> +
> /*
> * Corner case handling:
> * 1. When a new swap-cache page is read into, it is added to the LRU
> @@ -846,6 +860,20 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
> goto out;
>
> rc = __unmap_and_move(page, newpage, force, offlining, mode);
> +
> + if (unlikely(movable_balloon_page(newpage))) {
> + /*
> + * A ballooned page has been migrated already. Now, it is the
> + * time to wrap-up counters, handle the old page back to Buddy
> + * and return.
> + */
> + list_del(&page->lru);
> + dec_zone_page_state(page, NR_ISOLATED_ANON +
> + page_is_file_cache(page));
> + put_page(page);
> + __free_page(page);
> + return rc;
> + }
> out:
> if (rc != -EAGAIN) {
> /*
> --
> 1.7.11.2

2012-08-13 08:40:40

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> Besides making balloon pages movable at allocation time and introducing
> the necessary primitives to perform balloon page migration/compaction,
> this patch also introduces the following locking scheme to provide the
> proper synchronization and protection for struct virtio_balloon elements
> against concurrent accesses due to parallel operations introduced by
> memory compaction / page migration.
> - balloon_lock (mutex) : synchronizes the access demand to elements of
> struct virtio_balloon and its queue operations;
> - pages_lock (spinlock): special protection to balloon pages list against
> concurrent list handling operations;
>
> Signed-off-by: Rafael Aquini <[email protected]>
> ---
> drivers/virtio/virtio_balloon.c | 138 +++++++++++++++++++++++++++++++++++++---
> include/linux/virtio_balloon.h | 4 ++
> 2 files changed, 134 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 0908e60..7c937a0 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -27,6 +27,7 @@
> #include <linux/delay.h>
> #include <linux/slab.h>
> #include <linux/module.h>
> +#include <linux/fs.h>
>
> /*
> * Balloon device works in 4K page units. So each page is pointed to by
> @@ -35,6 +36,12 @@
> */
> #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
>
> +/* Synchronizes accesses/updates to the struct virtio_balloon elements */
> +DEFINE_MUTEX(balloon_lock);
> +
> +/* Protects 'virtio_balloon->pages' list against concurrent handling */
> +DEFINE_SPINLOCK(pages_lock);
> +
> struct virtio_balloon
> {
> struct virtio_device *vdev;
> @@ -51,6 +58,7 @@ struct virtio_balloon
>
> /* Number of balloon pages we've told the Host we're not using. */
> unsigned int num_pages;
> +
> /*
> * The pages we've told the Host we're not using.
> * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
> @@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> /* We can only do one array worth at a time. */
> num = min(num, ARRAY_SIZE(vb->pfns));
>
> + mutex_lock(&balloon_lock);
> for (vb->num_pfns = 0; vb->num_pfns < num;
> vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
> - __GFP_NOMEMALLOC | __GFP_NOWARN);
> + struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
> + __GFP_NORETRY | __GFP_NOWARN |
> + __GFP_NOMEMALLOC);
> if (!page) {
> if (printk_ratelimit())
> dev_printk(KERN_INFO, &vb->vdev->dev,
> @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> set_page_pfns(vb->pfns + vb->num_pfns, page);
> vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> totalram_pages--;
> + spin_lock(&pages_lock);
> list_add(&page->lru, &vb->pages);

If list_add above is reordered with mapping assignment below,
then nothing bad happens because balloon_mapping takes
pages_lock.

> + page->mapping = balloon_mapping;
> + spin_unlock(&pages_lock);
> }
>
> /* Didn't get any? Oh well. */
> @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> return;
>
> tell_host(vb, vb->inflate_vq);
> + mutex_unlock(&balloon_lock);
> }
>
> static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
> /* We can only do one array worth at a time. */
> num = min(num, ARRAY_SIZE(vb->pfns));
>
> + mutex_lock(&balloon_lock);
> for (vb->num_pfns = 0; vb->num_pfns < num;
> vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> + /*
> + * We can race against virtballoon_isolatepage() and end up
> + * stumbling across a _temporarily_ empty 'pages' list.
> + */
> + spin_lock(&pages_lock);
> + if (unlikely(list_empty(&vb->pages))) {
> + spin_unlock(&pages_lock);
> + break;
> + }
> page = list_first_entry(&vb->pages, struct page, lru);
> + page->mapping = NULL;

Unlike the case above, here
if = NULL write above is reordered with list_del below,
then isolate_page can run on a page that is not
on lru.

So I think this needs a wmb().
And maybe a comment above explaining why it is safe?

> list_del(&page->lru);

I wonder why changing page->lru here is safe against
races with unmap_and_move in the previous patch.

> + spin_unlock(&pages_lock);
> set_page_pfns(vb->pfns + vb->num_pfns, page);
> vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
> }
> @@ -182,8 +208,11 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
> * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
> * is true, we *have* to do it in this order
> */
> - tell_host(vb, vb->deflate_vq);
> - release_pages_by_pfn(vb->pfns, vb->num_pfns);
> + if (vb->num_pfns > 0) {
> + tell_host(vb, vb->deflate_vq);
> + release_pages_by_pfn(vb->pfns, vb->num_pfns);
> + }
> + mutex_unlock(&balloon_lock);
> }
>
> static inline void update_stat(struct virtio_balloon *vb, int idx,
> @@ -239,6 +268,7 @@ static void stats_handle_request(struct virtio_balloon *vb)
> struct scatterlist sg;
> unsigned int len;
>
> + mutex_lock(&balloon_lock);
> vb->need_stats_update = 0;
> update_balloon_stats(vb);
>
> @@ -249,6 +279,7 @@ static void stats_handle_request(struct virtio_balloon *vb)
> if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
> BUG();
> virtqueue_kick(vq);
> + mutex_unlock(&balloon_lock);
> }
>
> static void virtballoon_changed(struct virtio_device *vdev)
> @@ -261,22 +292,27 @@ static void virtballoon_changed(struct virtio_device *vdev)
> static inline s64 towards_target(struct virtio_balloon *vb)
> {
> __le32 v;
> - s64 target;
> + s64 target, actual;
>
> + mutex_lock(&balloon_lock);
> + actual = vb->num_pages;
> vb->vdev->config->get(vb->vdev,
> offsetof(struct virtio_balloon_config, num_pages),
> &v, sizeof(v));
> target = le32_to_cpu(v);
> - return target - vb->num_pages;
> + mutex_unlock(&balloon_lock);
> + return target - actual;
> }
>
> static void update_balloon_size(struct virtio_balloon *vb)
> {
> - __le32 actual = cpu_to_le32(vb->num_pages);
> -
> + __le32 actual;
> + mutex_lock(&balloon_lock);
> + actual = cpu_to_le32(vb->num_pages);
> vb->vdev->config->set(vb->vdev,
> offsetof(struct virtio_balloon_config, actual),
> &actual, sizeof(actual));
> + mutex_unlock(&balloon_lock);
> }
>
> static int balloon(void *_vballoon)
> @@ -339,6 +375,76 @@ static int init_vqs(struct virtio_balloon *vb)
> return 0;
> }
>
> +/*
> + * '*vb_ptr' allows virtballoon_migratepage() & virtballoon_putbackpage() to
> + * access pertinent elements from struct virtio_balloon
> + */

What if there is more than one balloon device?

> +struct virtio_balloon *vb_ptr;
> +
> +/*
> + * Populate balloon_mapping->a_ops->migratepage method to perform the balloon
> + * page migration task.
> + *
> + * After a ballooned page gets isolated by compaction procedures, this is the
> + * function that performs the page migration on behalf of move_to_new_page(),
> + * when the last calls (page)->mapping->a_ops->migratepage.
> + *
> + * Page migration for virtio balloon is done in a simple swap fashion which
> + * follows these two steps:
> + * 1) insert newpage into vb->pages list and update the host about it;
> + * 2) update the host about the removed old page from vb->pages list;
> + */
> +int virtballoon_migratepage(struct address_space *mapping,
> + struct page *newpage, struct page *page, enum migrate_mode mode)
> +{
> + mutex_lock(&balloon_lock);
> +
> + /* balloon's page migration 1st step */
> + vb_ptr->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> + spin_lock(&pages_lock);
> + list_add(&newpage->lru, &vb_ptr->pages);
> + spin_unlock(&pages_lock);
> + set_page_pfns(vb_ptr->pfns, newpage);
> + tell_host(vb_ptr, vb_ptr->inflate_vq);
> +
> + /* balloon's page migration 2nd step */
> + vb_ptr->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> + set_page_pfns(vb_ptr->pfns, page);
> + tell_host(vb_ptr, vb_ptr->deflate_vq);
> +
> + mutex_unlock(&balloon_lock);
> +
> + return 0;
> +}
> +
> +/*
> + * Populate balloon_mapping->a_ops->invalidatepage method to help compaction on
> + * isolating a page from the balloon page list.
> + */
> +void virtballoon_isolatepage(struct page *page, unsigned long mode)
> +{
> + spin_lock(&pages_lock);
> + list_del(&page->lru);
> + spin_unlock(&pages_lock);
> +}
> +
> +/*
> + * Populate balloon_mapping->a_ops->freepage method to help compaction on
> + * re-inserting an isolated page into the balloon page list.
> + */
> +void virtballoon_putbackpage(struct page *page)
> +{
> + spin_lock(&pages_lock);
> + list_add(&page->lru, &vb_ptr->pages);
> + spin_unlock(&pages_lock);

Could the following race trigger:
migration happens while module unloading is in progress,
module goes away between here and when the function
returns, then code for this function gets overwritten?
If yes we need locking external to module to prevent this.
Maybe add a spinlock to struct address_space?

> +}
> +
> +static const struct address_space_operations virtio_balloon_aops = {
> + .migratepage = virtballoon_migratepage,
> + .invalidatepage = virtballoon_isolatepage,
> + .freepage = virtballoon_putbackpage,
> +};
> +
> static int virtballoon_probe(struct virtio_device *vdev)
> {
> struct virtio_balloon *vb;
> @@ -351,11 +457,25 @@ static int virtballoon_probe(struct virtio_device *vdev)
> }
>
> INIT_LIST_HEAD(&vb->pages);
> +
> vb->num_pages = 0;
> init_waitqueue_head(&vb->config_change);
> init_waitqueue_head(&vb->acked);
> vb->vdev = vdev;
> vb->need_stats_update = 0;
> + vb_ptr = vb;
> +
> + /* Init the ballooned page->mapping special balloon_mapping */
> + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
> + if (!balloon_mapping) {
> + err = -ENOMEM;
> + goto out_free_vb;
> + }

Can balloon_mapping be dereferenced at this point?
Then what happens?

> +
> + INIT_RADIX_TREE(&balloon_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);
> + INIT_LIST_HEAD(&balloon_mapping->i_mmap_nonlinear);
> + spin_lock_init(&balloon_mapping->tree_lock);
> + balloon_mapping->a_ops = &virtio_balloon_aops;
>
> err = init_vqs(vb);
> if (err)
> @@ -373,6 +493,7 @@ out_del_vqs:
> vdev->config->del_vqs(vdev);
> out_free_vb:
> kfree(vb);
> + kfree(balloon_mapping);

No need to set it to NULL? It seems if someone else allocates a mapping
and gets this chunk of memory by chance, the logic in mm will get
confused.

> out:
> return err;
> }
> @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
> kthread_stop(vb->thread);
> remove_common(vb);
> kfree(vb);
> + kfree(balloon_mapping);

Neither here?

> }
>
> #ifdef CONFIG_PM
> diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
> index 652dc8b..930f1b7 100644
> --- a/include/linux/virtio_balloon.h
> +++ b/include/linux/virtio_balloon.h
> @@ -56,4 +56,8 @@ struct virtio_balloon_stat {
> u64 val;
> } __attribute__((packed));
>
> +#if !defined(CONFIG_COMPACTION)
> +struct address_space *balloon_mapping;
> +#endif
> +

Anyone including this header will get a different copy of
balloon_mapping. Besides, need to be ifdef KERNEL.

> #endif /* _LINUX_VIRTIO_BALLOON_H */
> --
> 1.7.11.2

2012-08-14 00:23:06

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > +/*
> > + * Populate balloon_mapping->a_ops->freepage method to help compaction on
> > + * re-inserting an isolated page into the balloon page list.
> > + */
> > +void virtballoon_putbackpage(struct page *page)
> > +{
> > + spin_lock(&pages_lock);
> > + list_add(&page->lru, &vb_ptr->pages);
> > + spin_unlock(&pages_lock);
>
> Could the following race trigger:
> migration happens while module unloading is in progress,
> module goes away between here and when the function
> returns, then code for this function gets overwritten?
> If yes we need locking external to module to prevent this.
> Maybe add a spinlock to struct address_space?

The balloon module cannot be unloaded until it has leaked all its pages,
so I think this is safe:

static void remove_common(struct virtio_balloon *vb)
{
/* There might be pages left in the balloon: free them. */
while (vb->num_pages)
leak_balloon(vb, vb->num_pages);

Cheers,
Rusty.

2012-08-14 08:32:43

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
> On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > > +/*
> > > + * Populate balloon_mapping->a_ops->freepage method to help compaction on
> > > + * re-inserting an isolated page into the balloon page list.
> > > + */
> > > +void virtballoon_putbackpage(struct page *page)
> > > +{
> > > + spin_lock(&pages_lock);
> > > + list_add(&page->lru, &vb_ptr->pages);
> > > + spin_unlock(&pages_lock);
> >
> > Could the following race trigger:
> > migration happens while module unloading is in progress,
> > module goes away between here and when the function
> > returns, then code for this function gets overwritten?
> > If yes we need locking external to module to prevent this.
> > Maybe add a spinlock to struct address_space?
>
> The balloon module cannot be unloaded until it has leaked all its pages,
> so I think this is safe:
>
> static void remove_common(struct virtio_balloon *vb)
> {
> /* There might be pages left in the balloon: free them. */
> while (vb->num_pages)
> leak_balloon(vb, vb->num_pages);
>
> Cheers,
> Rusty.

I know I meant something else.
Let me lay this out:

CPU1 executes:
void virtballoon_putbackpage(struct page *page)
{
spin_lock(&pages_lock);
list_add(&page->lru, &vb_ptr->pages);
spin_unlock(&pages_lock);


at this point CPU2 unloads module:
leak_balloon
......

next CPU2 loads another module so code memory gets overwritten

now CPU1 executes the next instruction:

}

which would normally return to function's caller,
but it has been overwritten by CPU2 so we get corruption.

No?

--
MST

2012-08-14 17:44:44

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

On Mon, Aug 13, 2012 at 11:26:19AM +0300, Michael S. Tsirkin wrote:
> > +static inline bool movable_balloon_page(struct page *page)
> > +{
> > + return (page->mapping && page->mapping == balloon_mapping);
>
> I am guessing this needs smp_read_barrier_depends, and maybe
> ACCESS_ONCE ...
>

I'm curious about your guessing here. Could you ellaborate it further, please?


> > +#else
> > +static inline bool isolate_balloon_page(struct page *page) { return false; }
> > +static inline void putback_balloon_page(struct page *page) { return false; }
> > +static inline bool movable_balloon_page(struct page *page) { return false; }
> > +#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
> > +
>
> This does mean that only one type of balloon is useable at a time.
> I wonder whether using a flag in address_space structure instead
> is possible ...

This means we are only introducing this feature for virtio_balloon by now.
Despite the flagging address_space stuff is something we surely can look in the
future, I quite didn't get how we could be using two different types of balloon
devices at the same time for the same system. Could you ellaborate it a little
more, please?


> > +/* __isolate_lru_page() counterpart for a ballooned page */
> > +bool isolate_balloon_page(struct page *page)
> > +{
> > + if (WARN_ON(!movable_balloon_page(page)))
>
> Looks like this actually can happen if the page is leaked
> between previous movable_balloon_page and here.
>
> > + return false;

Yes, it surely can happen, and it does not harm to catch it here, print a warn and
return. While testing it, I wasn't lucky to see this small window opening, though.

2012-08-14 18:23:14

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote:
> > @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> > set_page_pfns(vb->pfns + vb->num_pfns, page);
> > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> > totalram_pages--;
> > + spin_lock(&pages_lock);
> > list_add(&page->lru, &vb->pages);
>
> If list_add above is reordered with mapping assignment below,
> then nothing bad happens because balloon_mapping takes
> pages_lock.
>
> > + page->mapping = balloon_mapping;
> > + spin_unlock(&pages_lock);
> > }
> >
> > /* Didn't get any? Oh well. */
> > @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> > return;
> >
> > tell_host(vb, vb->inflate_vq);
> > + mutex_unlock(&balloon_lock);
> > }
> >
> > static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> > @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
> > /* We can only do one array worth at a time. */
> > num = min(num, ARRAY_SIZE(vb->pfns));
> >
> > + mutex_lock(&balloon_lock);
> > for (vb->num_pfns = 0; vb->num_pfns < num;
> > vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> > + /*
> > + * We can race against virtballoon_isolatepage() and end up
> > + * stumbling across a _temporarily_ empty 'pages' list.
> > + */
> > + spin_lock(&pages_lock);
> > + if (unlikely(list_empty(&vb->pages))) {
> > + spin_unlock(&pages_lock);
> > + break;
> > + }
> > page = list_first_entry(&vb->pages, struct page, lru);
> > + page->mapping = NULL;
>
> Unlike the case above, here
> if = NULL write above is reordered with list_del below,
> then isolate_page can run on a page that is not
> on lru.
>
> So I think this needs a wmb().
> And maybe a comment above explaining why it is safe?

Good point. Presumably, this nit has potential to explain your guessing on the
read barrier stuff at movable_balloon_page() on the other patch.


> > list_del(&page->lru);
>
> I wonder why changing page->lru here is safe against
> races with unmap_and_move in the previous patch.

leak_balloon() doesn't race against unmap_and_move() because the later works on
an isolated page set. So, theoretically, pages being dequeued from balloon page
list here are either migrated (already) or they were not isolated yet.


> > +/*
> > + * '*vb_ptr' allows virtballoon_migratepage() & virtballoon_putbackpage() to
> > + * access pertinent elements from struct virtio_balloon
> > + */
>
> What if there is more than one balloon device?

Is it possible to load this driver twice, or are you foreseeing a future case
where this driver will be able to manage several distinct memory balloons for
the same guest?


> > + /* Init the ballooned page->mapping special balloon_mapping */
> > + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
> > + if (!balloon_mapping) {
> > + err = -ENOMEM;
> > + goto out_free_vb;
> > + }
>
> Can balloon_mapping be dereferenced at this point?
> Then what happens?

Since there's no balloon page enqueued for this balloon driver yet, there's no
chance of balloon_mapping being dereferenced at this point.


> > +
> > + INIT_RADIX_TREE(&balloon_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);
> > + INIT_LIST_HEAD(&balloon_mapping->i_mmap_nonlinear);
> > + spin_lock_init(&balloon_mapping->tree_lock);
> > + balloon_mapping->a_ops = &virtio_balloon_aops;
> >
> > err = init_vqs(vb);
> > if (err)
> > @@ -373,6 +493,7 @@ out_del_vqs:
> > vdev->config->del_vqs(vdev);
> > out_free_vb:
> > kfree(vb);
> > + kfree(balloon_mapping);
>
> No need to set it to NULL? It seems if someone else allocates a mapping
> and gets this chunk of memory by chance, the logic in mm will get
> confused.

Good point. It surely doesn't hurt be asured of this sort of safety.

> > out:
> > return err;
> > }
> > @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
> > kthread_stop(vb->thread);
> > remove_common(vb);
> > kfree(vb);
> > + kfree(balloon_mapping);
>
> Neither here?

ditto.


> > }
> >
> > #ifdef CONFIG_PM
> > diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
> > index 652dc8b..930f1b7 100644
> > --- a/include/linux/virtio_balloon.h
> > +++ b/include/linux/virtio_balloon.h
> > @@ -56,4 +56,8 @@ struct virtio_balloon_stat {
> > u64 val;
> > } __attribute__((packed));
> >
> > +#if !defined(CONFIG_COMPACTION)
> > +struct address_space *balloon_mapping;
> > +#endif
> > +
>
> Anyone including this header will get a different copy of
> balloon_mapping. Besides, need to be ifdef KERNEL.

Good point. I'll move this hunk to the balloon driver itself.

2012-08-14 18:44:33

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 11:33:20AM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
> > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > > > +/*
> > > > + * Populate balloon_mapping->a_ops->freepage method to help compaction on
> > > > + * re-inserting an isolated page into the balloon page list.
> > > > + */
> > > > +void virtballoon_putbackpage(struct page *page)
> > > > +{
> > > > + spin_lock(&pages_lock);
> > > > + list_add(&page->lru, &vb_ptr->pages);
> > > > + spin_unlock(&pages_lock);
> > >
> > > Could the following race trigger:
> > > migration happens while module unloading is in progress,
> > > module goes away between here and when the function
> > > returns, then code for this function gets overwritten?
> > > If yes we need locking external to module to prevent this.
> > > Maybe add a spinlock to struct address_space?
> >
> > The balloon module cannot be unloaded until it has leaked all its pages,
> > so I think this is safe:
> >
> > static void remove_common(struct virtio_balloon *vb)
> > {
> > /* There might be pages left in the balloon: free them. */
> > while (vb->num_pages)
> > leak_balloon(vb, vb->num_pages);
> >
> > Cheers,
> > Rusty.
>
> I know I meant something else.
> Let me lay this out:
>
> CPU1 executes:
> void virtballoon_putbackpage(struct page *page)
> {
> spin_lock(&pages_lock);
> list_add(&page->lru, &vb_ptr->pages);
> spin_unlock(&pages_lock);
>
>
> at this point CPU2 unloads module:
> leak_balloon
> ......
>
> next CPU2 loads another module so code memory gets overwritten
>
> now CPU1 executes the next instruction:
>
> }
>
> which would normally return to function's caller,
> but it has been overwritten by CPU2 so we get corruption.
>
> No?

At the point CPU2 is unloading the module, it will be kept looping at the
snippet Rusty pointed out because the isolation / migration steps do not mess
with 'vb->num_pages'. The driver will only unload after leaking the total amount
of balloon's inflated pages, which means (for this hypothetical case) CPU2 will
wait until CPU1 finishes the putaback procedure.

2012-08-14 19:30:25

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 03:44:09PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 11:33:20AM +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
> > > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> > > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > > > > +/*
> > > > > + * Populate balloon_mapping->a_ops->freepage method to help compaction on
> > > > > + * re-inserting an isolated page into the balloon page list.
> > > > > + */
> > > > > +void virtballoon_putbackpage(struct page *page)
> > > > > +{
> > > > > + spin_lock(&pages_lock);
> > > > > + list_add(&page->lru, &vb_ptr->pages);
> > > > > + spin_unlock(&pages_lock);
> > > >
> > > > Could the following race trigger:
> > > > migration happens while module unloading is in progress,
> > > > module goes away between here and when the function
> > > > returns, then code for this function gets overwritten?
> > > > If yes we need locking external to module to prevent this.
> > > > Maybe add a spinlock to struct address_space?
> > >
> > > The balloon module cannot be unloaded until it has leaked all its pages,
> > > so I think this is safe:
> > >
> > > static void remove_common(struct virtio_balloon *vb)
> > > {
> > > /* There might be pages left in the balloon: free them. */
> > > while (vb->num_pages)
> > > leak_balloon(vb, vb->num_pages);
> > >
> > > Cheers,
> > > Rusty.
> >
> > I know I meant something else.
> > Let me lay this out:
> >
> > CPU1 executes:
> > void virtballoon_putbackpage(struct page *page)
> > {
> > spin_lock(&pages_lock);
> > list_add(&page->lru, &vb_ptr->pages);
> > spin_unlock(&pages_lock);
> >
> >
> > at this point CPU2 unloads module:
> > leak_balloon
> > ......
> >
> > next CPU2 loads another module so code memory gets overwritten
> >
> > now CPU1 executes the next instruction:
> >
> > }
> >
> > which would normally return to function's caller,
> > but it has been overwritten by CPU2 so we get corruption.
> >
> > No?
>
> At the point CPU2 is unloading the module, it will be kept looping at the
> snippet Rusty pointed out because the isolation / migration steps do not mess
> with 'vb->num_pages'. The driver will only unload after leaking the total amount
> of balloon's inflated pages, which means (for this hypothetical case) CPU2 will
> wait until CPU1 finishes the putaback procedure.
>

Yes but only until unlock finishes. The last return from function
is not guarded and can be overwritten.

2012-08-14 19:34:35

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

On Tue, Aug 14, 2012 at 02:44:05PM -0300, Rafael Aquini wrote:
> On Mon, Aug 13, 2012 at 11:26:19AM +0300, Michael S. Tsirkin wrote:
> > > +static inline bool movable_balloon_page(struct page *page)
> > > +{
> > > + return (page->mapping && page->mapping == balloon_mapping);
> >
> > I am guessing this needs smp_read_barrier_depends, and maybe
> > ACCESS_ONCE ...
> >
>
> I'm curious about your guessing here. Could you ellaborate it further, please?
>
>
> > > +#else
> > > +static inline bool isolate_balloon_page(struct page *page) { return false; }
> > > +static inline void putback_balloon_page(struct page *page) { return false; }
> > > +static inline bool movable_balloon_page(struct page *page) { return false; }
> > > +#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
> > > +
> >
> > This does mean that only one type of balloon is useable at a time.
> > I wonder whether using a flag in address_space structure instead
> > is possible ...
>
> This means we are only introducing this feature for virtio_balloon by now.
> Despite the flagging address_space stuff is something we surely can look in the
> future, I quite didn't get how we could be using two different types of balloon
> devices at the same time for the same system. Could you ellaborate it a little
> more, please?
>

E.g. kvm can emulate hyperv so it could in theory have hyperv balloon.
This is mm stuff it is best not to tie it to specific drivers.

> > > +/* __isolate_lru_page() counterpart for a ballooned page */
> > > +bool isolate_balloon_page(struct page *page)
> > > +{
> > > + if (WARN_ON(!movable_balloon_page(page)))
> >
> > Looks like this actually can happen if the page is leaked
> > between previous movable_balloon_page and here.
> >
> > > + return false;
>
> Yes, it surely can happen, and it does not harm to catch it here, print a warn and
> return.

If it is legal, why warn? For that matter why test here at all?

> While testing it, I wasn't lucky to see this small window opening, though.

2012-08-14 19:47:52

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

On Tue, Aug 14, 2012 at 10:35:25PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 02:44:05PM -0300, Rafael Aquini wrote:
> > On Mon, Aug 13, 2012 at 11:26:19AM +0300, Michael S. Tsirkin wrote:
> > > > +static inline bool movable_balloon_page(struct page *page)
> > > > +{
> > > > + return (page->mapping && page->mapping == balloon_mapping);
> > >
> > > I am guessing this needs smp_read_barrier_depends, and maybe
> > > ACCESS_ONCE ...
> > >
> >
> > I'm curious about your guessing here. Could you ellaborate it further, please?
> >
> >
> > > > +#else
> > > > +static inline bool isolate_balloon_page(struct page *page) { return false; }
> > > > +static inline void putback_balloon_page(struct page *page) { return false; }
> > > > +static inline bool movable_balloon_page(struct page *page) { return false; }
> > > > +#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
> > > > +
> > >
> > > This does mean that only one type of balloon is useable at a time.
> > > I wonder whether using a flag in address_space structure instead
> > > is possible ...
> >
> > This means we are only introducing this feature for virtio_balloon by now.
> > Despite the flagging address_space stuff is something we surely can look in the
> > future, I quite didn't get how we could be using two different types of balloon
> > devices at the same time for the same system. Could you ellaborate it a little
> > more, please?
> >
>
> E.g. kvm can emulate hyperv so it could in theory have hyperv balloon.
> This is mm stuff it is best not to tie it to specific drivers.

But of course I agree this is not top priority, no need
to block submission on this, just nice to have.

> > > > +/* __isolate_lru_page() counterpart for a ballooned page */
> > > > +bool isolate_balloon_page(struct page *page)
> > > > +{
> > > > + if (WARN_ON(!movable_balloon_page(page)))
> > >
> > > Looks like this actually can happen if the page is leaked
> > > between previous movable_balloon_page and here.
> > >
> > > > + return false;
> >
> > Yes, it surely can happen, and it does not harm to catch it here, print a warn and
> > return.
>
> If it is legal, why warn? For that matter why test here at all?
>
> > While testing it, I wasn't lucky to see this small window opening, though.

2012-08-14 19:50:49

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 03:22:45PM -0300, Rafael Aquini wrote:
> On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote:
> > > @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> > > set_page_pfns(vb->pfns + vb->num_pfns, page);
> > > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> > > totalram_pages--;
> > > + spin_lock(&pages_lock);
> > > list_add(&page->lru, &vb->pages);
> >
> > If list_add above is reordered with mapping assignment below,
> > then nothing bad happens because balloon_mapping takes
> > pages_lock.
> >
> > > + page->mapping = balloon_mapping;
> > > + spin_unlock(&pages_lock);
> > > }
> > >
> > > /* Didn't get any? Oh well. */
> > > @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> > > return;
> > >
> > > tell_host(vb, vb->inflate_vq);
> > > + mutex_unlock(&balloon_lock);
> > > }
> > >
> > > static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> > > @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
> > > /* We can only do one array worth at a time. */
> > > num = min(num, ARRAY_SIZE(vb->pfns));
> > >
> > > + mutex_lock(&balloon_lock);
> > > for (vb->num_pfns = 0; vb->num_pfns < num;
> > > vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> > > + /*
> > > + * We can race against virtballoon_isolatepage() and end up
> > > + * stumbling across a _temporarily_ empty 'pages' list.
> > > + */
> > > + spin_lock(&pages_lock);
> > > + if (unlikely(list_empty(&vb->pages))) {
> > > + spin_unlock(&pages_lock);
> > > + break;
> > > + }
> > > page = list_first_entry(&vb->pages, struct page, lru);
> > > + page->mapping = NULL;
> >
> > Unlike the case above, here
> > if = NULL write above is reordered with list_del below,
> > then isolate_page can run on a page that is not
> > on lru.
> >
> > So I think this needs a wmb().
> > And maybe a comment above explaining why it is safe?
>
> Good point. Presumably, this nit has potential to explain your guessing on the
> read barrier stuff at movable_balloon_page() on the other patch.
>


What I think you should do is use rcu for access.
And here sync rcu before freeing.
Maybe an overkill but at least a documented synchronization
primitive, and it is very light weight.

> > > list_del(&page->lru);
> >
> > I wonder why changing page->lru here is safe against
> > races with unmap_and_move in the previous patch.
>
> leak_balloon() doesn't race against unmap_and_move() because the later works on
> an isolated page set. So, theoretically, pages being dequeued from balloon page
> list here are either migrated (already) or they were not isolated yet.
>

So add a comment explaining why it is safe pls.

> > > +/*
> > > + * '*vb_ptr' allows virtballoon_migratepage() & virtballoon_putbackpage() to
> > > + * access pertinent elements from struct virtio_balloon
> > > + */
> >
> > What if there is more than one balloon device?
>
> Is it possible to load this driver twice, or are you foreseeing a future case
> where this driver will be able to manage several distinct memory balloons for
> the same guest?
>

Second.
It is easy to create several balloons they are just
pci devices.

> > > + /* Init the ballooned page->mapping special balloon_mapping */
> > > + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
> > > + if (!balloon_mapping) {
> > > + err = -ENOMEM;
> > > + goto out_free_vb;
> > > + }
> >
> > Can balloon_mapping be dereferenced at this point?
> > Then what happens?
>
> Since there's no balloon page enqueued for this balloon driver yet, there's no
> chance of balloon_mapping being dereferenced at this point.
>
>
> > > +
> > > + INIT_RADIX_TREE(&balloon_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);
> > > + INIT_LIST_HEAD(&balloon_mapping->i_mmap_nonlinear);
> > > + spin_lock_init(&balloon_mapping->tree_lock);
> > > + balloon_mapping->a_ops = &virtio_balloon_aops;
> > >
> > > err = init_vqs(vb);
> > > if (err)
> > > @@ -373,6 +493,7 @@ out_del_vqs:
> > > vdev->config->del_vqs(vdev);
> > > out_free_vb:
> > > kfree(vb);
> > > + kfree(balloon_mapping);
> >
> > No need to set it to NULL? It seems if someone else allocates a mapping
> > and gets this chunk of memory by chance, the logic in mm will get
> > confused.
>
> Good point. It surely doesn't hurt be asured of this sort of safety.
>
> > > out:
> > > return err;
> > > }
> > > @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
> > > kthread_stop(vb->thread);
> > > remove_common(vb);
> > > kfree(vb);
> > > + kfree(balloon_mapping);
> >
> > Neither here?
>
> ditto.
>
>
> > > }
> > >
> > > #ifdef CONFIG_PM
> > > diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
> > > index 652dc8b..930f1b7 100644
> > > --- a/include/linux/virtio_balloon.h
> > > +++ b/include/linux/virtio_balloon.h
> > > @@ -56,4 +56,8 @@ struct virtio_balloon_stat {
> > > u64 val;
> > > } __attribute__((packed));
> > >
> > > +#if !defined(CONFIG_COMPACTION)
> > > +struct address_space *balloon_mapping;
> > > +#endif
> > > +
> >
> > Anyone including this header will get a different copy of
> > balloon_mapping. Besides, need to be ifdef KERNEL.
>
> Good point. I'll move this hunk to the balloon driver itself.
>

2012-08-14 19:51:35

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

On Tue, Aug 14, 2012 at 02:44:05PM -0300, Rafael Aquini wrote:
> On Mon, Aug 13, 2012 at 11:26:19AM +0300, Michael S. Tsirkin wrote:
> > > +static inline bool movable_balloon_page(struct page *page)
> > > +{
> > > + return (page->mapping && page->mapping == balloon_mapping);
> >
> > I am guessing this needs smp_read_barrier_depends, and maybe
> > ACCESS_ONCE ...
> >
>
> I'm curious about your guessing here. Could you ellaborate it further, please?

well balloon_mapping can change dynamically.


I think actually rcu is a good fit here.

>
> > > +#else
> > > +static inline bool isolate_balloon_page(struct page *page) { return false; }
> > > +static inline void putback_balloon_page(struct page *page) { return false; }
> > > +static inline bool movable_balloon_page(struct page *page) { return false; }
> > > +#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
> > > +
> >
> > This does mean that only one type of balloon is useable at a time.
> > I wonder whether using a flag in address_space structure instead
> > is possible ...
>
> This means we are only introducing this feature for virtio_balloon by now.
> Despite the flagging address_space stuff is something we surely can look in the
> future, I quite didn't get how we could be using two different types of balloon
> devices at the same time for the same system. Could you ellaborate it a little
> more, please?
>
>
> > > +/* __isolate_lru_page() counterpart for a ballooned page */
> > > +bool isolate_balloon_page(struct page *page)
> > > +{
> > > + if (WARN_ON(!movable_balloon_page(page)))
> >
> > Looks like this actually can happen if the page is leaked
> > between previous movable_balloon_page and here.
> >
> > > + return false;
>
> Yes, it surely can happen, and it does not harm to catch it here, print a warn and
> return. While testing it, I wasn't lucky to see this small window opening, though.

2012-08-14 19:58:28

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 03:22:45PM -0300, Rafael Aquini wrote:
> > On Mon, Aug 13, 2012 at 11:41:23AM +0300, Michael S. Tsirkin wrote:
> > > > @@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> > > > set_page_pfns(vb->pfns + vb->num_pfns, page);
> > > > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> > > > totalram_pages--;
> > > > + spin_lock(&pages_lock);
> > > > list_add(&page->lru, &vb->pages);
> > >
> > > If list_add above is reordered with mapping assignment below,
> > > then nothing bad happens because balloon_mapping takes
> > > pages_lock.
> > >
> > > > + page->mapping = balloon_mapping;
> > > > + spin_unlock(&pages_lock);
> > > > }
> > > >
> > > > /* Didn't get any? Oh well. */
> > > > @@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> > > > return;
> > > >
> > > > tell_host(vb, vb->inflate_vq);
> > > > + mutex_unlock(&balloon_lock);
> > > > }
> > > >
> > > > static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
> > > > @@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
> > > > /* We can only do one array worth at a time. */
> > > > num = min(num, ARRAY_SIZE(vb->pfns));
> > > >
> > > > + mutex_lock(&balloon_lock);
> > > > for (vb->num_pfns = 0; vb->num_pfns < num;
> > > > vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> > > > + /*
> > > > + * We can race against virtballoon_isolatepage() and end up
> > > > + * stumbling across a _temporarily_ empty 'pages' list.
> > > > + */
> > > > + spin_lock(&pages_lock);
> > > > + if (unlikely(list_empty(&vb->pages))) {
> > > > + spin_unlock(&pages_lock);
> > > > + break;
> > > > + }
> > > > page = list_first_entry(&vb->pages, struct page, lru);
> > > > + page->mapping = NULL;
> > >
> > > Unlike the case above, here
> > > if = NULL write above is reordered with list_del below,
> > > then isolate_page can run on a page that is not
> > > on lru.
> > >
> > > So I think this needs a wmb().
> > > And maybe a comment above explaining why it is safe?
> >
> > Good point. Presumably, this nit has potential to explain your guessing on the
> > read barrier stuff at movable_balloon_page() on the other patch.
> >
>
>
> What I think you should do is use rcu for access.
> And here sync rcu before freeing.
> Maybe an overkill but at least a documented synchronization
> primitive, and it is very light weight.
>
> > > > list_del(&page->lru);
> > >
> > > I wonder why changing page->lru here is safe against
> > > races with unmap_and_move in the previous patch.
> >
> > leak_balloon() doesn't race against unmap_and_move() because the later works on
> > an isolated page set. So, theoretically, pages being dequeued from balloon page
> > list here are either migrated (already) or they were not isolated yet.
> >
>
> So add a comment explaining why it is safe pls.
>
> > > > +/*
> > > > + * '*vb_ptr' allows virtballoon_migratepage() & virtballoon_putbackpage() to
> > > > + * access pertinent elements from struct virtio_balloon
> > > > + */
> > >
> > > What if there is more than one balloon device?
> >
> > Is it possible to load this driver twice, or are you foreseeing a future case
> > where this driver will be able to manage several distinct memory balloons for
> > the same guest?
> >
>
> Second.
> It is easy to create several balloons they are just
> pci devices.



and it might not be too important to make it work but
at least would be nice not to have a crash in this
setup.


> > > > + /* Init the ballooned page->mapping special balloon_mapping */
> > > > + balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
> > > > + if (!balloon_mapping) {
> > > > + err = -ENOMEM;
> > > > + goto out_free_vb;
> > > > + }
> > >
> > > Can balloon_mapping be dereferenced at this point?
> > > Then what happens?
> >
> > Since there's no balloon page enqueued for this balloon driver yet, there's no
> > chance of balloon_mapping being dereferenced at this point.
> >
> >
> > > > +
> > > > + INIT_RADIX_TREE(&balloon_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);
> > > > + INIT_LIST_HEAD(&balloon_mapping->i_mmap_nonlinear);
> > > > + spin_lock_init(&balloon_mapping->tree_lock);
> > > > + balloon_mapping->a_ops = &virtio_balloon_aops;
> > > >
> > > > err = init_vqs(vb);
> > > > if (err)
> > > > @@ -373,6 +493,7 @@ out_del_vqs:
> > > > vdev->config->del_vqs(vdev);
> > > > out_free_vb:
> > > > kfree(vb);
> > > > + kfree(balloon_mapping);
> > >
> > > No need to set it to NULL? It seems if someone else allocates a mapping
> > > and gets this chunk of memory by chance, the logic in mm will get
> > > confused.
> >
> > Good point. It surely doesn't hurt be asured of this sort of safety.
> >
> > > > out:
> > > > return err;
> > > > }
> > > > @@ -397,6 +518,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
> > > > kthread_stop(vb->thread);
> > > > remove_common(vb);
> > > > kfree(vb);
> > > > + kfree(balloon_mapping);
> > >
> > > Neither here?
> >
> > ditto.
> >
> >
> > > > }
> > > >
> > > > #ifdef CONFIG_PM
> > > > diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
> > > > index 652dc8b..930f1b7 100644
> > > > --- a/include/linux/virtio_balloon.h
> > > > +++ b/include/linux/virtio_balloon.h
> > > > @@ -56,4 +56,8 @@ struct virtio_balloon_stat {
> > > > u64 val;
> > > > } __attribute__((packed));
> > > >
> > > > +#if !defined(CONFIG_COMPACTION)
> > > > +struct address_space *balloon_mapping;
> > > > +#endif
> > > > +
> > >
> > > Anyone including this header will get a different copy of
> > > balloon_mapping. Besides, need to be ifdef KERNEL.
> >
> > Good point. I'll move this hunk to the balloon driver itself.
> >

2012-08-14 20:01:10

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

On Tue, Aug 14, 2012 at 10:35:25PM +0300, Michael S. Tsirkin wrote:
> > > > +/* __isolate_lru_page() counterpart for a ballooned page */
> > > > +bool isolate_balloon_page(struct page *page)
> > > > +{
> > > > + if (WARN_ON(!movable_balloon_page(page)))
> > >
> > > Looks like this actually can happen if the page is leaked
> > > between previous movable_balloon_page and here.
> > >
> > > > + return false;
> >
> > Yes, it surely can happen, and it does not harm to catch it here, print a warn and
> > return.
>
> If it is legal, why warn? For that matter why test here at all?
>

As this is a public symbol, and despite the usage we introduce is sane, the warn
was placed as an insurance policy to let us know about any insane attempt to use
the procedure in the future. That was due to a nice review nitpick, actually.

Even though the code already had a test to properly avoid this race you
mention, I thought that sustaining the warn was a good thing. As I told you,
despite real, I've never got (un)lucky enough to stumble across that race window
while testing the patch.

If your concern is about being too much verbose on logging, under certain
conditions, perhaps we can change that test to a WARN_ON_ONCE() ?

Mel, what are your thoughts here?

> > While testing it, I wasn't lucky to see this small window opening, though.

2012-08-14 20:03:45

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

On Tue, Aug 14, 2012 at 10:48:37PM +0300, Michael S. Tsirkin wrote:
> >
> > E.g. kvm can emulate hyperv so it could in theory have hyperv balloon.
> > This is mm stuff it is best not to tie it to specific drivers.
>
> But of course I agree this is not top priority, no need
> to block submission on this, just nice to have.
>
This surely is interesting to look at, in the near future.

2012-08-14 20:08:58

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > What if there is more than one balloon device?
> > >
> > > Is it possible to load this driver twice, or are you foreseeing a future case
> > > where this driver will be able to manage several distinct memory balloons for
> > > the same guest?
> > >
> >
> > Second.
> > It is easy to create several balloons they are just
> > pci devices.
>
>
>
> and it might not be too important to make it work but
> at least would be nice not to have a crash in this
> setup.
>
Fair enough. For now, as I believe it's safe to assume we are only inflating one
balloon per guest, I'd like to propose this as a future enhancement. Sounds
good?

2012-08-14 20:11:33

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> What I think you should do is use rcu for access.
> And here sync rcu before freeing.
> Maybe an overkill but at least a documented synchronization
> primitive, and it is very light weight.
>

I liked your suggestion on barriers, as well.

Rik, Mel ?

2012-08-14 20:22:21

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

On Tue, Aug 14, 2012 at 05:00:49PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:35:25PM +0300, Michael S. Tsirkin wrote:
> > > > > +/* __isolate_lru_page() counterpart for a ballooned page */
> > > > > +bool isolate_balloon_page(struct page *page)
> > > > > +{
> > > > > + if (WARN_ON(!movable_balloon_page(page)))
> > > >
> > > > Looks like this actually can happen if the page is leaked
> > > > between previous movable_balloon_page and here.
> > > >
> > > > > + return false;
> > >
> > > Yes, it surely can happen, and it does not harm to catch it here, print a warn and
> > > return.
> >
> > If it is legal, why warn? For that matter why test here at all?
> >
>
> As this is a public symbol, and despite the usage we introduce is sane, the warn
> was placed as an insurance policy to let us know about any insane attempt to use
> the procedure in the future. That was due to a nice review nitpick, actually.
>
> Even though the code already had a test to properly avoid this race you
> mention, I thought that sustaining the warn was a good thing. As I told you,
> despite real, I've never got (un)lucky enough to stumble across that race window
> while testing the patch.
>
> If your concern is about being too much verbose on logging, under certain
> conditions, perhaps we can change that test to a WARN_ON_ONCE() ?
>
> Mel, what are your thoughts here?
>
> > > While testing it, I wasn't lucky to see this small window opening, though.


think about it: you see it in log. so you know race triggered.
now what? why is it useful to know this?

2012-08-14 20:23:10

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > What if there is more than one balloon device?
> > > >
> > > > Is it possible to load this driver twice, or are you foreseeing a future case
> > > > where this driver will be able to manage several distinct memory balloons for
> > > > the same guest?
> > > >
> > >
> > > Second.
> > > It is easy to create several balloons they are just
> > > pci devices.
> >
> >
> >
> > and it might not be too important to make it work but
> > at least would be nice not to have a crash in this
> > setup.
> >
> Fair enough. For now, as I believe it's safe to assume we are only inflating one
> balloon per guest, I'd like to propose this as a future enhancement. Sounds
> good?
>

Since guest crashes when it's not the case, no it doesn't, sorry :(.

2012-08-14 20:30:14

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > > What if there is more than one balloon device?
> > > > >
> > > > > Is it possible to load this driver twice, or are you foreseeing a future case
> > > > > where this driver will be able to manage several distinct memory balloons for
> > > > > the same guest?
> > > > >
> > > >
> > > > Second.
> > > > It is easy to create several balloons they are just
> > > > pci devices.
> > >
> > >
> > >
> > > and it might not be too important to make it work but
> > > at least would be nice not to have a crash in this
> > > setup.
> > >
> > Fair enough. For now, as I believe it's safe to assume we are only inflating one
> > balloon per guest, I'd like to propose this as a future enhancement. Sounds
> > good?
> >
>
> Since guest crashes when it's not the case, no it doesn't, sorry :(.
>
Ok, but right now this driver only takes care of 1 balloon per guest, so how
could this approach crash it?

Your point is a good thing to be on a to-do list for future enhancements, but
it's not a dealbreaker for the present balloon driver implementation, IMHO.

2012-08-14 20:30:33

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > What I think you should do is use rcu for access.
> > And here sync rcu before freeing.
> > Maybe an overkill but at least a documented synchronization
> > primitive, and it is very light weight.
> >
>
> I liked your suggestion on barriers, as well.
>
> Rik, Mel ?

Further instead of simple assignment I would add an api
in mm to set/clear the balloon mapping, with proper locking.

This could fail if already set, and thus fix crash with
many ballons.

2012-08-14 20:48:25

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > > > What if there is more than one balloon device?
> > > > > >
> > > > > > Is it possible to load this driver twice, or are you foreseeing a future case
> > > > > > where this driver will be able to manage several distinct memory balloons for
> > > > > > the same guest?
> > > > > >
> > > > >
> > > > > Second.
> > > > > It is easy to create several balloons they are just
> > > > > pci devices.
> > > >
> > > >
> > > >
> > > > and it might not be too important to make it work but
> > > > at least would be nice not to have a crash in this
> > > > setup.
> > > >
> > > Fair enough. For now, as I believe it's safe to assume we are only inflating one
> > > balloon per guest, I'd like to propose this as a future enhancement. Sounds
> > > good?
> > >
> >
> > Since guest crashes when it's not the case, no it doesn't, sorry :(.
> >
> Ok, but right now this driver only takes care of 1 balloon per guest,

It does? Are you sure? There is no global state as far as I can see. So
I can create 2 devices and driver will happily create two instances,
each one can be inflated/deflated independently.

> so how
> could this approach crash it?

Add device. inflate. Add another device. inflate. deflate. unplug.
Now you have pointer to freed memory and when mm touches
page from first device, you ge use after free.

> Your point is a good thing to be on a to-do list for future enhancements, but
> it's not a dealbreaker for the present balloon driver implementation, IMHO.
>

Yes it looks like a dealbreaker to me.

--
MST

2012-08-14 20:53:36

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> > > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > What if there is more than one balloon device?
> > > > > > >
> > > > > > > Is it possible to load this driver twice, or are you foreseeing a future case
> > > > > > > where this driver will be able to manage several distinct memory balloons for
> > > > > > > the same guest?
> > > > > > >
> > > > > >
> > > > > > Second.
> > > > > > It is easy to create several balloons they are just
> > > > > > pci devices.
> > > > >
> > > > >
> > > > >
> > > > > and it might not be too important to make it work but
> > > > > at least would be nice not to have a crash in this
> > > > > setup.
> > > > >
> > > > Fair enough. For now, as I believe it's safe to assume we are only inflating one
> > > > balloon per guest, I'd like to propose this as a future enhancement. Sounds
> > > > good?
> > > >
> > >
> > > Since guest crashes when it's not the case, no it doesn't, sorry :(.
> > >
> > Ok, but right now this driver only takes care of 1 balloon per guest,
>
> It does? Are you sure? There is no global state as far as I can see. So
> I can create 2 devices and driver will happily create two instances,
> each one can be inflated/deflated independently.
>
> > so how
> > could this approach crash it?
>
> Add device. inflate. Add another device. inflate. deflate. unplug.
> Now you have pointer to freed memory and when mm touches
> page from first device, you ge use after free.
>
> > Your point is a good thing to be on a to-do list for future enhancements, but
> > it's not a dealbreaker for the present balloon driver implementation, IMHO.
> >
>
> Yes it looks like a dealbreaker to me.

To clarify, the global state that this patch adds, is ugly
even if we didn't support multiple balloons yet.
So I don't think I can accept such a patch.
Rusty has a final word here, maybe he thinks differently.

> --
> MST

2012-08-14 20:58:31

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On 08/14/2012 04:54 PM, Michael S. Tsirkin wrote:

> To clarify, the global state that this patch adds, is ugly
> even if we didn't support multiple balloons yet.
> So I don't think I can accept such a patch.
> Rusty has a final word here, maybe he thinks differently.

Before deciding that "does not support multiple balloon drivers
at once" is an issue, is there any use case at all for having
multiple balloon drivers active at a time?

I do not see any.

--
All rights reversed

2012-08-14 21:34:46

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> > > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > What if there is more than one balloon device?
> > > > > > >
> > > > > > > Is it possible to load this driver twice, or are you foreseeing a future case
> > > > > > > where this driver will be able to manage several distinct memory balloons for
> > > > > > > the same guest?
> > > > > > >
> > > > > >
> > > > > > Second.
> > > > > > It is easy to create several balloons they are just
> > > > > > pci devices.
> > > > >
> > > > >
> > > > >
> > > > > and it might not be too important to make it work but
> > > > > at least would be nice not to have a crash in this
> > > > > setup.
> > > > >
> > > > Fair enough. For now, as I believe it's safe to assume we are only inflating one
> > > > balloon per guest, I'd like to propose this as a future enhancement. Sounds
> > > > good?
> > > >
> > >
> > > Since guest crashes when it's not the case, no it doesn't, sorry :(.
> > >
> > Ok, but right now this driver only takes care of 1 balloon per guest,
>
> It does? Are you sure? There is no global state as far as I can see. So
> I can create 2 devices and driver will happily create two instances,
> each one can be inflated/deflated independently.
>
> > so how
> > could this approach crash it?
>
> Add device. inflate. Add another device. inflate. deflate. unplug.
> Now you have pointer to freed memory and when mm touches
> page from first device, you ge use after free.
>
> > Your point is a good thing to be on a to-do list for future enhancements, but
> > it's not a dealbreaker for the present balloon driver implementation, IMHO.
> >
>
> Yes it looks like a dealbreaker to me.

Sorry. You're right, I'm wrong.

I'll get back to the scracthpad to overcome this constraint. I believe the way
this patch was at its v4 revision (wrt this particular case) could possibly
address this concern of yours.

2012-08-14 21:37:40

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 04:56:59PM -0400, Rik van Riel wrote:
> On 08/14/2012 04:54 PM, Michael S. Tsirkin wrote:
>
> >To clarify, the global state that this patch adds, is ugly
> >even if we didn't support multiple balloons yet.
> >So I don't think I can accept such a patch.
> >Rusty has a final word here, maybe he thinks differently.
>
> Before deciding that "does not support multiple balloon drivers
> at once" is an issue, is there any use case at all for having
> multiple balloon drivers active at a time?
>
> I do not see any.

For example, we had a proposal for a page-cache backed
device. So it could be useful to have two, a regular balloon
and a pagecache backed one.
There could be other uses - it certainly looks like it
works so how can you be sure it's useless?

And even ignoring that, global pointer to a device
is an ugly hack and ugly hacks tend to explode.

And even ignoring estetics, and if we decide we are fine
with a single balloon, it needs to fail gracefully not
crash like it does now.

> --
> All rights reversed

2012-08-14 22:47:31

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 06:34:13PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 11:49:06PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 14, 2012 at 05:29:50PM -0300, Rafael Aquini wrote:
> > > On Tue, Aug 14, 2012 at 11:24:01PM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 14, 2012 at 05:08:31PM -0300, Rafael Aquini wrote:
> > > > > On Tue, Aug 14, 2012 at 10:59:16PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > What if there is more than one balloon device?
> > > > > > > >
> > > > > > > > Is it possible to load this driver twice, or are you foreseeing a future case
> > > > > > > > where this driver will be able to manage several distinct memory balloons for
> > > > > > > > the same guest?
> > > > > > > >
> > > > > > >
> > > > > > > Second.
> > > > > > > It is easy to create several balloons they are just
> > > > > > > pci devices.
> > > > > >
> > > > > >
> > > > > >
> > > > > > and it might not be too important to make it work but
> > > > > > at least would be nice not to have a crash in this
> > > > > > setup.
> > > > > >
> > > > > Fair enough. For now, as I believe it's safe to assume we are only inflating one
> > > > > balloon per guest, I'd like to propose this as a future enhancement. Sounds
> > > > > good?
> > > > >
> > > >
> > > > Since guest crashes when it's not the case, no it doesn't, sorry :(.
> > > >
> > > Ok, but right now this driver only takes care of 1 balloon per guest,
> >
> > It does? Are you sure? There is no global state as far as I can see. So
> > I can create 2 devices and driver will happily create two instances,
> > each one can be inflated/deflated independently.
> >
> > > so how
> > > could this approach crash it?
> >
> > Add device. inflate. Add another device. inflate. deflate. unplug.
> > Now you have pointer to freed memory and when mm touches
> > page from first device, you ge use after free.
> >
> > > Your point is a good thing to be on a to-do list for future enhancements, but
> > > it's not a dealbreaker for the present balloon driver implementation, IMHO.
> > >
> >
> > Yes it looks like a dealbreaker to me.
>
> Sorry. You're right, I'm wrong.
>
> I'll get back to the scracthpad to overcome this constraint. I believe the way
> this patch was at its v4 revision (wrt this particular case) could possibly
> address this concern of yours.

Almost. We still have a global balloon_mapping. The only reason for
it to exist seems solely to detect balloon mappings, so it
can just be replaced by a flag in the mapping, or in mapping
ops, or elsewhere. Also, please add APIs to mm so we can
avoid doing internal mm stuff like

INIT_RADIX_TREE(&balloon_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);

in the driver. It should be
alloc_address_mapping(&virtio_balloon_aops);
free_address_mapping

Make page->mapping use rcu, and sync rcu in
free_address_mapping.

--
MST

2012-08-15 03:52:01

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, 14 Aug 2012 11:33:20 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
> > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > > > +/*
> > > > + * Populate balloon_mapping->a_ops->freepage method to help compaction on
> > > > + * re-inserting an isolated page into the balloon page list.
> > > > + */
> > > > +void virtballoon_putbackpage(struct page *page)
> > > > +{
> > > > + spin_lock(&pages_lock);
> > > > + list_add(&page->lru, &vb_ptr->pages);
> > > > + spin_unlock(&pages_lock);
> > >
> > > Could the following race trigger:
> > > migration happens while module unloading is in progress,
> > > module goes away between here and when the function
> > > returns, then code for this function gets overwritten?
> > > If yes we need locking external to module to prevent this.
> > > Maybe add a spinlock to struct address_space?
> >
> > The balloon module cannot be unloaded until it has leaked all its pages,
> > so I think this is safe:
> >
> > static void remove_common(struct virtio_balloon *vb)
> > {
> > /* There might be pages left in the balloon: free them. */
> > while (vb->num_pages)
> > leak_balloon(vb, vb->num_pages);
> >
> > Cheers,
> > Rusty.
>
> I know I meant something else.
> Let me lay this out:
>
> CPU1 executes:
> void virtballoon_putbackpage(struct page *page)
> {
> spin_lock(&pages_lock);
> list_add(&page->lru, &vb_ptr->pages);
> spin_unlock(&pages_lock);
>
>
> at this point CPU2 unloads module:
> leak_balloon
> ......
>
> next CPU2 loads another module so code memory gets overwritten
>
> now CPU1 executes the next instruction:
>
> }
>
> which would normally return to function's caller,
> but it has been overwritten by CPU2 so we get corruption.

Actually, I have no idea.

Where does virtballoon_putbackpage get called from? It's some weird mm
thing, and I stay out of that mess.

The vb thread is stopped before we spin checking vb->num_pages, so it's
not touching pages; who would be calling this?

Confused,
Rusty.

2012-08-15 08:52:29

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

On Tue, Aug 14, 2012 at 05:00:49PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:35:25PM +0300, Michael S. Tsirkin wrote:
> > > > > +/* __isolate_lru_page() counterpart for a ballooned page */
> > > > > +bool isolate_balloon_page(struct page *page)
> > > > > +{
> > > > > + if (WARN_ON(!movable_balloon_page(page)))
> > > >
> > > > Looks like this actually can happen if the page is leaked
> > > > between previous movable_balloon_page and here.
> > > >
> > > > > + return false;
> > >
> > > Yes, it surely can happen, and it does not harm to catch it here, print a warn and
> > > return.
> >
> > If it is legal, why warn? For that matter why test here at all?
> >
>
> As this is a public symbol, and despite the usage we introduce is sane, the warn
> was placed as an insurance policy to let us know about any insane attempt to use
> the procedure in the future. That was due to a nice review nitpick, actually.
>
> Even though the code already had a test to properly avoid this race you
> mention, I thought that sustaining the warn was a good thing. As I told you,
> despite real, I've never got (un)lucky enough to stumble across that race window
> while testing the patch.
>
> If your concern is about being too much verbose on logging, under certain
> conditions, perhaps we can change that test to a WARN_ON_ONCE() ?
>
> Mel, what are your thoughts here?
>

I viewed it as being defensive programming. VM_BUG_ON would be less
useful as it can be compiled out. If the race can be routinely hit then
multiple warnings is instructive in itself. I have no strong feelings
about this though. I see little harm in making the check but in light of
this conversation add a short comment explaining that the check should
be redundant.

--
Mel Gorman
SUSE Labs

2012-08-15 09:05:34

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > What I think you should do is use rcu for access.
> > And here sync rcu before freeing.
> > Maybe an overkill but at least a documented synchronization
> > primitive, and it is very light weight.
> >
>
> I liked your suggestion on barriers, as well.
>

I have not thought about this as deeply as I shouold but is simply rechecking
the mapping under the pages_lock to make sure the page is still a balloon
page an option? i.e. use pages_lock to stabilise page->mapping.

--
Mel Gorman
SUSE Labs

2012-08-15 09:24:43

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > What I think you should do is use rcu for access.
> > > And here sync rcu before freeing.
> > > Maybe an overkill but at least a documented synchronization
> > > primitive, and it is very light weight.
> > >
> >
> > I liked your suggestion on barriers, as well.
> >
>
> I have not thought about this as deeply as I shouold but is simply rechecking
> the mapping under the pages_lock to make sure the page is still a balloon
> page an option? i.e. use pages_lock to stabilise page->mapping.

To clarify, are you concerned about cost of rcu_read_lock
for non balloon pages?

> --
> Mel Gorman
> SUSE Labs

2012-08-15 09:48:44

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > What I think you should do is use rcu for access.
> > > > And here sync rcu before freeing.
> > > > Maybe an overkill but at least a documented synchronization
> > > > primitive, and it is very light weight.
> > > >
> > >
> > > I liked your suggestion on barriers, as well.
> > >
> >
> > I have not thought about this as deeply as I shouold but is simply rechecking
> > the mapping under the pages_lock to make sure the page is still a balloon
> > page an option? i.e. use pages_lock to stabilise page->mapping.
>
> To clarify, are you concerned about cost of rcu_read_lock
> for non balloon pages?
>

Not as such, but given the choice between introducing RCU locking and
rechecking page->mapping under a spinlock I would choose the latter as it
is more straight-forward.

--
Mel Gorman
SUSE Labs

2012-08-15 10:00:25

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
> On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > > What I think you should do is use rcu for access.
> > > > > And here sync rcu before freeing.
> > > > > Maybe an overkill but at least a documented synchronization
> > > > > primitive, and it is very light weight.
> > > > >
> > > >
> > > > I liked your suggestion on barriers, as well.
> > > >
> > >
> > > I have not thought about this as deeply as I shouold but is simply rechecking
> > > the mapping under the pages_lock to make sure the page is still a balloon
> > > page an option? i.e. use pages_lock to stabilise page->mapping.
> >
> > To clarify, are you concerned about cost of rcu_read_lock
> > for non balloon pages?
> >
>
> Not as such, but given the choice between introducing RCU locking and
> rechecking page->mapping under a spinlock I would choose the latter as it
> is more straight-forward.

OK but checking it how? page->mapping == balloon_mapping does not scale to
multiple balloons, so I hoped we can switch to
page->mapping->flags & BALLOON_MAPPING or some such,
but this means we dereference it outside the lock ...

We will also need to add some API to set/clear mapping so that driver
does not need to poke in mm internals, but that's easy.

> --
> Mel Gorman
> SUSE Labs

2012-08-15 11:16:58

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
> > On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > > > What I think you should do is use rcu for access.
> > > > > > And here sync rcu before freeing.
> > > > > > Maybe an overkill but at least a documented synchronization
> > > > > > primitive, and it is very light weight.
> > > > > >
> > > > >
> > > > > I liked your suggestion on barriers, as well.
> > > > >
> > > >
> > > > I have not thought about this as deeply as I shouold but is simply rechecking
> > > > the mapping under the pages_lock to make sure the page is still a balloon
> > > > page an option? i.e. use pages_lock to stabilise page->mapping.
> > >
> > > To clarify, are you concerned about cost of rcu_read_lock
> > > for non balloon pages?
> > >
> >
> > Not as such, but given the choice between introducing RCU locking and
> > rechecking page->mapping under a spinlock I would choose the latter as it
> > is more straight-forward.
>
> OK but checking it how? page->mapping == balloon_mapping does not scale to
> multiple balloons,

I was thinking of exactly that page->mapping == balloon_mapping check. As I
do not know how many active balloon drivers there might be I cannot guess
in advance how much of a scalability problem it will be.

> so I hoped we can switch to
> page->mapping->flags & BALLOON_MAPPING or some such,
> but this means we dereference it outside the lock ...
>

That also sounded like future stuff to me that would be justified with
profiling if necessary. Personally I would have started with the spinlock
and a simple check and moved to RCU later when either scalability was a
problem or it was found there was a need to stabilise whether a page was
a balloon page or not outside a spinlock.

This is not a NAK to the idea and I'm not objecting to RCU being used now
if that is what is really desired. I just suspect it's making the series
more complex than it needs to be right now.

--
Mel Gorman
SUSE Labs

2012-08-15 11:28:04

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote:
> On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
> > > On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > > > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > > > > What I think you should do is use rcu for access.
> > > > > > > And here sync rcu before freeing.
> > > > > > > Maybe an overkill but at least a documented synchronization
> > > > > > > primitive, and it is very light weight.
> > > > > > >
> > > > > >
> > > > > > I liked your suggestion on barriers, as well.
> > > > > >
> > > > >
> > > > > I have not thought about this as deeply as I shouold but is simply rechecking
> > > > > the mapping under the pages_lock to make sure the page is still a balloon
> > > > > page an option? i.e. use pages_lock to stabilise page->mapping.
> > > >
> > > > To clarify, are you concerned about cost of rcu_read_lock
> > > > for non balloon pages?
> > > >
> > >
> > > Not as such, but given the choice between introducing RCU locking and
> > > rechecking page->mapping under a spinlock I would choose the latter as it
> > > is more straight-forward.
> >
> > OK but checking it how? page->mapping == balloon_mapping does not scale to
> > multiple balloons,
>
> I was thinking of exactly that page->mapping == balloon_mapping check. As I
> do not know how many active balloon drivers there might be I cannot guess
> in advance how much of a scalability problem it will be.

Not at all sure multiple drivers are worth supporting, but multiple
*devices* is I think worth supporting, if for no other reason than that
they can work today. For that, we need a device pointer which Rafael
wants to put into the mapping, this means multiple balloon mappings.


> > so I hoped we can switch to
> > page->mapping->flags & BALLOON_MAPPING or some such,
> > but this means we dereference it outside the lock ...
> >
>
> That also sounded like future stuff to me that would be justified with
> profiling if necessary. Personally I would have started with the spinlock
> and a simple check and moved to RCU later when either scalability was a
> problem or it was found there was a need to stabilise whether a page was
> a balloon page or not outside a spinlock.
>
> This is not a NAK to the idea and I'm not objecting to RCU being used now
> if that is what is really desired. I just suspect it's making the series
> more complex than it needs to be right now.
>
> --
> Mel Gorman
> SUSE Labs

2012-08-15 12:35:25

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
> > > now CPU1 executes the next instruction:
> > >
> > > }
> > >
> > > which would normally return to function's caller,
> > > but it has been overwritten by CPU2 so we get corruption.
> > >
> > > No?
> >
> > At the point CPU2 is unloading the module, it will be kept looping at the
> > snippet Rusty pointed out because the isolation / migration steps do not mess
> > with 'vb->num_pages'. The driver will only unload after leaking the total amount
> > of balloon's inflated pages, which means (for this hypothetical case) CPU2 will
> > wait until CPU1 finishes the putaback procedure.
> >
>
> Yes but only until unlock finishes. The last return from function
> is not guarded and can be overwritten.

CPU1 will be returning to putback_balloon_page() which code is located at core
mm/compaction.c, outside the driver.

2012-08-15 14:39:35

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
> > > > now CPU1 executes the next instruction:
> > > >
> > > > }
> > > >
> > > > which would normally return to function's caller,
> > > > but it has been overwritten by CPU2 so we get corruption.
> > > >
> > > > No?
> > >
> > > At the point CPU2 is unloading the module, it will be kept looping at the
> > > snippet Rusty pointed out because the isolation / migration steps do not mess
> > > with 'vb->num_pages'. The driver will only unload after leaking the total amount
> > > of balloon's inflated pages, which means (for this hypothetical case) CPU2 will
> > > wait until CPU1 finishes the putaback procedure.
> > >
> >
> > Yes but only until unlock finishes. The last return from function
> > is not guarded and can be overwritten.
>
> CPU1 will be returning to putback_balloon_page() which code is located at core
> mm/compaction.c, outside the driver.

Sorry, I don't seem to be able to articulate this clearly.
But this is a correctness issue so I am compelled to try again.

Here is some pseudo code:

int pages_lock;

void virtballoon_isolatepage(void *page, unsigned long mode)
{
pages_lock = 0;
}

assignment of 0 emulates spin unlock.
I removed all other content. Now look at disassembly:

080483d0 <virtballoon_isolatepage>:
virtballoon_isolatepage():
80483d0: c7 05 88 96 04 08 00 movl $0x0,0x8049688
80483d7: 00 00 00

<----------- Above is "spin unlock"

80483da: c3 ret

^^^^
Here we are still executing module code (one instruction of it!)
after we have dropped the lock.


So if module goes away at the point marked by <--------
above (and nothing seems to prevent that,
since pages_lock is unlocked), the last instruction can get overwritten
and then random code will get executed instead.

In the end the rule is simple: you can not
prevent module unloading from within module
itself. It always must be the caller of your
module that uses some lock to do this.

My proposal is to use rcu for this since it is lightweight
and also does not require us to pass extra
state around.

--
MST

2012-08-15 15:32:44

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On 08/14/2012 05:38 PM, Michael S. Tsirkin wrote:

> And even ignoring that, global pointer to a device
> is an ugly hack and ugly hacks tend to explode.
>
> And even ignoring estetics, and if we decide we are fine
> with a single balloon, it needs to fail gracefully not
> crash like it does now.

Fair enough. That certainly seems easy enough to fix.

Each balloon driver can have its own struct address_space,
and simply point mapping->host (or any of the others) at
a global balloon thing somewhere.

if (page->mapping && page->mapping->host == balloon_magic)

2012-08-20 05:01:28

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Wed, 15 Aug 2012 17:40:19 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
> > > > > now CPU1 executes the next instruction:
> > > > >
> > > > > }
> > > > >
> > > > > which would normally return to function's caller,
> > > > > but it has been overwritten by CPU2 so we get corruption.
> > > > >
> > > > > No?
> > > >
> > > > At the point CPU2 is unloading the module, it will be kept looping at the
> > > > snippet Rusty pointed out because the isolation / migration steps do not mess
> > > > with 'vb->num_pages'. The driver will only unload after leaking the total amount
> > > > of balloon's inflated pages, which means (for this hypothetical case) CPU2 will
> > > > wait until CPU1 finishes the putaback procedure.
> > > >
> > >
> > > Yes but only until unlock finishes. The last return from function
> > > is not guarded and can be overwritten.
> >
> > CPU1 will be returning to putback_balloon_page() which code is located at core
> > mm/compaction.c, outside the driver.
>
> Sorry, I don't seem to be able to articulate this clearly.
> But this is a correctness issue so I am compelled to try again.

But if there are 0 balloon pages, how is it migrating a page?

> In the end the rule is simple: you can not
> prevent module unloading from within module
> itself. It always must be the caller of your
> module that uses some lock to do this.

Not quite. If you clean up everything in your cleanup function, it also
works, which is what this does, right?

Cheers,
Rusty.

2012-08-20 05:11:42

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Mon, Aug 20, 2012 at 11:59:11AM +0930, Rusty Russell wrote:
> On Wed, 15 Aug 2012 17:40:19 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> > On Wed, Aug 15, 2012 at 09:34:58AM -0300, Rafael Aquini wrote:
> > > On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
> > > > > > now CPU1 executes the next instruction:
> > > > > >
> > > > > > }
> > > > > >
> > > > > > which would normally return to function's caller,
> > > > > > but it has been overwritten by CPU2 so we get corruption.
> > > > > >
> > > > > > No?
> > > > >
> > > > > At the point CPU2 is unloading the module, it will be kept looping at the
> > > > > snippet Rusty pointed out because the isolation / migration steps do not mess
> > > > > with 'vb->num_pages'. The driver will only unload after leaking the total amount
> > > > > of balloon's inflated pages, which means (for this hypothetical case) CPU2 will
> > > > > wait until CPU1 finishes the putaback procedure.
> > > > >
> > > >
> > > > Yes but only until unlock finishes. The last return from function
> > > > is not guarded and can be overwritten.
> > >
> > > CPU1 will be returning to putback_balloon_page() which code is located at core
> > > mm/compaction.c, outside the driver.
> >
> > Sorry, I don't seem to be able to articulate this clearly.
> > But this is a correctness issue so I am compelled to try again.
>
> But if there are 0 balloon pages, how is it migrating a page?

It could be we just finished migrating a page
dropped page lock and are 1 instruction away from
returning from callback.

> > In the end the rule is simple: you can not
> > prevent module unloading from within module
> > itself. It always must be the caller of your
> > module that uses some lock to do this.
>
> Not quite. If you clean up everything in your cleanup function, it also
> works,

No, we also need a way to make sure we returned
to caller, this is missing here.

> which is what this does, right?
>
> Cheers,
> Rusty.


This makes sure callback was invoked but not that it returned
to caller.

All will be well if callbacks are done in rcu critical section
and we synchronise it before unload.


--
MST

2012-08-21 23:42:46

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Wed, 15 Aug 2012 14:28:51 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote:
> > I was thinking of exactly that page->mapping == balloon_mapping check. As I
> > do not know how many active balloon drivers there might be I cannot guess
> > in advance how much of a scalability problem it will be.
>
> Not at all sure multiple drivers are worth supporting, but multiple
> *devices* is I think worth supporting, if for no other reason than that
> they can work today. For that, we need a device pointer which Rafael
> wants to put into the mapping, this means multiple balloon mappings.

Rafael, please make sure that the balloon driver fails on the second and
subsequent balloon devices.

Michael, we only allow multiple balloon devices because it fell out of
the implementation. If it causes us even the slightest issue, we should
not support it. It's not a sensible setup.

Cheers,
Rusty.

2012-08-22 00:36:42

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

On Tue, Aug 21, 2012 at 03:01:37PM +0930, Rusty Russell wrote:
> On Wed, 15 Aug 2012 14:28:51 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> > On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote:
> > > I was thinking of exactly that page->mapping == balloon_mapping check. As I
> > > do not know how many active balloon drivers there might be I cannot guess
> > > in advance how much of a scalability problem it will be.
> >
> > Not at all sure multiple drivers are worth supporting, but multiple
> > *devices* is I think worth supporting, if for no other reason than that
> > they can work today. For that, we need a device pointer which Rafael
> > wants to put into the mapping, this means multiple balloon mappings.
>
> Rafael, please make sure that the balloon driver fails on the second and
> subsequent balloon devices.
>
> Michael, we only allow multiple balloon devices because it fell out of
> the implementation. If it causes us even the slightest issue, we should
> not support it. It's not a sensible setup.
>
> Cheers,
> Rusty.

Looks like latest revision does it using a flag which seems cleaner,
so I think the point is moot.

--
MST