2020-07-16 02:43:05

by Hui Zhu

Subject: [RFC for Linux v4 0/2] virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES to report continuous pages

The first, second and third versions are in [1], [2] and [3].
The code of the current version for Linux and QEMU is available in [4] and [5].
Updates in this version:
1. Reporting continuous pages increases the speed, so this version also adds
deflating with continuous pages.
2. Following David's comments in [6], added 2 new vqs, inflate_cont_vq
and deflate_cont_vq, to report continuous pages in the format of a 32-bit pfn
and a 32-bit size.
The following is an introduction to the feature.
These patches add VIRTIO_BALLOON_F_CONT_PAGES to virtio_balloon. With this
flag, the balloon tries to use continuous pages to inflate and deflate.
Enabling this flag brings two benefits:
1. Reporting continuous pages increases the amount of memory reported by each
call to tell_host, which speeds up balloon inflate and deflate.
2. Host THPs are split when QEMU releases the pages of a balloon inflate.
Inflating the balloon with continuous pages lets QEMU release pages that
belong to the same THPs, which helps decrease the number of split THPs in
the host.
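To illustrate the first benefit, here is a minimal sketch of the (pfn, size) pair described above and of how many 32-bit words each format needs for the same block. It is not code from the patches: struct cont_entry, cont_entry_make and the words_per_block_* helpers are hypothetical names, 4 KiB pages are assumed, and the byte-order conversion that the driver does with cpu_to_virtio32 is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u /* assumption: 4 KiB guest pages */

/* Hypothetical view of one continuous-range entry: pfn word, then size word. */
struct cont_entry {
	uint32_t pfn;  /* first page frame number of the range */
	uint32_t size; /* length of the range in bytes */
};

/* Describe a block of 2^order pages starting at first_pfn. */
static struct cont_entry cont_entry_make(uint32_t first_pfn, unsigned int order)
{
	struct cont_entry e;

	/* The byte size has to fit in 32 bits. */
	assert(order + SKETCH_PAGE_SHIFT < 32);
	e.pfn = first_pfn;
	e.size = (uint32_t)1 << (order + SKETCH_PAGE_SHIFT);
	return e;
}

/* 32-bit words needed to report 2^order pages one pfn at a time. */
static size_t words_per_block_single(unsigned int order)
{
	return (size_t)1 << order;
}

/* 32-bit words needed to report the same block as one (pfn, size) entry. */
static size_t words_per_block_cont(void)
{
	return sizeof(struct cont_entry) / sizeof(uint32_t);
}
```

Under these assumptions, an order-9 block (2 MiB) takes 512 words in the single-pfn format and 2 words as a (pfn, size) pair, so one tell_host call can cover far more memory.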
The following is an example in a VM with 1G of memory and 1 CPU. This test
sets up an environment that has a lot of fragmented pages, so inflating the
balloon will split the THPs.
// This is the THP usage in the host before the VM starts.
// No THP is in use.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages: 0 kB
// After the VM starts, use the punch-holes function of usemem
// (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
// to generate 400M of fragmented pages in the guest
// kernel.
usemem --punch-holes -s -1 800m &
// This is the THP usage in the host after this command.
// Some THPs are used by the VM because usemem accesses 800M of memory
// in the guest.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages: 911360 kB
// Connect to the QEMU monitor, set up the balloon, and set its size to 600M.
(qemu) device_add virtio-balloon-pci,id=balloon1
(qemu) info balloon
balloon: actual=1024
(qemu) balloon 600
(qemu) info balloon
balloon: actual=600
// This is the THP usage in the host after inflating the balloon.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages: 88064 kB
// Set the size back to 1024M in the QEMU monitor.
(qemu) balloon 1024
(qemu) info balloon
balloon: actual=1024
// Use usemem to increase the memory usage of QEMU.
killall usemem
usemem 800m
// This is the THP usage in the host after this operation.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages: 65536 kB

The following example switches to the continuous pages balloon. The number of
split THPs decreases.
// This is the THP usage in the host before the VM starts.
// No THP is in use.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages: 0 kB
// After the VM starts, use the punch-holes function of usemem to generate
// 400M of fragmented pages in the guest kernel.
usemem --punch-holes -s -1 800m &
// This is the THP usage in the host after this command.
// Some THPs are used by the VM because usemem accesses 800M of memory
// in the guest.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages: 911360 kB
// Connect to the QEMU monitor, set up the balloon, and set its size to 600M.
(qemu) device_add virtio-balloon-pci,id=balloon1,cont-pages=on
(qemu) info balloon
balloon: actual=1024
(qemu) balloon 600
(qemu) info balloon
balloon: actual=600
// This is the THP usage in the host after inflating the balloon.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages: 616448 kB
// Set the size back to 1024M in the QEMU monitor.
(qemu) balloon 1024
(qemu) info balloon
balloon: actual=1024
// Use usemem to increase the memory usage of QEMU.
killall usemem
usemem 800m
// This is the THP usage in the host after this operation.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages: 907264 kB
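
The AnonHugePages values in the two runs are easier to compare as THP counts. The helper below is only a sketch for that arithmetic: thp_count_from_kb is a hypothetical name, and it assumes 2 MiB huge pages (the usual x86-64 THP size), which the thread does not state.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_THP_KB 2048u /* assumption: one THP is 2 MiB */

/* Convert an AnonHugePages value in kB into a number of huge pages. */
static uint64_t thp_count_from_kb(uint64_t anon_huge_kb)
{
	/* The counter is expected to be a whole number of huge pages. */
	assert(anon_huge_kb % SKETCH_THP_KB == 0);
	return anon_huge_kb / SKETCH_THP_KB;
}
```

With that assumption, the first run goes from 445 THPs to 43 after inflating and 32 after the final usemem, while the run with cont-pages=on goes from 445 to 301 and back up to 443.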

[1] https://lkml.org/lkml/2020/3/12/144
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://lkml.org/lkml/2020/5/12/324
[4] https://github.com/teawater/linux/tree/balloon_conts
[5] https://github.com/teawater/qemu/tree/balloon_conts
[6] https://lkml.org/lkml/2020/5/13/1211

Hui Zhu (2):
virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES and inflate_cont_vq
virtio_balloon: Add deflate_cont_vq to deflate continuous pages

drivers/virtio/virtio_balloon.c | 180 +++++++++++++++++++++++++++++++-----
include/linux/balloon_compaction.h | 12 ++
include/uapi/linux/virtio_balloon.h | 1
mm/balloon_compaction.c | 117 +++++++++++++++++++++--
4 files changed, 280 insertions(+), 30 deletions(-)


2020-07-16 02:43:12

by Hui Zhu

Subject: [RFC for Linux v4 1/2] virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES and inflate_cont_vq

This commit adds a new flag VIRTIO_BALLOON_F_CONT_PAGES to virtio_balloon.
It also adds a vq, inflate_cont_vq, to inflate continuous pages.
When VIRTIO_BALLOON_F_CONT_PAGES is set, the driver tries to allocate
continuous pages and reports them using inflate_cont_vq.
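
The order selection can be sketched on its own, outside the driver. The two helpers below are illustrative only: max_inflate_order mirrors the arithmetic of the VIRTIO_BALLOON_INFLATE_MAX_ORDER macro in this patch, fit_order mirrors the step in fill_balloon that skips orders larger than the remaining request, and both names are hypothetical. The sketch assumes VIRTIO_BALLOON_PAGES_PER_PAGE is 1 and leaves out the fallback on allocation failure.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Largest order whose byte size still fits in a 32-bit size field,
 * also limited by the allocator's maximum order (max_order - 1).
 */
static unsigned int max_inflate_order(unsigned int page_shift,
				      unsigned int max_order)
{
	unsigned int by_size, by_alloc;

	assert(page_shift < 32 && max_order > 0);
	by_size = 32 - 1 - page_shift;
	by_alloc = max_order - 1;
	return by_size < by_alloc ? by_size : by_alloc;
}

/* Lower the order until a 2^order block no longer exceeds the request. */
static unsigned int fit_order(unsigned int order, size_t remaining_pages)
{
	while (order > 0 && remaining_pages < ((size_t)1 << order))
		order--;
	return order;
}
```

For example, with 4 KiB pages and a MAX_ORDER of 11 (an assumed value), the starting order is 10, and a request with 600 pages left is served with an order-9 block first.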

Signed-off-by: Hui Zhu <[email protected]>
---
drivers/virtio/virtio_balloon.c | 119 ++++++++++++++++++++++++++++++------
include/linux/balloon_compaction.h | 9 ++-
include/uapi/linux/virtio_balloon.h | 1 +
mm/balloon_compaction.c | 41 ++++++++++---
4 files changed, 142 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 1f157d2..b89f566 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -42,6 +42,9 @@
(1 << (VIRTIO_BALLOON_HINT_BLOCK_ORDER + PAGE_SHIFT))
#define VIRTIO_BALLOON_HINT_BLOCK_PAGES (1 << VIRTIO_BALLOON_HINT_BLOCK_ORDER)

+#define VIRTIO_BALLOON_INFLATE_MAX_ORDER min((int) (sizeof(__virtio32) * BITS_PER_BYTE - \
+ 1 - PAGE_SHIFT), (MAX_ORDER-1))
+
#ifdef CONFIG_BALLOON_COMPACTION
static struct vfsmount *balloon_mnt;
#endif
@@ -52,6 +55,7 @@ enum virtio_balloon_vq {
VIRTIO_BALLOON_VQ_STATS,
VIRTIO_BALLOON_VQ_FREE_PAGE,
VIRTIO_BALLOON_VQ_REPORTING,
+ VIRTIO_BALLOON_VQ_INFLATE_CONT,
VIRTIO_BALLOON_VQ_MAX
};

@@ -61,7 +65,7 @@ enum virtio_balloon_config_read {

struct virtio_balloon {
struct virtio_device *vdev;
- struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+ struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq, *inflate_cont_vq;

/* Balloon's own wq for cpu-intensive work items */
struct workqueue_struct *balloon_wq;
@@ -126,6 +130,9 @@ struct virtio_balloon {
/* Free page reporting device */
struct virtqueue *reporting_vq;
struct page_reporting_dev_info pr_dev_info;
+
+ /* Current order of inflate continuous pages - VIRTIO_BALLOON_F_CONT_PAGES */
+ __u32 current_pages_order;
};

static struct virtio_device_id id_table[] = {
@@ -208,19 +215,59 @@ static void set_page_pfns(struct virtio_balloon *vb,
page_to_balloon_pfn(page) + i);
}

+static void set_page_pfns_order(struct virtio_balloon *vb,
+ __virtio32 pfns[], struct page *page,
+ unsigned int order)
+{
+ if (order == 0)
+ return set_page_pfns(vb, pfns, page);
+
+ /* Set the first pfn of the continuous pages. */
+ pfns[0] = cpu_to_virtio32(vb->vdev, page_to_balloon_pfn(page));
+ /* Set the size of the continuous pages. */
+ pfns[1] = PAGE_SIZE << order;
+}
+
static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
{
unsigned num_allocated_pages;
- unsigned num_pfns;
+ unsigned int num_pfns, pfn_per_alloc;
struct page *page;
LIST_HEAD(pages);
+ bool is_cont = vb->current_pages_order != 0;

- /* We can only do one array worth at a time. */
- num = min(num, ARRAY_SIZE(vb->pfns));
-
- for (num_pfns = 0; num_pfns < num;
- num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
- struct page *page = balloon_page_alloc();
+ if (is_cont)
+ pfn_per_alloc = 2;
+ else
+ pfn_per_alloc = VIRTIO_BALLOON_PAGES_PER_PAGE;
+
+ for (num_pfns = 0, num_allocated_pages = 0;
+ num_pfns < ARRAY_SIZE(vb->pfns) && num_allocated_pages < num;
+ num_pfns += pfn_per_alloc,
+ num_allocated_pages += VIRTIO_BALLOON_PAGES_PER_PAGE << vb->current_pages_order) {
+ struct page *page;
+
+ for (; vb->current_pages_order >= 0; vb->current_pages_order--) {
+ if (vb->current_pages_order &&
+ num - num_allocated_pages <
+ VIRTIO_BALLOON_PAGES_PER_PAGE << vb->current_pages_order)
+ continue;
+ page = balloon_pages_alloc(vb->current_pages_order);
+ if (page) {
+ /* If the first allocated page is not continuous pages,
+ * go back to transport page as single page.
+ */
+ if (is_cont && num_pfns == 0 && !vb->current_pages_order) {
+ is_cont = false;
+ pfn_per_alloc = VIRTIO_BALLOON_PAGES_PER_PAGE;
+ }
+ set_page_private(page, vb->current_pages_order);
+ balloon_page_push(&pages, page);
+ break;
+ }
+ if (!vb->current_pages_order)
+ break;
+ }

if (!page) {
dev_info_ratelimited(&vb->vdev->dev,
@@ -230,8 +277,6 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
msleep(200);
break;
}
-
- balloon_page_push(&pages, page);
}

mutex_lock(&vb->balloon_lock);
@@ -239,20 +284,34 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
vb->num_pfns = 0;

while ((page = balloon_page_pop(&pages))) {
- balloon_page_enqueue(&vb->vb_dev_info, page);
+ unsigned int order = page_private(page);
+
+ set_page_private(page, 0);
+
+ /* Split the continuous pages because they will be freed
+ * by release_pages_balloon respectively.
+ */
+ if (order)
+ split_page(page, order);
+
+ balloon_pages_enqueue(&vb->vb_dev_info, page, order);
+
+ set_page_pfns_order(vb, vb->pfns + vb->num_pfns, page, order);

- set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
- vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
if (!virtio_has_feature(vb->vdev,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
- adjust_managed_page_count(page, -1);
- vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
+ adjust_managed_page_count(page, -(1 << order));
+ vb->num_pfns += pfn_per_alloc;
}
+ vb->num_pages += num_allocated_pages;

- num_allocated_pages = vb->num_pfns;
/* Did we get any? */
- if (vb->num_pfns != 0)
- tell_host(vb, vb->inflate_vq);
+ if (vb->num_pfns != 0) {
+ if (is_cont)
+ tell_host(vb, vb->inflate_cont_vq);
+ else
+ tell_host(vb, vb->inflate_vq);
+ }
mutex_unlock(&vb->balloon_lock);

return num_allocated_pages;
@@ -488,7 +547,7 @@ static void update_balloon_size_func(struct work_struct *work)
diff = towards_target(vb);

if (!diff)
- return;
+ goto stop_out;

if (diff > 0)
diff -= fill_balloon(vb, diff);
@@ -498,6 +557,11 @@ static void update_balloon_size_func(struct work_struct *work)

if (diff)
queue_work(system_freezable_wq, work);
+ else {
+stop_out:
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES))
+ vb->current_pages_order = VIRTIO_BALLOON_INFLATE_MAX_ORDER;
+ }
}

static int init_vqs(struct virtio_balloon *vb)
@@ -521,6 +585,8 @@ static int init_vqs(struct virtio_balloon *vb)
callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
names[VIRTIO_BALLOON_VQ_REPORTING] = NULL;
+ names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
+ callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;

if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
names[VIRTIO_BALLOON_VQ_STATS] = "stats";
@@ -537,6 +603,11 @@ static int init_vqs(struct virtio_balloon *vb)
callbacks[VIRTIO_BALLOON_VQ_REPORTING] = balloon_ack;
}

+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
+ names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = "inflate_cont";
+ callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = balloon_ack;
+ }
+
err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
vqs, callbacks, names, NULL, NULL);
if (err)
@@ -572,6 +643,10 @@ static int init_vqs(struct virtio_balloon *vb)
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING))
vb->reporting_vq = vqs[VIRTIO_BALLOON_VQ_REPORTING];

+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES))
+ vb->inflate_cont_vq
+ = vqs[VIRTIO_BALLOON_VQ_INFLATE_CONT];
+
return 0;
}

@@ -997,6 +1072,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
goto out_unregister_oom;
}

+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES))
+ vb->current_pages_order = VIRTIO_BALLOON_INFLATE_MAX_ORDER;
+ else
+ vb->current_pages_order = 0;
+
virtio_device_ready(vdev);

if (towards_target(vb))
@@ -1131,6 +1211,7 @@ static unsigned int features[] = {
VIRTIO_BALLOON_F_FREE_PAGE_HINT,
VIRTIO_BALLOON_F_PAGE_POISON,
VIRTIO_BALLOON_F_REPORTING,
+ VIRTIO_BALLOON_F_CONT_PAGES,
};

static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index 338aa27..8180bbf 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -60,15 +60,22 @@ struct balloon_dev_info {
struct inode *inode;
};

-extern struct page *balloon_page_alloc(void);
+extern struct page *balloon_pages_alloc(unsigned int order);
extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
struct page *page);
+extern void balloon_pages_enqueue(struct balloon_dev_info *b_dev_info,
+ struct page *page, unsigned int order);
extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
struct list_head *pages);
extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
struct list_head *pages, size_t n_req_pages);

+static inline struct page *balloon_page_alloc(void)
+{
+ return balloon_pages_alloc(0);
+}
+
static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
{
balloon->isolated_pages = 0;
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index dc3e656..4d0151a 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -37,6 +37,7 @@
#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
#define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */
#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
+#define VIRTIO_BALLOON_F_CONT_PAGES 6 /* VQ to report continuous pages */

/* Size of a PFN in the balloon interface. */
#define VIRTIO_BALLOON_PFN_SHIFT 12
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 26de020..397d0b9 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -112,8 +112,8 @@ size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);

/*
- * balloon_page_alloc - allocates a new page for insertion into the balloon
- * page list.
+ * balloon_pages_alloc - allocates a new page for insertion into the balloon
+ * page list.
*
* Driver must call this function to properly allocate a new balloon page.
* Driver must call balloon_page_enqueue before definitively removing the page
@@ -121,14 +121,19 @@ EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
*
* Return: struct page for the allocated page or NULL on allocation failure.
*/
-struct page *balloon_page_alloc(void)
+struct page *balloon_pages_alloc(unsigned int order)
{
- struct page *page = alloc_page(balloon_mapping_gfp_mask() |
- __GFP_NOMEMALLOC | __GFP_NORETRY |
- __GFP_NOWARN);
- return page;
+ gfp_t gfp_mask;
+
+ if (order > 1)
+ gfp_mask = __GFP_RETRY_MAYFAIL;
+ else
+ gfp_mask = __GFP_NORETRY;
+
+ return alloc_pages(balloon_mapping_gfp_mask() |
+ gfp_mask | __GFP_NOMEMALLOC | __GFP_NOWARN, order);
}
-EXPORT_SYMBOL_GPL(balloon_page_alloc);
+EXPORT_SYMBOL_GPL(balloon_pages_alloc);

/*
* balloon_page_enqueue - inserts a new page into the balloon page list.
@@ -155,6 +160,26 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
EXPORT_SYMBOL_GPL(balloon_page_enqueue);

/*
+ * balloon_pages_enqueue - inserts continuous pages into the balloon page list.
+ */
+void balloon_pages_enqueue(struct balloon_dev_info *b_dev_info,
+ struct page *page, unsigned int order)
+{
+ unsigned long flags, pfn, last_pfn;
+
+ pfn = page_to_pfn(page);
+ last_pfn = pfn + (1 << order) - 1;
+
+ spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+ for (; pfn <= last_pfn; pfn++) {
+ page = pfn_to_page(pfn);
+ balloon_page_enqueue_one(b_dev_info, page);
+ }
+ spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+}
+EXPORT_SYMBOL_GPL(balloon_pages_enqueue);
+
+/*
* balloon_page_dequeue - removes a page from balloon's page list and returns
* its address to allow the driver to release the page.
* @b_dev_info: balloon device decriptor where we will grab a page from.
--
2.7.4

2020-07-16 02:43:14

by Hui Zhu

Subject: [RFC for qemu v4 0/2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_F_CONT_PAGES

The code of the current version for Linux and QEMU is available in [1] and [2].
Updates in this version:
1. Reporting continuous pages increases the speed, so this version also adds
deflating with continuous pages.
2. Following David's comments in [3], added 2 new vqs, icvq and
dcvq, to get continuous pages in the format of a 32-bit pfn and a 32-bit size.

The following is an introduction to the feature.
Setting the option cont-pages to on enables the flag VIRTIO_BALLOON_F_CONT_PAGES.
QEMU then gets continuous pages from icvq and dcvq and calls madvise with
MADV_DONTNEED (inflate) and MADV_WILLNEED (deflate) on the pages.
Enabling this flag brings two benefits:
1. Increases the speed of balloon inflate and deflate.
2. Decreases the number of split THPs in the host.
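
As a rough illustration of the host side, the sketch below rounds a reported size down to the host page size and releases the range with one madvise call. It is not the QEMU code: the patches go through ram_block_discard_range() and qemu_madvise(), while this sketch uses plain POSIX madvise, and round_down_to_page and release_range are hypothetical names.

```c
#define _DEFAULT_SOURCE /* for MADV_DONTNEED on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Round size down to a multiple of page_size (must be a power of two). */
static size_t round_down_to_page(size_t size, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return size & ~(page_size - 1);
}

/*
 * Release a reported range with a single call. host_addr is assumed to
 * be aligned to page_size. Returns 0 on success or when nothing is left
 * after rounding, -1 on madvise failure.
 */
static int release_range(void *host_addr, size_t size, size_t page_size)
{
	size_t len = round_down_to_page(size, page_size);

	if (len == 0)
		return 0;
	return madvise(host_addr, len, MADV_DONTNEED);
}
```

Releasing one large range in a single call, instead of one call per 4 KiB page, is what lets the host drop whole THPs rather than split them.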

[1] https://github.com/teawater/linux/tree/balloon_conts
[2] https://github.com/teawater/qemu/tree/balloon_conts
[3] https://lkml.org/lkml/2020/5/13/1211

Hui Zhu (2):
virtio_balloon: Add cont-pages and icvq
virtio_balloon: Add dcvq to deflate continuous pages

hw/virtio/virtio-balloon.c | 92 +++++++++++++++---------
include/hw/virtio/virtio-balloon.h | 2
include/standard-headers/linux/virtio_balloon.h | 1
3 files changed, 63 insertions(+), 32 deletions(-)

2020-07-16 02:43:48

by Hui Zhu

Subject: [RFC for qemu v4 1/2] virtio_balloon: Add cont-pages and icvq

This commit adds the cont-pages option to virtio_balloon. With this option,
virtio_balloon enables the flag VIRTIO_BALLOON_F_CONT_PAGES.
It also adds a vq, icvq, to inflate continuous pages.
When VIRTIO_BALLOON_F_CONT_PAGES is set, QEMU tries to get continuous pages
from icvq and uses madvise MADV_DONTNEED to release the pages.

Signed-off-by: Hui Zhu <[email protected]>
---
hw/virtio/virtio-balloon.c | 80 ++++++++++++++++---------
include/hw/virtio/virtio-balloon.h | 2 +-
include/standard-headers/linux/virtio_balloon.h | 1 +
3 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7..d36a5c8 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -65,23 +65,26 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,

static void balloon_inflate_page(VirtIOBalloon *balloon,
MemoryRegion *mr, hwaddr mr_offset,
+ size_t size,
PartiallyBalloonedPage *pbp)
{
void *addr = memory_region_get_ram_ptr(mr) + mr_offset;
ram_addr_t rb_offset, rb_aligned_offset, base_gpa;
RAMBlock *rb;
size_t rb_page_size;
- int subpages;
+ int subpages, pages_num;

/* XXX is there a better way to get to the RAMBlock than via a
* host address? */
rb = qemu_ram_block_from_host(addr, false, &rb_offset);
rb_page_size = qemu_ram_pagesize(rb);

+ size &= ~(rb_page_size - 1);
+
if (rb_page_size == BALLOON_PAGE_SIZE) {
/* Easy case */

- ram_block_discard_range(rb, rb_offset, rb_page_size);
+ ram_block_discard_range(rb, rb_offset, size);
/* We ignore errors from ram_block_discard_range(), because it
* has already reported them, and failing to discard a balloon
* page is not fatal */
@@ -99,32 +102,38 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,

rb_aligned_offset = QEMU_ALIGN_DOWN(rb_offset, rb_page_size);
subpages = rb_page_size / BALLOON_PAGE_SIZE;
- base_gpa = memory_region_get_ram_addr(mr) + mr_offset -
- (rb_offset - rb_aligned_offset);

- if (pbp->bitmap && !virtio_balloon_pbp_matches(pbp, base_gpa)) {
- /* We've partially ballooned part of a host page, but now
- * we're trying to balloon part of a different one. Too hard,
- * give up on the old partial page */
- virtio_balloon_pbp_free(pbp);
- }
+ for (pages_num = size / BALLOON_PAGE_SIZE;
+ pages_num > 0; pages_num--) {
+ base_gpa = memory_region_get_ram_addr(mr) + mr_offset -
+ (rb_offset - rb_aligned_offset);

- if (!pbp->bitmap) {
- virtio_balloon_pbp_alloc(pbp, base_gpa, subpages);
- }
+ if (pbp->bitmap && !virtio_balloon_pbp_matches(pbp, base_gpa)) {
+ /* We've partially ballooned part of a host page, but now
+ * we're trying to balloon part of a different one. Too hard,
+ * give up on the old partial page */
+ virtio_balloon_pbp_free(pbp);
+ }

- set_bit((rb_offset - rb_aligned_offset) / BALLOON_PAGE_SIZE,
- pbp->bitmap);
+ if (!pbp->bitmap) {
+ virtio_balloon_pbp_alloc(pbp, base_gpa, subpages);
+ }

- if (bitmap_full(pbp->bitmap, subpages)) {
- /* We've accumulated a full host page, we can actually discard
- * it now */
+ set_bit((rb_offset - rb_aligned_offset) / BALLOON_PAGE_SIZE,
+ pbp->bitmap);

- ram_block_discard_range(rb, rb_aligned_offset, rb_page_size);
- /* We ignore errors from ram_block_discard_range(), because it
- * has already reported them, and failing to discard a balloon
- * page is not fatal */
- virtio_balloon_pbp_free(pbp);
+ if (bitmap_full(pbp->bitmap, subpages)) {
+ /* We've accumulated a full host page, we can actually discard
+ * it now */
+
+ ram_block_discard_range(rb, rb_aligned_offset, rb_page_size);
+ /* We ignore errors from ram_block_discard_range(), because it
+ * has already reported them, and failing to discard a balloon
+ * page is not fatal */
+ virtio_balloon_pbp_free(pbp);
+ }
+
+ mr_offset += BALLOON_PAGE_SIZE;
}
}

@@ -340,12 +349,21 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
unsigned int p = virtio_ldl_p(vdev, &pfn);
hwaddr pa;
+ unsigned int psize = BALLOON_PAGE_SIZE;

pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
offset += 4;

- section = memory_region_find(get_system_memory(), pa,
- BALLOON_PAGE_SIZE);
+ if (vq == s->icvq) {
+ uint32_t psize_ptr;
+ if (iov_to_buf(elem->out_sg, elem->out_num, offset, &psize_ptr, 4) != 4) {
+ break;
+ }
+ psize = virtio_ldl_p(vdev, &psize_ptr);
+ offset += 4;
+ }
+
+ section = memory_region_find(get_system_memory(), pa, psize);
if (!section.mr) {
trace_virtio_balloon_bad_addr(pa);
continue;
@@ -361,9 +379,10 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
trace_virtio_balloon_handle_output(memory_region_name(section.mr),
pa);
if (!qemu_balloon_is_inhibited()) {
- if (vq == s->ivq) {
+ if (vq == s->ivq || vq == s->icvq) {
balloon_inflate_page(s, section.mr,
- section.offset_within_region, &pbp);
+ section.offset_within_region,
+ psize, &pbp);
} else if (vq == s->dvq) {
balloon_deflate_page(s, section.mr, section.offset_within_region);
} else {
@@ -816,6 +835,11 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
virtio_error(vdev, "iothread is missing");
}
}
+
+ if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
+ s->icvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
+ }
+
reset_stats(s);
}

@@ -916,6 +940,8 @@ static Property virtio_balloon_properties[] = {
VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+ DEFINE_PROP_BIT("cont-pages", VirtIOBalloon, host_features,
+ VIRTIO_BALLOON_F_CONT_PAGES, false),
/* QEMU 4.0 accidentally changed the config size even when free-page-hint
* is disabled, resulting in QEMU 3.1 migration incompatibility. This
* property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index d1c968d..6a2514d 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {

typedef struct VirtIOBalloon {
VirtIODevice parent_obj;
- VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+ VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq;
uint32_t free_page_report_status;
uint32_t num_pages;
uint32_t actual;
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2..033926c 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
#define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_CONT_PAGES 6 /* VQ to report continuous pages */

/* Size of a PFN in the balloon interface. */
#define VIRTIO_BALLOON_PFN_SHIFT 12
--
2.7.4

2020-07-16 02:43:50

by Hui Zhu

Subject: [RFC for qemu v4 2/2] virtio_balloon: Add dcvq to deflate continuous pages

This commit adds a vq, dcvq, to deflate continuous pages.
When VIRTIO_BALLOON_F_CONT_PAGES is set, QEMU tries to get continuous pages
from dcvq and calls madvise MADV_WILLNEED on the pages.

Signed-off-by: Hui Zhu <[email protected]>
---
hw/virtio/virtio-balloon.c | 14 +++++++++-----
include/hw/virtio/virtio-balloon.h | 2 +-
2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index d36a5c8..165adf7 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -138,7 +138,8 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
}

static void balloon_deflate_page(VirtIOBalloon *balloon,
- MemoryRegion *mr, hwaddr mr_offset)
+ MemoryRegion *mr, hwaddr mr_offset,
+ size_t size)
{
void *addr = memory_region_get_ram_ptr(mr) + mr_offset;
ram_addr_t rb_offset;
@@ -153,10 +154,11 @@ static void balloon_deflate_page(VirtIOBalloon *balloon,
rb_page_size = qemu_ram_pagesize(rb);

host_addr = (void *)((uintptr_t)addr & ~(rb_page_size - 1));
+ size &= ~(rb_page_size - 1);

/* When a page is deflated, we hint the whole host page it lives
* on, since we can't do anything smaller */
- ret = qemu_madvise(host_addr, rb_page_size, QEMU_MADV_WILLNEED);
+ ret = qemu_madvise(host_addr, size, QEMU_MADV_WILLNEED);
if (ret != 0) {
warn_report("Couldn't MADV_WILLNEED on balloon deflate: %s",
strerror(errno));
@@ -354,7 +356,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
offset += 4;

- if (vq == s->icvq) {
+ if (vq == s->icvq || vq == s->dcvq) {
uint32_t psize_ptr;
if (iov_to_buf(elem->out_sg, elem->out_num, offset, &psize_ptr, 4) != 4) {
break;
@@ -383,8 +385,9 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
balloon_inflate_page(s, section.mr,
section.offset_within_region,
psize, &pbp);
- } else if (vq == s->dvq) {
- balloon_deflate_page(s, section.mr, section.offset_within_region);
+ } else if (vq == s->dvq || vq == s->dcvq) {
+ balloon_deflate_page(s, section.mr, section.offset_within_region,
+ psize);
} else {
g_assert_not_reached();
}
@@ -838,6 +841,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)

if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
s->icvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
+ s->dcvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
}

reset_stats(s);
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 6a2514d..848a7fb 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {

typedef struct VirtIOBalloon {
VirtIODevice parent_obj;
- VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq;
+ VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq, *dcvq;
uint32_t free_page_report_status;
uint32_t num_pages;
uint32_t actual;
--
2.7.4

2020-07-16 02:44:15

by Hui Zhu

Subject: [RFC for Linux v4 2/2] virtio_balloon: Add deflate_cont_vq to deflate continuous pages

This commit adds a vq, deflate_cont_vq, to deflate continuous pages.
When VIRTIO_BALLOON_F_CONT_PAGES is set, leak_balloon_cont is called to leak
the balloon.
leak_balloon_cont calls balloon_page_list_dequeue_cont to get continuous
pages from the balloon and reports them using deflate_cont_vq.
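
The body of balloon_page_list_dequeue_cont is not shown in this part of the thread, so the following is only an illustration of the general idea of grouping adjacent pfns into one range, not the patch's implementation. cont_run_length is a hypothetical helper that works on a plain array of pfns and assumes the array is already sorted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Length of the run of consecutive pfns that starts at pfns[start],
 * capped at max_pages so that the byte size fits in the size field.
 */
static size_t cont_run_length(const uint32_t *pfns, size_t n,
			      size_t start, size_t max_pages)
{
	size_t len = 1;

	assert(start < n && max_pages > 0);
	while (start + len < n && len < max_pages &&
	       pfns[start + len] == pfns[start] + (uint32_t)len)
		len++;
	return len;
}
```

Each run found this way would correspond to one (pfn, size) pair on deflate_cont_vq, with the size being the run length shifted by PAGE_SHIFT.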

Signed-off-by: Hui Zhu <[email protected]>
---
drivers/virtio/virtio_balloon.c | 73 ++++++++++++++++++++++++++++++++----
include/linux/balloon_compaction.h | 3 ++
mm/balloon_compaction.c | 76 ++++++++++++++++++++++++++++++++++++++
3 files changed, 144 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index b89f566..258b3d9 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -44,6 +44,7 @@

#define VIRTIO_BALLOON_INFLATE_MAX_ORDER min((int) (sizeof(__virtio32) * BITS_PER_BYTE - \
1 - PAGE_SHIFT), (MAX_ORDER-1))
+#define VIRTIO_BALLOON_DEFLATE_MAX_PAGES_NUM (((__virtio32)~0U) >> PAGE_SHIFT)

#ifdef CONFIG_BALLOON_COMPACTION
static struct vfsmount *balloon_mnt;
@@ -56,6 +57,7 @@ enum virtio_balloon_vq {
VIRTIO_BALLOON_VQ_FREE_PAGE,
VIRTIO_BALLOON_VQ_REPORTING,
VIRTIO_BALLOON_VQ_INFLATE_CONT,
+ VIRTIO_BALLOON_VQ_DEFLATE_CONT,
VIRTIO_BALLOON_VQ_MAX
};

@@ -65,7 +67,8 @@ enum virtio_balloon_config_read {

struct virtio_balloon {
struct virtio_device *vdev;
- struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq, *inflate_cont_vq;
+ struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq,
+ *inflate_cont_vq, *deflate_cont_vq;

/* Balloon's own wq for cpu-intensive work items */
struct workqueue_struct *balloon_wq;
@@ -215,6 +218,16 @@ static void set_page_pfns(struct virtio_balloon *vb,
page_to_balloon_pfn(page) + i);
}

+static void set_page_pfns_size(struct virtio_balloon *vb,
+ __virtio32 pfns[], struct page *page,
+ size_t size)
+{
+ /* Set the first pfn of the continuous pages. */
+ pfns[0] = cpu_to_virtio32(vb->vdev, page_to_balloon_pfn(page));
+ /* Set the size of the continuous pages. */
+ pfns[1] = (__virtio32) size;
+}
+
static void set_page_pfns_order(struct virtio_balloon *vb,
__virtio32 pfns[], struct page *page,
unsigned int order)
@@ -222,10 +235,7 @@ static void set_page_pfns_order(struct virtio_balloon *vb,
if (order == 0)
return set_page_pfns(vb, pfns, page);

- /* Set the first pfn of the continuous pages. */
- pfns[0] = cpu_to_virtio32(vb->vdev, page_to_balloon_pfn(page));
- /* Set the size of the continuous pages. */
- pfns[1] = PAGE_SIZE << order;
+ set_page_pfns_size(vb, pfns, page, PAGE_SIZE << order);
}

static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
@@ -367,6 +377,42 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
return num_freed_pages;
}

+static unsigned int leak_balloon_cont(struct virtio_balloon *vb, size_t num)
+{
+ unsigned int num_freed_pages;
+ struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
+ LIST_HEAD(pages);
+ size_t num_pages;
+
+ mutex_lock(&vb->balloon_lock);
+ for (vb->num_pfns = 0, num_freed_pages = 0;
+ vb->num_pfns < ARRAY_SIZE(vb->pfns) && num_freed_pages < num;
+ vb->num_pfns += 2,
+ num_freed_pages += num_pages << (PAGE_SHIFT - VIRTIO_BALLOON_PFN_SHIFT)) {
+ struct page *page;
+
+ num_pages = balloon_page_list_dequeue_cont(vb_dev_info, &pages, &page,
+ min_t(size_t,
+ VIRTIO_BALLOON_DEFLATE_MAX_PAGES_NUM,
+ num - num_freed_pages));
+ if (!num_pages)
+ break;
+ set_page_pfns_size(vb, vb->pfns + vb->num_pfns, page, num_pages << PAGE_SHIFT);
+ }
+ vb->num_pages -= num_freed_pages;
+
+ /*
+ * Note that if
+ * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
+ * is true, we *have* to do it in this order
+ */
+ if (vb->num_pfns != 0)
+ tell_host(vb, vb->deflate_cont_vq);
+ release_pages_balloon(vb, &pages);
+ mutex_unlock(&vb->balloon_lock);
+ return num_freed_pages;
+}
+
static inline void update_stat(struct virtio_balloon *vb, int idx,
u16 tag, u64 val)
{
@@ -551,8 +597,12 @@ static void update_balloon_size_func(struct work_struct *work)

if (diff > 0)
diff -= fill_balloon(vb, diff);
- else
- diff += leak_balloon(vb, -diff);
+ else {
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES))
+ diff += leak_balloon_cont(vb, -diff);
+ else
+ diff += leak_balloon(vb, -diff);
+ }
update_balloon_size(vb);

if (diff)
@@ -587,6 +637,8 @@ static int init_vqs(struct virtio_balloon *vb)
names[VIRTIO_BALLOON_VQ_REPORTING] = NULL;
names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
+ names[VIRTIO_BALLOON_VQ_DEFLATE_CONT] = NULL;
+ callbacks[VIRTIO_BALLOON_VQ_DEFLATE_CONT] = NULL;

if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
names[VIRTIO_BALLOON_VQ_STATS] = "stats";
@@ -606,6 +658,8 @@ static int init_vqs(struct virtio_balloon *vb)
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = "inflate_cont";
callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = balloon_ack;
+ names[VIRTIO_BALLOON_VQ_DEFLATE_CONT] = "deflate_cont";
+ callbacks[VIRTIO_BALLOON_VQ_DEFLATE_CONT] = balloon_ack;
}

err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
@@ -643,9 +697,12 @@ static int init_vqs(struct virtio_balloon *vb)
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING))
vb->reporting_vq = vqs[VIRTIO_BALLOON_VQ_REPORTING];

- if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES))
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
vb->inflate_cont_vq
= vqs[VIRTIO_BALLOON_VQ_INFLATE_CONT];
+ vb->deflate_cont_vq
+ = vqs[VIRTIO_BALLOON_VQ_DEFLATE_CONT];
+ }

return 0;
}
diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index 8180bbf..7cb2a75 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -70,6 +70,9 @@ extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
struct list_head *pages);
extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
struct list_head *pages, size_t n_req_pages);
+extern size_t balloon_page_list_dequeue_cont(struct balloon_dev_info *b_dev_info,
+ struct list_head *pages, struct page **first_page,
+ size_t max_req_pages);

static inline struct page *balloon_page_alloc(void)
{
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 397d0b9..ea7d91f 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -111,6 +111,82 @@ size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
}
EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);

+/**
+ * balloon_page_list_dequeue_cont() - removes continuous pages from balloon's page list
+ * and returns a list of the continuous pages.
+ * @b_dev_info: balloon device descriptor where we will grab a page from.
+ * @pages: pointer to the list of pages that would be returned to the caller.
+ * @max_req_pages: max number of requested pages.
+ *
+ * Driver must call this function to properly de-allocate previously enlisted
+ * balloon pages before definitively releasing them back to the guest system.
+ * This function tries to remove @max_req_pages continuous pages from the ballooned
+ * pages and return them to the caller in the @pages list.
+ *
+ * Note that this function may fail to dequeue some pages even if the balloon
+ * isn't empty - since the page list can be temporarily empty due to compaction
+ * of isolated pages.
+ *
+ * Return: number of pages that were added to the @pages list.
+ */
+size_t balloon_page_list_dequeue_cont(struct balloon_dev_info *b_dev_info,
+ struct list_head *pages, struct page **first_page,
+ size_t max_req_pages)
+{
+ struct page *page, *tmp;
+ unsigned long flags, tail_pfn;
+ size_t n_pages = 0;
+ bool got_first = false;
+
+ spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+ list_for_each_entry_safe_reverse(page, tmp, &b_dev_info->pages, lru) {
+ unsigned long pfn;
+
+ if (n_pages == max_req_pages)
+ break;
+
+ pfn = page_to_pfn(page);
+
+ if (got_first && pfn != tail_pfn + 1)
+ break;
+
+ /*
+ * Block others from accessing the 'page' while we get around to
+ * establishing additional references and preparing the 'page'
+ * to be released by the balloon driver.
+ */
+ if (!trylock_page(page)) {
+ if (!got_first)
+ continue;
+ else
+ break;
+ }
+
+ if (IS_ENABLED(CONFIG_BALLOON_COMPACTION) && PageIsolated(page)) {
+ /* raced with isolation */
+ unlock_page(page);
+ if (!got_first)
+ continue;
+ else
+ break;
+ }
+ balloon_page_delete(page);
+ __count_vm_event(BALLOON_DEFLATE);
+ list_add(&page->lru, pages);
+ unlock_page(page);
+ n_pages++;
+ tail_pfn = pfn;
+ if (!got_first) {
+ got_first = true;
+ *first_page = page;
+ }
+ }
+ spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+
+ return n_pages;
+}
+EXPORT_SYMBOL_GPL(balloon_page_list_dequeue_cont);
+
/*
* balloon_pages_alloc - allocates a new page for insertion into the balloon
* page list.
--
2.7.4

2020-07-16 06:40:39

by Michael S. Tsirkin

Subject: Re: [RFC for qemu v4 2/2] virtio_balloon: Add dcvq to deflate continuous pages

On Thu, Jul 16, 2020 at 10:41:55AM +0800, Hui Zhu wrote:
> This commit adds a vq dcvq to deflate continuous pages.
> When VIRTIO_BALLOON_F_CONT_PAGES is set, try to get continuous pages
> from dcvq and use madvise MADV_WILLNEED with the pages.
>
> Signed-off-by: Hui Zhu <[email protected]>

This is arguably something to benchmark. Does the guest benefit
from MADV_WILLNEED or lose performance?

> ---
> hw/virtio/virtio-balloon.c | 14 +++++++++-----
> include/hw/virtio/virtio-balloon.h | 2 +-
> 2 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index d36a5c8..165adf7 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -138,7 +138,8 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
> }
>
> static void balloon_deflate_page(VirtIOBalloon *balloon,
> - MemoryRegion *mr, hwaddr mr_offset)
> + MemoryRegion *mr, hwaddr mr_offset,
> + size_t size)
> {
> void *addr = memory_region_get_ram_ptr(mr) + mr_offset;
> ram_addr_t rb_offset;
> @@ -153,10 +154,11 @@ static void balloon_deflate_page(VirtIOBalloon *balloon,
> rb_page_size = qemu_ram_pagesize(rb);
>
> host_addr = (void *)((uintptr_t)addr & ~(rb_page_size - 1));
> + size &= ~(rb_page_size - 1);
>
> /* When a page is deflated, we hint the whole host page it lives
> * on, since we can't do anything smaller */
> - ret = qemu_madvise(host_addr, rb_page_size, QEMU_MADV_WILLNEED);
> + ret = qemu_madvise(host_addr, size, QEMU_MADV_WILLNEED);
> if (ret != 0) {
> warn_report("Couldn't MADV_WILLNEED on balloon deflate: %s",
> strerror(errno));
> @@ -354,7 +356,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
> offset += 4;
>
> - if (vq == s->icvq) {
> + if (vq == s->icvq || vq == s->dcvq) {
> uint32_t psize_ptr;
> if (iov_to_buf(elem->out_sg, elem->out_num, offset, &psize_ptr, 4) != 4) {
> break;
> @@ -383,8 +385,9 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> balloon_inflate_page(s, section.mr,
> section.offset_within_region,
> psize, &pbp);
> - } else if (vq == s->dvq) {
> - balloon_deflate_page(s, section.mr, section.offset_within_region);
> + } else if (vq == s->dvq || vq == s->dcvq) {
> + balloon_deflate_page(s, section.mr, section.offset_within_region,
> + psize);
> } else {
> g_assert_not_reached();
> }
> @@ -838,6 +841,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>
> if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
> s->icvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
> + s->dcvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
> }
>
> reset_stats(s);
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 6a2514d..848a7fb 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
>
> typedef struct VirtIOBalloon {
> VirtIODevice parent_obj;
> - VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq;
> + VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq, *dcvq;
> uint32_t free_page_report_status;
> uint32_t num_pages;
> uint32_t actual;
> --
> 2.7.4

2020-07-16 06:41:27

by Michael S. Tsirkin

Subject: Re: [RFC for Linux v4 0/2] virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES to report continuous pages

On Thu, Jul 16, 2020 at 10:41:50AM +0800, Hui Zhu wrote:
> The first, second and third version are in [1], [2] and [3].
> Code of current version for Linux and qemu is available in [4] and [5].
> Update of this version:
> 1. Report continuous pages will increase the speed. So added deflate
> continuous pages.
> 2. According to the comments from David in [6], added 2 new vqs inflate_cont_vq
> and deflate_cont_vq to report continuous pages with format 32 bits pfn and 32
> bits size.
> Following is the introduction of the function.
> These patches add VIRTIO_BALLOON_F_CONT_PAGES to virtio_balloon. With this
> flag, balloon tries to use continuous pages to inflate and deflate.
> Opening this flag can bring two benefits:
> 1. Report continuous pages will increase memory report size of each time
> call tell_host. Then it will increase the speed of balloon inflate and
> deflate.
> 2. Host THPs will be splitted when qemu release the page of balloon inflate.
> Inflate balloon with continuous pages will let QEMU release the pages
> of same THPs. That will help decrease the splitted THPs number in
> the host.
> Following is an example in a VM with 1G memory 1CPU. This test setups an
> environment that has a lot of fragmentation pages. Then inflate balloon will
> split the THPs.
> // This is the THP number before VM execution in the host.
> // None use THP.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages: 0 kB
> // After VM start, use usemem
> // (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
> // punch-holes function generates 400m fragmentation pages in the guest
> // kernel.
> usemem --punch-holes -s -1 800m &
> // This is the THP number after this command in the host.
> // Some THP is used by VM because usemem will access 800M memory
> // in the guest.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages: 911360 kB
> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
> (qemu) device_add virtio-balloon-pci,id=balloon1
> (qemu) info balloon
> balloon: actual=1024
> (qemu) balloon 600
> (qemu) info balloon
> balloon: actual=600
> // This is the THP number after inflate the balloon in the host.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages: 88064 kB
> // Set the size back to 1024M in the QEMU monitor.
> (qemu) balloon 1024
> (qemu) info balloon
> balloon: actual=1024
> // Use usemem to increase the memory usage of QEMU.
> killall usemem
> usemem 800m
> // This is the THP number after this operation.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages: 65536 kB
>
> Following example change to use continuous pages balloon. The number of
> splitted THPs is decreased.
> // This is the THP number before VM execution in the host.
> // None use THP.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages: 0 kB
> // After VM start, use usemem punch-holes function generates 400M
> // fragmentation pages in the guest kernel.
> usemem --punch-holes -s -1 800m &
> // This is the THP number after this command in the host.
> // Some THP is used by VM because usemem will access 800M memory
> // in the guest.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages: 911360 kB
> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
> (qemu) device_add virtio-balloon-pci,id=balloon1,cont-pages=on
> (qemu) info balloon
> balloon: actual=1024
> (qemu) balloon 600
> (qemu) info balloon
> balloon: actual=600
> // This is the THP number after inflate the balloon in the host.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages: 616448 kB
> // Set the size back to 1024M in the QEMU monitor.
> (qemu) balloon 1024
> (qemu) info balloon
> balloon: actual=1024
> // Use usemem to increase the memory usage of QEMU.
> killall usemem
> usemem 800m
> // This is the THP number after this operation.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages: 907264 kB

I'm a bit confused about which of the above commands run in the guest
and which run on the host. Could you explain, please?



> [1] https://lkml.org/lkml/2020/3/12/144
> [2] https://lore.kernel.org/linux-mm/[email protected]/
> [3] https://lkml.org/lkml/2020/5/12/324
> [4] https://github.com/teawater/linux/tree/balloon_conts
> [5] https://github.com/teawater/qemu/tree/balloon_conts
> [6] https://lkml.org/lkml/2020/5/13/1211
>
> Hui Zhu (2):
> virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES and inflate_cont_vq
> virtio_balloon: Add deflate_cont_vq to deflate continuous pages
>
> drivers/virtio/virtio_balloon.c | 180 +++++++++++++++++++++++++++++++-----
> include/linux/balloon_compaction.h | 12 ++
> include/uapi/linux/virtio_balloon.h | 1
> mm/balloon_compaction.c | 117 +++++++++++++++++++++--
> 4 files changed, 280 insertions(+), 30 deletions(-)

2020-07-16 06:46:15

by Michael S. Tsirkin

Subject: Re: [RFC for Linux v4 1/2] virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES and inflate_cont_vq

On Thu, Jul 16, 2020 at 10:41:51AM +0800, Hui Zhu wrote:
> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> index dc3e656..4d0151a 100644
> --- a/include/uapi/linux/virtio_balloon.h
> +++ b/include/uapi/linux/virtio_balloon.h
> @@ -37,6 +37,7 @@
> #define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
> #define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */
> #define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
> +#define VIRTIO_BALLOON_F_CONT_PAGES 6 /* VQ to report continuous pages */
>
> /* Size of a PFN in the balloon interface. */
> #define VIRTIO_BALLOON_PFN_SHIFT 12

So what does the guest/host interface look like?
Could you write something up about it?

2020-07-16 07:03:14

by Hui Zhu

Subject: Re: [virtio-dev] [RFC for Linux v4 0/2] virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES to report continuous pages



> On Jul 16, 2020, at 14:38, Michael S. Tsirkin <[email protected]> wrote:
>
> On Thu, Jul 16, 2020 at 10:41:50AM +0800, Hui Zhu wrote:
>> The first, second and third version are in [1], [2] and [3].
>> Code of current version for Linux and qemu is available in [4] and [5].
>> Update of this version:
>> 1. Report continuous pages will increase the speed. So added deflate
>> continuous pages.
>> 2. According to the comments from David in [6], added 2 new vqs inflate_cont_vq
>> and deflate_cont_vq to report continuous pages with format 32 bits pfn and 32
>> bits size.
>> Following is the introduction of the function.
>> These patches add VIRTIO_BALLOON_F_CONT_PAGES to virtio_balloon. With this
>> flag, balloon tries to use continuous pages to inflate and deflate.
>> Opening this flag can bring two benefits:
>> 1. Report continuous pages will increase memory report size of each time
>> call tell_host. Then it will increase the speed of balloon inflate and
>> deflate.
>> 2. Host THPs will be splitted when qemu release the page of balloon inflate.
>> Inflate balloon with continuous pages will let QEMU release the pages
>> of same THPs. That will help decrease the splitted THPs number in
>> the host.
>> Following is an example in a VM with 1G memory 1CPU. This test setups an
>> environment that has a lot of fragmentation pages. Then inflate balloon will
>> split the THPs.


>> // This is the THP number before VM execution in the host.
>> // None use THP.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages: 0 kB
These lines are from the host.

>> // After VM start, use usemem
>> // (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
>> // punch-holes function generates 400m fragmentation pages in the guest
>> // kernel.
>> usemem --punch-holes -s -1 800m &
These lines are from the guest. They set up an environment with a lot of fragmented pages.

>> // This is the THP number after this command in the host.
>> // Some THP is used by VM because usemem will access 800M memory
>> // in the guest.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages: 911360 kB
These lines are from the host.

>> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
>> (qemu) device_add virtio-balloon-pci,id=balloon1
>> (qemu) info balloon
>> balloon: actual=1024
>> (qemu) balloon 600
>> (qemu) info balloon
>> balloon: actual=600
These lines are from the host.

>> // This is the THP number after inflate the balloon in the host.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages: 88064 kB
These lines are from the host.

>> // Set the size back to 1024M in the QEMU monitor.
>> (qemu) balloon 1024
>> (qemu) info balloon
>> balloon: actual=1024
These lines are from the host.

>> // Use usemem to increase the memory usage of QEMU.
>> killall usemem
>> usemem 800m
These lines are from the guest.

>> // This is the THP number after this operation.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages: 65536 kB
These lines are from the host.



>>
>> Following example change to use continuous pages balloon. The number of
>> splitted THPs is decreased.
>> // This is the THP number before VM execution in the host.
>> // None use THP.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages: 0 kB
These lines are from the host.

>> // After VM start, use usemem punch-holes function generates 400M
>> // fragmentation pages in the guest kernel.
>> usemem --punch-holes -s -1 800m &
These lines are from the guest. They set up an environment with a lot of fragmented pages.

>> // This is the THP number after this command in the host.
>> // Some THP is used by VM because usemem will access 800M memory
>> // in the guest.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages: 911360 kB
These lines are from the host.

>> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
>> (qemu) device_add virtio-balloon-pci,id=balloon1,cont-pages=on
>> (qemu) info balloon
>> balloon: actual=1024
>> (qemu) balloon 600
>> (qemu) info balloon
>> balloon: actual=600
These lines are from the host.

>> // This is the THP number after inflate the balloon in the host.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages: 616448 kB
>> // Set the size back to 1024M in the QEMU monitor.
>> (qemu) balloon 1024
>> (qemu) info balloon
>> balloon: actual=1024
These lines are from the host.

>> // Use usemem to increase the memory usage of QEMU.
>> killall usemem
>> usemem 800m
These lines are from the guest.

>> // This is the THP number after this operation.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages: 907264 kB
These lines are from the host.

>
> I'm a bit confused about which of the above run within guest,
> and which run within host. Could you explain pls?
>
>

I added notes above to show where each group of lines comes from.
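
To make the two runs easier to compare, the AnonHugePages values can be converted into THP counts. This is only a sketch: it assumes 2 MiB THPs (the usual size on x86-64; the host architecture is not stated in this thread), and thp_count() and THP_KB are names made up for the illustration.

```c
#include <assert.h>

#define THP_KB 2048	/* assumption: one THP is 2 MiB */

/* Convert an AnonHugePages value from /proc/meminfo (in kB) to a THP count. */
long thp_count(long anon_huge_kb)
{
	assert(anon_huge_kb >= 0);
	return anon_huge_kb / THP_KB;
}
```

Under that assumption the first run goes from thp_count(911360) = 445 THPs to 43 after inflate and 32 after the second usemem, while the run with cont-pages=on goes from 445 to 301 and then back up to 443.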

Best,
Hui


>
>> [1] https://lkml.org/lkml/2020/3/12/144
>> [2] https://lore.kernel.org/linux-mm/[email protected]/
>> [3] https://lkml.org/lkml/2020/5/12/324
>> [4] https://github.com/teawater/linux/tree/balloon_conts
>> [5] https://github.com/teawater/qemu/tree/balloon_conts
>> [6] https://lkml.org/lkml/2020/5/13/1211
>>
>> Hui Zhu (2):
>> virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES and inflate_cont_vq
>> virtio_balloon: Add deflate_cont_vq to deflate continuous pages
>>
>> drivers/virtio/virtio_balloon.c | 180 +++++++++++++++++++++++++++++++-----
>> include/linux/balloon_compaction.h | 12 ++
>> include/uapi/linux/virtio_balloon.h | 1
>> mm/balloon_compaction.c | 117 +++++++++++++++++++++--
>> 4 files changed, 280 insertions(+), 30 deletions(-)
>
>

2020-07-16 07:33:32

by Hui Zhu

Subject: Re: [virtio-dev] [RFC for qemu v4 2/2] virtio_balloon: Add dcvq to deflate continuous pages



> On Jul 16, 2020, at 14:39, Michael S. Tsirkin <[email protected]> wrote:
>
> On Thu, Jul 16, 2020 at 10:41:55AM +0800, Hui Zhu wrote:
>> This commit adds a vq dcvq to deflate continuous pages.
>> When VIRTIO_BALLOON_F_CONT_PAGES is set, try to get continuous pages
>> from dcvq and use madvise MADV_WILLNEED with the pages.
>>
>> Signed-off-by: Hui Zhu <[email protected]>
>
> This is arguably something to benchmark. Does guest benefit
> from MADV_WILLNEED or loose performance?

MADV_WILLNEED calls madvise_willneed() in the host kernel.
madvise_willneed() schedules all the I/O needed for the address range (swap-in, or vfs_fadvise() with POSIX_FADV_WILLNEED for file-backed mappings).

But the balloon pages were released with MADV_DONTNEED, so there is nothing to read back in.
So I think MADV_WILLNEED will not affect guest performance in most situations.
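
A small userspace sketch of that sequence on Linux, for illustration only. It is not QEMU code: dontneed_then_willneed() is a hypothetical helper, and it uses a private anonymous mapping as a stand-in for guest RAM. It drops the pages with MADV_DONTNEED the way inflate does, then issues the MADV_WILLNEED hint the way deflate does.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: touch an anonymous mapping, drop it with
 * MADV_DONTNEED, then hint it with MADV_WILLNEED.
 * Returns the first byte read back afterwards, or -1 on error.
 */
int dontneed_then_willneed(size_t len)
{
	unsigned char *buf;
	int first;

	assert(len > 0);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xaa, len);		/* fault the pages in */

	if (madvise(buf, len, MADV_DONTNEED) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* Only a hint; the return value is deliberately ignored here. */
	(void)madvise(buf, len, MADV_WILLNEED);

	/* The old contents are gone; the next access gets a zero-filled page. */
	first = buf[0];
	munmap(buf, len);
	return first;
}
```

The old contents were discarded rather than swapped out, so the hint has no backing data to schedule I/O for, and the page is simply zero-filled on the next access.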

Best,
Hui

>
>> ---
>> hw/virtio/virtio-balloon.c | 14 +++++++++-----
>> include/hw/virtio/virtio-balloon.h | 2 +-
>> 2 files changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>> index d36a5c8..165adf7 100644
>> --- a/hw/virtio/virtio-balloon.c
>> +++ b/hw/virtio/virtio-balloon.c
>> @@ -138,7 +138,8 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
>> }
>>
>> static void balloon_deflate_page(VirtIOBalloon *balloon,
>> - MemoryRegion *mr, hwaddr mr_offset)
>> + MemoryRegion *mr, hwaddr mr_offset,
>> + size_t size)
>> {
>> void *addr = memory_region_get_ram_ptr(mr) + mr_offset;
>> ram_addr_t rb_offset;
>> @@ -153,10 +154,11 @@ static void balloon_deflate_page(VirtIOBalloon *balloon,
>> rb_page_size = qemu_ram_pagesize(rb);
>>
>> host_addr = (void *)((uintptr_t)addr & ~(rb_page_size - 1));
>> + size &= ~(rb_page_size - 1);
>>
>> /* When a page is deflated, we hint the whole host page it lives
>> * on, since we can't do anything smaller */
>> - ret = qemu_madvise(host_addr, rb_page_size, QEMU_MADV_WILLNEED);
>> + ret = qemu_madvise(host_addr, size, QEMU_MADV_WILLNEED);
>> if (ret != 0) {
>> warn_report("Couldn't MADV_WILLNEED on balloon deflate: %s",
>> strerror(errno));
>> @@ -354,7 +356,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>> pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
>> offset += 4;
>>
>> - if (vq == s->icvq) {
>> + if (vq == s->icvq || vq == s->dcvq) {
>> uint32_t psize_ptr;
>> if (iov_to_buf(elem->out_sg, elem->out_num, offset, &psize_ptr, 4) != 4) {
>> break;
>> @@ -383,8 +385,9 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>> balloon_inflate_page(s, section.mr,
>> section.offset_within_region,
>> psize, &pbp);
>> - } else if (vq == s->dvq) {
>> - balloon_deflate_page(s, section.mr, section.offset_within_region);
>> + } else if (vq == s->dvq || vq == s->dcvq) {
>> + balloon_deflate_page(s, section.mr, section.offset_within_region,
>> + psize);
>> } else {
>> g_assert_not_reached();
>> }
>> @@ -838,6 +841,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>>
>> if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
>> s->icvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>> + s->dcvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>> }
>>
>> reset_stats(s);
>> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
>> index 6a2514d..848a7fb 100644
>> --- a/include/hw/virtio/virtio-balloon.h
>> +++ b/include/hw/virtio/virtio-balloon.h
>> @@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
>>
>> typedef struct VirtIOBalloon {
>> VirtIODevice parent_obj;
>> - VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq;
>> + VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq, *dcvq;
>> uint32_t free_page_report_status;
>> uint32_t num_pages;
>> uint32_t actual;
>> --
>> 2.7.4
>
>

2020-07-16 08:27:52

by Hui Zhu

Subject: Re: [RFC for Linux v4 1/2] virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES and inflate_cont_vq



> On Jul 16, 2020, at 14:43, Michael S. Tsirkin <[email protected]> wrote:
>
> On Thu, Jul 16, 2020 at 10:41:51AM +0800, Hui Zhu wrote:
>> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
>> index dc3e656..4d0151a 100644
>> --- a/include/uapi/linux/virtio_balloon.h
>> +++ b/include/uapi/linux/virtio_balloon.h
>> @@ -37,6 +37,7 @@
>> #define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
>> #define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */
>> #define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
>> +#define VIRTIO_BALLOON_F_CONT_PAGES 6 /* VQ to report continuous pages */
>>
>> /* Size of a PFN in the balloon interface. */
>> #define VIRTIO_BALLOON_PFN_SHIFT 12
>
> So how does the guest/host interface look like?
> Could you write up something about it?

Continuous pages are also reported through num_pfns and pfns in struct virtio_balloon.
The function that fills pfns is set_page_pfns_size() in https://github.com/teawater/linux/blob/balloon_conts/drivers/virtio/virtio_balloon.c#L221

static void set_page_pfns_size(struct virtio_balloon *vb,
			       __virtio32 pfns[], struct page *page,
			       size_t size)
{
	/* Set the first pfn of the continuous pages. */
	pfns[0] = cpu_to_virtio32(vb->vdev, page_to_balloon_pfn(page));
	/* Set the size of the continuous pages. */
	pfns[1] = (__virtio32) size;
}

Each run of continuous pages takes two entries in pfns.
The first PFN of the run is stored in pfns[0] and the size of the run, in bytes, is stored in pfns[1].
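
As a rough illustration of that layout, here is a standalone sketch of how a buffer of such pairs could be written and read back. It is not the driver or QEMU code: struct cont_range, cont_range_encode() and cont_range_decode() are hypothetical names, and byte-order conversion (cpu_to_virtio32 in the driver) is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BALLOON_PFN_SHIFT 12	/* mirrors VIRTIO_BALLOON_PFN_SHIFT */

/* Hypothetical decoded form of one (pfn, size) pair. */
struct cont_range {
	uint64_t gpa;	/* guest-physical start address */
	uint32_t size;	/* length in bytes */
};

/* Guest side: store one range in two consecutive 32-bit slots. */
void cont_range_encode(uint32_t slots[2], uint64_t gpa, uint32_t size)
{
	assert((gpa >> BALLOON_PFN_SHIFT) <= UINT32_MAX);
	slots[0] = (uint32_t)(gpa >> BALLOON_PFN_SHIFT);
	slots[1] = size;
}

/* Host side: walk the buffer two slots at a time; returns ranges decoded. */
size_t cont_range_decode(const uint32_t *slots, size_t num_slots,
			 struct cont_range *out, size_t max_out)
{
	size_t i, n = 0;

	for (i = 0; i + 1 < num_slots && n < max_out; i += 2, n++) {
		out[n].gpa = (uint64_t)slots[i] << BALLOON_PFN_SHIFT;
		out[n].size = slots[i + 1];
	}
	return n;
}
```

A buffer holding N ranges therefore uses 2 * N slots, which matches the vb->num_pfns += 2 step in the deflate loop of the patch.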

Each pfns entry is 32 bits wide and the size is stored in bytes.
So the max order for inflating continuous pages is VIRTIO_BALLOON_INFLATE_MAX_ORDER:
#define VIRTIO_BALLOON_INFLATE_MAX_ORDER min((int) (sizeof(__virtio32) * BITS_PER_BYTE - \
1 - PAGE_SHIFT), (MAX_ORDER-1))

The max number of pages in one run of deflated continuous pages is VIRTIO_BALLOON_DEFLATE_MAX_PAGES_NUM:
#define VIRTIO_BALLOON_DEFLATE_MAX_PAGES_NUM (((__virtio32)~0U) >> PAGE_SHIFT)
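
For a feel of the numbers, the two limits can be worked out in a standalone sketch. The function names are made up for this example, and the inputs used in the note after it (PAGE_SHIFT = 12, MAX_ORDER = 11) are assumptions for a typical x86-64 guest, not values taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Largest order whose byte size (PAGE_SIZE << order) stays below 2^31. */
int inflate_max_order(int page_shift, int max_order)
{
	int by_width, by_buddy;

	assert(page_shift > 0 && page_shift < 31);
	by_width = (int)(sizeof(uint32_t) * 8) - 1 - page_shift;
	by_buddy = max_order - 1;

	return by_width < by_buddy ? by_width : by_buddy;
}

/* Largest page count whose byte size still fits in a 32-bit field. */
uint32_t deflate_max_pages(int page_shift)
{
	assert(page_shift > 0 && page_shift < 32);
	return UINT32_MAX >> page_shift;
}
```

With those assumed inputs, inflate_max_order(12, 11) is 10, so the buddy allocator limit is the one that applies (a 4 MiB run), and deflate_max_pages(12) is 1048575 pages, just under 4 GiB per run.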

Best,
Hui

2020-07-16 10:46:47

by Michael S. Tsirkin

Subject: Re: [virtio-dev] [RFC for Linux v4 0/2] virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES to report continuous pages

On Thu, Jul 16, 2020 at 03:01:18PM +0800, teawater wrote:
>
>
> > On Jul 16, 2020, at 14:38, Michael S. Tsirkin <[email protected]> wrote:
> >
> > On Thu, Jul 16, 2020 at 10:41:50AM +0800, Hui Zhu wrote:
> >> The first, second and third version are in [1], [2] and [3].
> >> Code of current version for Linux and qemu is available in [4] and [5].
> >> Update of this version:
> >> 1. Report continuous pages will increase the speed. So added deflate
> >> continuous pages.
> >> 2. According to the comments from David in [6], added 2 new vqs inflate_cont_vq
> >> and deflate_cont_vq to report continuous pages with format 32 bits pfn and 32
> >> bits size.
> >> Following is the introduction of the function.
> >> These patches add VIRTIO_BALLOON_F_CONT_PAGES to virtio_balloon. With this
> >> flag, balloon tries to use continuous pages to inflate and deflate.
> >> Opening this flag can bring two benefits:
> >> 1. Report continuous pages will increase memory report size of each time
> >> call tell_host. Then it will increase the speed of balloon inflate and
> >> deflate.
> >> 2. Host THPs will be splitted when qemu release the page of balloon inflate.
> >> Inflate balloon with continuous pages will let QEMU release the pages
> >> of same THPs. That will help decrease the splitted THPs number in
> >> the host.
> >> Following is an example in a VM with 1G memory 1CPU. This test setups an
> >> environment that has a lot of fragmentation pages. Then inflate balloon will
> >> split the THPs.
>
>
> >> // This is the THP number before VM execution in the host.
> >> // None use THP.
> >> cat /proc/meminfo | grep AnonHugePages:
> >> AnonHugePages: 0 kB
> These lines are from host.
>
> >> // After VM start, use usemem
> >> // (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
> >> // punch-holes function generates 400m fragmentation pages in the guest
> >> // kernel.
> >> usemem --punch-holes -s -1 800m &
> These lines are from guest. They setups the environment that has a lot of fragmentation pages.
>
> >> // This is the THP number after this command in the host.
> >> // Some THP is used by VM because usemem will access 800M memory
> >> // in the guest.
> >> cat /proc/meminfo | grep AnonHugePages:
> >> AnonHugePages: 911360 kB
> These lines are from host.
>
> >> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
> >> (qemu) device_add virtio-balloon-pci,id=balloon1
> >> (qemu) info balloon
> >> balloon: actual=1024
> >> (qemu) balloon 600
> >> (qemu) info balloon
> >> balloon: actual=600
> These lines are from host.
>
> >> // This is the THP number after inflate the balloon in the host.
> >> cat /proc/meminfo | grep AnonHugePages:
> >> AnonHugePages: 88064 kB
> These lines are from host.
>
> >> // Set the size back to 1024M in the QEMU monitor.
> >> (qemu) balloon 1024
> >> (qemu) info balloon
> >> balloon: actual=1024
> These lines are from host.
>
> >> // Use usemem to increase the memory usage of QEMU.
> >> killall usemem
> >> usemem 800m
> These lines are from guest.
>
> >> // This is the THP number after this operation.
> >> cat /proc/meminfo | grep AnonHugePages:
> >> AnonHugePages: 65536 kB
> These lines are from host.
>
>
>
> >>
> >> Following example change to use continuous pages balloon. The number of
> >> splitted THPs is decreased.
> >> // This is the THP number before VM execution in the host.
> >> // None use THP.
> >> cat /proc/meminfo | grep AnonHugePages:
> >> AnonHugePages: 0 kB
> These lines are from host.
>
> >> // After VM start, use usemem punch-holes function generates 400M
> >> // fragmentation pages in the guest kernel.
> >> usemem --punch-holes -s -1 800m &
> These lines are from guest. They setups the environment that has a lot of fragmentation pages.
>
> >> // This is the THP number after this command in the host.
> >> // Some THP is used by VM because usemem will access 800M memory
> >> // in the guest.
> >> cat /proc/meminfo | grep AnonHugePages:
> >> AnonHugePages: 911360 kB
> These lines are from host.
>
> >> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
> >> (qemu) device_add virtio-balloon-pci,id=balloon1,cont-pages=on
> >> (qemu) info balloon
> >> balloon: actual=1024
> >> (qemu) balloon 600
> >> (qemu) info balloon
> >> balloon: actual=600
> These lines are from host.
>
> >> // This is the THP number after inflate the balloon in the host.
> >> cat /proc/meminfo | grep AnonHugePages:
> >> AnonHugePages: 616448 kB
> >> // Set the size back to 1024M in the QEMU monitor.
> >> (qemu) balloon 1024
> >> (qemu) info balloon
> >> balloon: actual=1024
> These lines are from host.
>
> >> // Use usemem to increase the memory usage of QEMU.
> >> killall usemem
> >> usemem 800m
> These lines are from guest.
>
> >> // This is the THP number after this operation.
> >> cat /proc/meminfo | grep AnonHugePages:
> >> AnonHugePages: 907264 kB
> These lines are from host.
>
> >
> > I'm a bit confused about which of the above run within guest,
> > and which run within host. Could you explain pls?
> >
> >
>
> I added some introduction to show where these lines is get from.
>
> Best,
> Hui


OK, so we see the host has more free THPs. But the guest presumably has fewer now, so
the total page table depth is the same. Did we gain anything?

>
> >
> >> [1] https://lkml.org/lkml/2020/3/12/144
> >> [2] https://lore.kernel.org/linux-mm/[email protected]/
> >> [3] https://lkml.org/lkml/2020/5/12/324
> >> [4] https://github.com/teawater/linux/tree/balloon_conts
> >> [5] https://github.com/teawater/qemu/tree/balloon_conts
> >> [6] https://lkml.org/lkml/2020/5/13/1211
> >>
> >> Hui Zhu (2):
> >> virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES and inflate_cont_vq
> >> virtio_balloon: Add deflate_cont_vq to deflate continuous pages
> >>
> >> drivers/virtio/virtio_balloon.c | 180 +++++++++++++++++++++++++++++++-----
> >> include/linux/balloon_compaction.h | 12 ++
> >> include/uapi/linux/virtio_balloon.h | 1
> >> mm/balloon_compaction.c | 117 +++++++++++++++++++++--
> >> 4 files changed, 280 insertions(+), 30 deletions(-)
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]

2020-07-17 03:53:58

by Hui Zhu

Subject: Re: [virtio-dev] [RFC for Linux v4 0/2] virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES to report continuous pages



> On Jul 16, 2020, at 18:45, Michael S. Tsirkin <[email protected]> wrote:
>
> On Thu, Jul 16, 2020 at 03:01:18PM +0800, teawater wrote:
>>
>>
>>> On Jul 16, 2020, at 14:38, Michael S. Tsirkin <[email protected]> wrote:
>>>
>>> On Thu, Jul 16, 2020 at 10:41:50AM +0800, Hui Zhu wrote:
>>>> The first, second and third version are in [1], [2] and [3].
>>>> Code of current version for Linux and qemu is available in [4] and [5].
>>>> Update of this version:
>>>> 1. Report continuous pages will increase the speed. So added deflate
>>>> continuous pages.
>>>> 2. According to the comments from David in [6], added 2 new vqs inflate_cont_vq
>>>> and deflate_cont_vq to report continuous pages with format 32 bits pfn and 32
>>>> bits size.
>>>> Following is the introduction of the function.
>>>> These patches add VIRTIO_BALLOON_F_CONT_PAGES to virtio_balloon. With this
>>>> flag, balloon tries to use continuous pages to inflate and deflate.
>>>> Opening this flag can bring two benefits:
>>>> 1. Report continuous pages will increase memory report size of each time
>>>> call tell_host. Then it will increase the speed of balloon inflate and
>>>> deflate.
>>>> 2. Host THPs will be splitted when qemu release the page of balloon inflate.
>>>> Inflate balloon with continuous pages will let QEMU release the pages
>>>> of same THPs. That will help decrease the splitted THPs number in
>>>> the host.
>>>> Following is an example in a VM with 1G memory 1CPU. This test setups an
>>>> environment that has a lot of fragmentation pages. Then inflate balloon will
>>>> split the THPs.
>>
>>
>>>> // This is the THP number before VM execution in the host.
>>>> // None use THP.
>>>> cat /proc/meminfo | grep AnonHugePages:
>>>> AnonHugePages: 0 kB
>> These lines are from host.
>>
>>>> // After VM start, use usemem
>>>> // (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
>>>> // punch-holes function generates 400m fragmentation pages in the guest
>>>> // kernel.
>>>> usemem --punch-holes -s -1 800m &
>> These lines are from the guest. They set up an environment that has a lot of fragmented pages.
>>
>>>> // This is the THP number after this command in the host.
>>>> // Some THP is used by VM because usemem will access 800M memory
>>>> // in the guest.
>>>> cat /proc/meminfo | grep AnonHugePages:
>>>> AnonHugePages: 911360 kB
>> These lines are from host.
>>
>>>> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
>>>> (qemu) device_add virtio-balloon-pci,id=balloon1
>>>> (qemu) info balloon
>>>> balloon: actual=1024
>>>> (qemu) balloon 600
>>>> (qemu) info balloon
>>>> balloon: actual=600
>> These lines are from host.
>>
>>>> // This is the THP number after inflate the balloon in the host.
>>>> cat /proc/meminfo | grep AnonHugePages:
>>>> AnonHugePages: 88064 kB
>> These lines are from host.
>>
>>>> // Set the size back to 1024M in the QEMU monitor.
>>>> (qemu) balloon 1024
>>>> (qemu) info balloon
>>>> balloon: actual=1024
>> These lines are from host.
>>
>>>> // Use usemem to increase the memory usage of QEMU.
>>>> killall usemem
>>>> usemem 800m
>> These lines are from guest.
>>
>>>> // This is the THP number after this operation.
>>>> cat /proc/meminfo | grep AnonHugePages:
>>>> AnonHugePages: 65536 kB
>> These lines are from host.
>>
>>
>>
>>>>
>>>> Following example change to use continuous pages balloon. The number of
>>>> splitted THPs is decreased.
>>>> // This is the THP number before VM execution in the host.
>>>> // None use THP.
>>>> cat /proc/meminfo | grep AnonHugePages:
>>>> AnonHugePages: 0 kB
>> These lines are from host.
>>
>>>> // After VM start, use usemem punch-holes function generates 400M
>>>> // fragmentation pages in the guest kernel.
>>>> usemem --punch-holes -s -1 800m &
>> These lines are from the guest. They set up an environment that has a lot of fragmented pages.
>>
>>>> // This is the THP number after this command in the host.
>>>> // Some THP is used by VM because usemem will access 800M memory
>>>> // in the guest.
>>>> cat /proc/meminfo | grep AnonHugePages:
>>>> AnonHugePages: 911360 kB
>> These lines are from host.
>>
>>>> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
>>>> (qemu) device_add virtio-balloon-pci,id=balloon1,cont-pages=on
>>>> (qemu) info balloon
>>>> balloon: actual=1024
>>>> (qemu) balloon 600
>>>> (qemu) info balloon
>>>> balloon: actual=600
>> These lines are from host.
>>
>>>> // This is the THP number after inflate the balloon in the host.
>>>> cat /proc/meminfo | grep AnonHugePages:
>>>> AnonHugePages: 616448 kB
>>>> // Set the size back to 1024M in the QEMU monitor.
>>>> (qemu) balloon 1024
>>>> (qemu) info balloon
>>>> balloon: actual=1024
>> These lines are from host.
>>
>>>> // Use usemem to increase the memory usage of QEMU.
>>>> killall usemem
>>>> usemem 800m
>> These lines are from guest.
>>
>>>> // This is the THP number after this operation.
>>>> cat /proc/meminfo | grep AnonHugePages:
>>>> AnonHugePages: 907264 kB
>> These lines are from host.
>>
>>>
>>> I'm a bit confused about which of the above run within guest,
>>> and which run within host. Could you explain pls?
>>>
>>>
>>
>> I added some notes to show where these lines come from.
>>
>> Best,
>> Hui
>
>
> OK so we see host has more free THPs. But guest has presumably less now - so
> the total page table depth is the same. Did we gain anything?
>


cat /proc/meminfo | grep AnonHugePages:
This command shows how much memory is backed by THPs on the current system.
No program other than qemu is using THPs on the host,
so the output shows how much THP-backed memory qemu is using.

The last output of “cat /proc/meminfo | grep AnonHugePages:” in each example shows how much THP-backed memory qemu holds when qemu has the same number of anonymous pages in both runs.
Without “cont-pages=on”, qemu keeps 65536 kB of THPs.
With “cont-pages=on”, qemu keeps 907264 kB of THPs.
Keeping more THPs makes memory access faster.
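
To make the two readings easier to compare, here is a minimal sketch that parses the AnonHugePages line and converts it to a THP count. It assumes 2 MiB THPs (typical on x86-64, not stated in this thread), and the helper names anon_huge_kb and thp_count are made up for illustration.

```python
import re

# Assumption: each THP is 2 MiB (2048 kB), as is typical on x86-64.
THP_SIZE_KB = 2048


def anon_huge_kb(meminfo_text: str) -> int:
    """Return the AnonHugePages value in kB from /proc/meminfo-style text."""
    match = re.search(r"^AnonHugePages:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("AnonHugePages line not found")
    return int(match.group(1))


def thp_count(kb: int, thp_size_kb: int = THP_SIZE_KB) -> int:
    """Convert a kB figure into a number of whole THPs."""
    return kb // thp_size_kb


# Final host readings quoted in this thread.
without_cont = anon_huge_kb("AnonHugePages: 65536 kB\n")
with_cont = anon_huge_kb("AnonHugePages: 907264 kB\n")
print(thp_count(without_cont), thp_count(with_cont))
```

On a live host the same function could be fed the contents of /proc/meminfo instead of the sample strings.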

This is a test record from the same 1G, 1-CPU qemu guest after the fragmentation balloon test:
Without “cont-pages=on”, qemu keeps 81920 kB of THPs.
/ # usemem 800m
943718400 bytes / 489412 usecs = 1883076 KB/s
18725 usecs to free memory
/ # usemem 800m
943718400 bytes / 487070 usecs = 1892130 KB/s
18913 usecs to free memory
/ # usemem 800m
943718400 bytes / 484234 usecs = 1903212 KB/s
18538 usecs to free memory
/ # usemem 800m
943718400 bytes / 486568 usecs = 1894082 KB/s
18982 usecs to free memory

With “cont-pages=on”, qemu keeps 907264 kB of THPs.
/ # usemem 800m
943718400 bytes / 479098 usecs = 1923614 KB/s
18980 usecs to free memory
/ # usemem 800m
943718400 bytes / 477433 usecs = 1930323 KB/s
18562 usecs to free memory
/ # usemem 800m
943718400 bytes / 479790 usecs = 1920840 KB/s
18663 usecs to free memory
/ # usemem 800m
943718400 bytes / 480253 usecs = 1918988 KB/s
19011 usecs to free memory
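
As a rough summary of the runs above, the following sketch averages the KB/s figures reported by usemem and computes the relative difference. The numbers are copied from the output above; the variable names are only illustrative, and four runs per configuration is a small sample.

```python
from statistics import mean

# Throughput in KB/s as reported by usemem in the runs above.
without_cont_pages = [1883076, 1892130, 1903212, 1894082]
with_cont_pages = [1923614, 1930323, 1920840, 1918988]

mean_without = mean(without_cont_pages)
mean_with = mean(with_cont_pages)

# Relative throughput gain of cont-pages=on over the default, in percent.
gain_pct = (mean_with / mean_without - 1) * 100

print(f"without cont-pages: {mean_without:.0f} KB/s")
print(f"with cont-pages:    {mean_with:.0f} KB/s")
print(f"gain:               {gain_pct:.1f}%")
```

On these eight runs the difference comes out to roughly 1.6% in favor of “cont-pages=on”.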

Best,
Hui



>>
>>>
>>>> [1] https://lkml.org/lkml/2020/3/12/144
>>>> [2] https://lore.kernel.org/linux-mm/[email protected]/
>>>> [3] https://lkml.org/lkml/2020/5/12/324
>>>> [4] https://github.com/teawater/linux/tree/balloon_conts
>>>> [5] https://github.com/teawater/qemu/tree/balloon_conts
>>>> [6] https://lkml.org/lkml/2020/5/13/1211
>>>>
>>>> Hui Zhu (2):
>>>> virtio_balloon: Add VIRTIO_BALLOON_F_CONT_PAGES and inflate_cont_vq
>>>> virtio_balloon: Add deflate_cont_vq to deflate continuous pages
>>>>
>>>> drivers/virtio/virtio_balloon.c | 180 +++++++++++++++++++++++++++++++-----
>>>> include/linux/balloon_compaction.h | 12 ++
>>>> include/uapi/linux/virtio_balloon.h | 1
>>>> mm/balloon_compaction.c | 117 +++++++++++++++++++++--
>>>> 4 files changed, 280 insertions(+), 30 deletions(-)
>>>
>>>