2014-01-16 16:24:48

by Luiz Capitulino

[permalink] [raw]
Subject: [RFC PATCH 0/4] virtio_balloon: add pressure notification via a new virtqueue

The pressure notification added to the virtio-balloon driver by this series
is going to be used by the host (QEMU, in this case) to implement automatic
balloning support. More details in patch 3/4 and patch 4/4.

Patch 1/4 adds in-kernel vmpressure support and is not really part of this
series, it's added here for the convenience of someone who wants to try
automatic ballooning. Patch 2/4 is a hack to make in-kernel vmpressure work
for something not related to cgroups, I'll improve it in later versions.

Glauber Costa (1):
vmpressure: in-kernel notifications

Luiz capitulino (3):
vmpressure in-kernel: hack
virtio_balloon: add pressure notification via a new virtqueue
virtio_balloon: skip inflation if guest is under pressure

drivers/virtio/virtio_balloon.c | 100 ++++++++++++++++++++++++++++++++----
include/linux/vmpressure.h | 6 +++
include/uapi/linux/virtio_balloon.h | 1 +
mm/vmpressure.c | 58 +++++++++++++++++++--
4 files changed, 151 insertions(+), 14 deletions(-)

--
1.8.1.4


2014-01-16 16:24:46

by Luiz Capitulino

[permalink] [raw]
Subject: [RFC PATCH 2/4] vmpressure in-kernel: hack

From: Luiz capitulino <[email protected]>

1. Allow drivers to register private data
2. Allow drivers to pass css=NULL
3. Pass level to the callback

Signed-off-by: Luiz capitulino <[email protected]>
---
include/linux/vmpressure.h | 3 ++-
mm/vmpressure.c | 13 +++++++++----
2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 9102e53..de416b6 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -42,7 +42,8 @@ extern int vmpressure_register_event(struct cgroup_subsys_state *css,
struct eventfd_ctx *eventfd,
const char *args);
extern int vmpressure_register_kernel_event(struct cgroup_subsys_state *css,
- void (*fn)(void));
+ void (*fn)(void *data, int level),
+ void *data);
extern void vmpressure_unregister_event(struct cgroup_subsys_state *css,
struct cftype *cft,
struct eventfd_ctx *eventfd);
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 730e7c1..4ed0e85 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -132,9 +132,10 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
struct vmpressure_event {
union {
struct eventfd_ctx *efd;
- void (*fn)(void);
+ void (*fn)(void *data, int level);
};
enum vmpressure_levels level;
+ void *data;
bool kernel_event;
struct list_head node;
};
@@ -152,7 +153,7 @@ static bool vmpressure_event(struct vmpressure *vmpr,

list_for_each_entry(ev, &vmpr->events, node) {
if (ev->kernel_event) {
- ev->fn();
+ ev->fn(ev->data, level);
} else if (vmpr->notify_userspace && level >= ev->level) {
eventfd_signal(ev->efd, 1);
signalled = true;
@@ -352,21 +353,25 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
* well-defined cgroup aware interface.
*/
int vmpressure_register_kernel_event(struct cgroup_subsys_state *css,
- void (*fn)(void))
+ void (*fn)(void *data, int level), void *data)
{
- struct vmpressure *vmpr = css_to_vmpressure(css);
+ struct vmpressure *vmpr;
struct vmpressure_event *ev;

+ vmpr = css ? css_to_vmpressure(css) : memcg_to_vmpressure(NULL);
+
ev = kzalloc(sizeof(*ev), GFP_KERNEL);
if (!ev)
return -ENOMEM;

ev->kernel_event = true;
+ ev->data = data;
ev->fn = fn;

mutex_lock(&vmpr->events_lock);
list_add(&ev->node, &vmpr->events);
mutex_unlock(&vmpr->events_lock);
+
return 0;
}

--
1.8.1.4

2014-01-16 16:25:24

by Luiz Capitulino

[permalink] [raw]
Subject: [RFC PATCH 1/4] vmpressure: in-kernel notifications

From: Glauber Costa <[email protected]>

During the past weeks, it became clear to us that the shrinker interface
we have right now works very well for some particular types of users,
but not that well for others. The latter are usually people interested
in one-shot notifications, that were forced to adapt themselves to the
count+scan behavior of shrinkers. To do so, they had no choice than to
greatly abuse the shrinker interface producing little monsters all over.

During LSF/MM, one of the proposals that popped out during our session
was to reuse Anton Voronstsov's vmpressure for this. They are designed
for userspace consumption, but also provide a well-stablished,
cgroup-aware entry point for notifications.

This patch extends that to also support in-kernel users. Events that
should be generated for in-kernel consumption will be marked as such,
and for those, we will call a registered function instead of triggering
an eventfd notification.

Please note that due to my lack of understanding of each shrinker user,
I will stay away from converting the actual users, you are all welcome
to do so.

Signed-off-by: Glauber Costa <[email protected]>
Signed-off-by: Vladimir Davydov <[email protected]>
Acked-by: Anton Vorontsov <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Reviewed-by: Greg Thelen <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Kamezawa Hiroyuki <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Luiz capitulino <[email protected]>
---
include/linux/vmpressure.h | 5 +++++
mm/vmpressure.c | 53 +++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 3f3788d..9102e53 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -19,6 +19,9 @@ struct vmpressure {
/* Have to grab the lock on events traversal or modifications. */
struct mutex events_lock;

+ /* False if only kernel users want to be notified, true otherwise. */
+ bool notify_userspace;
+
struct work_struct work;
};

@@ -38,6 +41,8 @@ extern int vmpressure_register_event(struct cgroup_subsys_state *css,
struct cftype *cft,
struct eventfd_ctx *eventfd,
const char *args);
+extern int vmpressure_register_kernel_event(struct cgroup_subsys_state *css,
+ void (*fn)(void));
extern void vmpressure_unregister_event(struct cgroup_subsys_state *css,
struct cftype *cft,
struct eventfd_ctx *eventfd);
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index e0f6283..730e7c1 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -130,8 +130,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
}

struct vmpressure_event {
- struct eventfd_ctx *efd;
+ union {
+ struct eventfd_ctx *efd;
+ void (*fn)(void);
+ };
enum vmpressure_levels level;
+ bool kernel_event;
struct list_head node;
};

@@ -147,12 +151,15 @@ static bool vmpressure_event(struct vmpressure *vmpr,
mutex_lock(&vmpr->events_lock);

list_for_each_entry(ev, &vmpr->events, node) {
- if (level >= ev->level) {
+ if (ev->kernel_event) {
+ ev->fn();
+ } else if (vmpr->notify_userspace && level >= ev->level) {
eventfd_signal(ev->efd, 1);
signalled = true;
}
}

+ vmpr->notify_userspace = false;
mutex_unlock(&vmpr->events_lock);

return signalled;
@@ -222,7 +229,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
* we account it too.
*/
if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
- return;
+ goto schedule;

/*
* If we got here with no pages scanned, then that is an indicator
@@ -239,8 +246,15 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
vmpr->scanned += scanned;
vmpr->reclaimed += reclaimed;
scanned = vmpr->scanned;
+ /*
+ * If we didn't reach this point, only kernel events will be triggered.
+ * It is the job of the worker thread to clean this up once the
+ * notifications are all delivered.
+ */
+ vmpr->notify_userspace = true;
spin_unlock(&vmpr->sr_lock);

+schedule:
if (scanned < vmpressure_win)
return;
schedule_work(&vmpr->work);
@@ -324,6 +338,39 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
}

/**
+ * vmpressure_register_kernel_event() - Register kernel-side notification
+ * @css: css that is interested in vmpressure notifications
+ * @fn: function to be called when pressure happens
+ *
+ * This function register in-kernel users interested in receiving notifications
+ * about pressure conditions. Pressure notifications will be triggered at the
+ * same time as userspace notifications (with no particular ordering relative
+ * to it).
+ *
+ * Pressure notifications are a alternative method to shrinkers and will serve
+ * well users that are interested in a one-shot notification, with a
+ * well-defined cgroup aware interface.
+ */
+int vmpressure_register_kernel_event(struct cgroup_subsys_state *css,
+ void (*fn)(void))
+{
+ struct vmpressure *vmpr = css_to_vmpressure(css);
+ struct vmpressure_event *ev;
+
+ ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+ if (!ev)
+ return -ENOMEM;
+
+ ev->kernel_event = true;
+ ev->fn = fn;
+
+ mutex_lock(&vmpr->events_lock);
+ list_add(&ev->node, &vmpr->events);
+ mutex_unlock(&vmpr->events_lock);
+ return 0;
+}
+
+/**
* vmpressure_unregister_event() - Unbind eventfd from vmpressure
* @css: css handle
* @cft: cgroup control files handle
--
1.8.1.4

2014-01-16 16:25:22

by Luiz Capitulino

[permalink] [raw]
Subject: [RFC PATCH 4/4] virtio_balloon: skip inflation if guest is under pressure

From: Luiz capitulino <[email protected]>

This is necessary for automatic ballooning. If the guest gets
under pressure while there's an on-going inflation operation,
we want the guest to do the following:

1. Stop on-going inflation
2. Notify the host we're under pressure
3. Wait for host's acknowledge

While the guest is waiting the host has the opportunity to
reset num_pages to a value before the guest got into pressure.
This will cancel current inflation.

Signed-off-by: Luiz capitulino <[email protected]>
---
drivers/virtio/virtio_balloon.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 1c3ee71..7f5b7d2 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -188,8 +188,13 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
mutex_lock(&vb->balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
- struct page *page = balloon_page_enqueue(vb_dev_info);
+ struct page *page;

+ if (guest_under_pressure(vb)) {
+ break;
+ }
+
+ page = balloon_page_enqueue(vb_dev_info);
if (!page) {
dev_info_ratelimited(&vb->vdev->dev,
"Out of puff! Can't get %u pages\n",
--
1.8.1.4

2014-01-16 16:26:10

by Luiz Capitulino

[permalink] [raw]
Subject: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue

From: Luiz capitulino <[email protected]>

This commit adds support to a new virtqueue called message virtqueue.

The message virtqueue can be used by guests to notify the host about
important memory-related state changes in the guest. Currently, the
only implemented notification is the "guest is under pressure" one,
which informs the host that the guest is under memory pressure.

This notification can be used to implement automatic memory ballooning
in the host. For example, once learning that the guest is under pressure,
the host could cancel an on-going inflate and/or start a deflate
operation.

Doing this through a virtqueue might seem like overkill, as all
we're doing is to transfer an integer between guest and host. However,
using a virtqueue offers the following advantages:

1. We can realibly synchronize host and guest. That is, the guest
will only continue once the host acknowledges the notification.
This is important, because if the guest gets under pressure while
inflating the balloon, it has to stop to give the host a chance
to reset "num_pages" (see next commit)

2. It's extensible. We may (or may not) want to tell the host
which pressure level the guest finds itself in (ie. low,
medium or critical)

The lightweight alternative is to use a configuration space parameter.
For this to work though, the guest would have to wait the for host
to acknowloedge the receipt of a configuration change update. I could
try this if the virtqueue is too overkill.

Finally, the guest learns it's under pressure by registering a
callback with the in-kernel vmpressure API.

FIXMEs:

- vmpressure API is missing an de-registration routine
- not completely sure my changes in virtballoon_probe() are correct

Signed-off-by: Luiz capitulino <[email protected]>
---
drivers/virtio/virtio_balloon.c | 93 +++++++++++++++++++++++++++++++++----
include/uapi/linux/virtio_balloon.h | 1 +
2 files changed, 84 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 5c4a95b..1c3ee71 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -29,6 +29,9 @@
#include <linux/module.h>
#include <linux/balloon_compaction.h>

+#include <linux/cgroup.h>
+#include <linux/vmpressure.h>
+
/*
* Balloon device works in 4K page units. So each page is pointed to by
* multiple balloon pages. All memory counters in this driver are in balloon
@@ -37,10 +40,12 @@
#define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256

+#define VIRTIO_BALLOON_MSG_PRESSURE 1
+
struct virtio_balloon
{
struct virtio_device *vdev;
- struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+ struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *message_vq;

/* Where the ballooning thread waits for config to change. */
wait_queue_head_t config_change;
@@ -51,6 +56,8 @@ struct virtio_balloon
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;

+ wait_queue_head_t message_acked;
+
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
/*
@@ -71,6 +78,9 @@ struct virtio_balloon
/* Memory statistics */
int need_stats_update;
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+ /* Message virtqueue */
+ atomic_t guest_pressure;
};

static struct virtio_device_id id_table[] = {
@@ -78,6 +88,41 @@ static struct virtio_device_id id_table[] = {
{ 0 },
};

+static inline bool guest_under_pressure(const struct virtio_balloon *vb)
+{
+ return atomic_read(&vb->guest_pressure) == 1;
+}
+
+static void vmpressure_event_handler(void *data, int level)
+{
+ struct virtio_balloon *vb = data;
+
+ atomic_set(&vb->guest_pressure, 1);
+ wake_up(&vb->config_change);
+}
+
+static void tell_host_pressure(struct virtio_balloon *vb)
+{
+ const uint32_t msg = VIRTIO_BALLOON_MSG_PRESSURE;
+ struct scatterlist sg;
+ unsigned int len;
+ int err;
+
+ sg_init_one(&sg, &msg, sizeof(msg));
+
+ err = virtqueue_add_outbuf(vb->message_vq, &sg, 1, vb, GFP_KERNEL);
+ if (err < 0) {
+ printk(KERN_WARNING "virtio-balloon: failed to send host message (%d)\n", err);
+ goto out;
+ }
+ virtqueue_kick(vb->message_vq);
+
+ wait_event(vb->message_acked, virtqueue_get_buf(vb->message_vq, &len));
+
+out:
+ atomic_set(&vb->guest_pressure, 0);
+}
+
static u32 page_to_balloon_pfn(struct page *page)
{
unsigned long pfn = page_to_pfn(page);
@@ -100,6 +145,13 @@ static void balloon_ack(struct virtqueue *vq)
wake_up(&vb->acked);
}

+static void message_ack(struct virtqueue *vq)
+{
+ struct virtio_balloon *vb = vq->vdev->priv;
+
+ wake_up(&vb->message_acked);
+}
+
static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
{
struct scatterlist sg;
@@ -300,6 +352,7 @@ static int balloon(void *_vballoon)
try_to_freeze();
wait_event_interruptible(vb->config_change,
(diff = towards_target(vb)) != 0
+ || guest_under_pressure(vb)
|| vb->need_stats_update
|| kthread_should_stop()
|| freezing(current));
@@ -310,31 +363,38 @@ static int balloon(void *_vballoon)
else if (diff < 0)
leak_balloon(vb, -diff);
update_balloon_size(vb);
+
+ if (guest_under_pressure(vb))
+ tell_host_pressure(vb);
+
}
return 0;
}

static int init_vqs(struct virtio_balloon *vb)
{
- struct virtqueue *vqs[3];
- vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
- const char *names[] = { "inflate", "deflate", "stats" };
- int err, nvqs;
+ struct virtqueue *vqs[4];
+ vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
+ stats_request, message_ack };
+ const char *names[] = { "inflate", "deflate", "stats", "pressure" };
+ int err, nvqs, idx;

/*
* We expect two virtqueues: inflate and deflate, and
- * optionally stat.
+ * optionally stat and message.
*/
- nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
+ nvqs = 2 + virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) +
+ virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ);
err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
if (err)
return err;

- vb->inflate_vq = vqs[0];
- vb->deflate_vq = vqs[1];
+ idx = 0;
+ vb->inflate_vq = vqs[idx++];
+ vb->deflate_vq = vqs[idx++];
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
struct scatterlist sg;
- vb->stats_vq = vqs[2];
+ vb->stats_vq = vqs[idx++];

/*
* Prime this virtqueue with one buffer so the hypervisor can
@@ -346,6 +406,9 @@ static int init_vqs(struct virtio_balloon *vb)
BUG();
virtqueue_kick(vb->stats_vq);
}
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ))
+ vb->message_vq = vqs[idx];
+
return 0;
}

@@ -438,9 +501,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->num_pages = 0;
mutex_init(&vb->balloon_lock);
init_waitqueue_head(&vb->config_change);
+ init_waitqueue_head(&vb->message_acked);
init_waitqueue_head(&vb->acked);
vb->vdev = vdev;
vb->need_stats_update = 0;
+ atomic_set(&vb->guest_pressure, 0);

vb_devinfo = balloon_devinfo_alloc(vb);
if (IS_ERR(vb_devinfo)) {
@@ -467,6 +532,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (err)
goto out_free_vb_mapping;

+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ)) {
+ err = vmpressure_register_kernel_event(NULL, vmpressure_event_handler, vb);
+ if (err)
+ goto out_free_vb_mapping;
+ }
+
vb->thread = kthread_run(balloon, vb, "vballoon");
if (IS_ERR(vb->thread)) {
err = PTR_ERR(vb->thread);
@@ -476,6 +547,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
return 0;

out_del_vqs:
+ /* FIXME: add vmpressure_deregister_kernel_event() */
vdev->config->del_vqs(vdev);
out_free_vb_mapping:
balloon_mapping_free(vb_mapping);
@@ -543,6 +615,7 @@ static int virtballoon_restore(struct virtio_device *vdev)
static unsigned int features[] = {
VIRTIO_BALLOON_F_MUST_TELL_HOST,
VIRTIO_BALLOON_F_STATS_VQ,
+ VIRTIO_BALLOON_F_MESSAGE_VQ,
};

static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 5e26f61..846e46c 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -31,6 +31,7 @@
/* The feature bitmap for virtio balloon */
#define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
#define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */
+#define VIRTIO_BALLOON_F_MESSAGE_VQ 2 /* Message virtqueue */

/* Size of a PFN in the balloon interface. */
#define VIRTIO_BALLOON_PFN_SHIFT 12
--
1.8.1.4

2014-01-16 22:44:44

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue

Luiz Capitulino <[email protected]> writes:
> From: Luiz capitulino <[email protected]>
>
> This commit adds support to a new virtqueue called message virtqueue.

OK, this needs a lot of thought (especially since reworking the virtio
balloon is on the TODO list for the OASIS virtio technical committee...)

But AFAICT this is a way of explicitly saying "no" to the host's target
(vs the implicit method of just not meeting the target). I'm not sure
that gives enough information to the host. On the other hand, I'm not
sure what information we *can* give.

Should we instead be counter-proposing a target?

What does qemu do with this information?

Thanks,
Rusty.

> The message virtqueue can be used by guests to notify the host about
> important memory-related state changes in the guest. Currently, the
> only implemented notification is the "guest is under pressure" one,
> which informs the host that the guest is under memory pressure.
>
> This notification can be used to implement automatic memory ballooning
> in the host. For example, once learning that the guest is under pressure,
> the host could cancel an on-going inflate and/or start a deflate
> operation.
>
> Doing this through a virtqueue might seem like overkill, as all
> we're doing is to transfer an integer between guest and host. However,
> using a virtqueue offers the following advantages:
>
> 1. We can realibly synchronize host and guest. That is, the guest
> will only continue once the host acknowledges the notification.
> This is important, because if the guest gets under pressure while
> inflating the balloon, it has to stop to give the host a chance
> to reset "num_pages" (see next commit)
>
> 2. It's extensible. We may (or may not) want to tell the host
> which pressure level the guest finds itself in (ie. low,
> medium or critical)
>
> The lightweight alternative is to use a configuration space parameter.
> For this to work though, the guest would have to wait the for host
> to acknowloedge the receipt of a configuration change update. I could
> try this if the virtqueue is too overkill.
>
> Finally, the guest learns it's under pressure by registering a
> callback with the in-kernel vmpressure API.
>
> FIXMEs:
>
> - vmpressure API is missing an de-registration routine
> - not completely sure my changes in virtballoon_probe() are correct
>
> Signed-off-by: Luiz capitulino <[email protected]>
> ---
> drivers/virtio/virtio_balloon.c | 93 +++++++++++++++++++++++++++++++++----
> include/uapi/linux/virtio_balloon.h | 1 +
> 2 files changed, 84 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 5c4a95b..1c3ee71 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -29,6 +29,9 @@
> #include <linux/module.h>
> #include <linux/balloon_compaction.h>
>
> +#include <linux/cgroup.h>
> +#include <linux/vmpressure.h>
> +
> /*
> * Balloon device works in 4K page units. So each page is pointed to by
> * multiple balloon pages. All memory counters in this driver are in balloon
> @@ -37,10 +40,12 @@
> #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
> #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
>
> +#define VIRTIO_BALLOON_MSG_PRESSURE 1
> +
> struct virtio_balloon
> {
> struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *message_vq;
>
> /* Where the ballooning thread waits for config to change. */
> wait_queue_head_t config_change;
> @@ -51,6 +56,8 @@ struct virtio_balloon
> /* Waiting for host to ack the pages we released. */
> wait_queue_head_t acked;
>
> + wait_queue_head_t message_acked;
> +
> /* Number of balloon pages we've told the Host we're not using. */
> unsigned int num_pages;
> /*
> @@ -71,6 +78,9 @@ struct virtio_balloon
> /* Memory statistics */
> int need_stats_update;
> struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
> +
> + /* Message virtqueue */
> + atomic_t guest_pressure;
> };
>
> static struct virtio_device_id id_table[] = {
> @@ -78,6 +88,41 @@ static struct virtio_device_id id_table[] = {
> { 0 },
> };
>
> +static inline bool guest_under_pressure(const struct virtio_balloon *vb)
> +{
> + return atomic_read(&vb->guest_pressure) == 1;
> +}
> +
> +static void vmpressure_event_handler(void *data, int level)
> +{
> + struct virtio_balloon *vb = data;
> +
> + atomic_set(&vb->guest_pressure, 1);
> + wake_up(&vb->config_change);
> +}
> +
> +static void tell_host_pressure(struct virtio_balloon *vb)
> +{
> + const uint32_t msg = VIRTIO_BALLOON_MSG_PRESSURE;
> + struct scatterlist sg;
> + unsigned int len;
> + int err;
> +
> + sg_init_one(&sg, &msg, sizeof(msg));
> +
> + err = virtqueue_add_outbuf(vb->message_vq, &sg, 1, vb, GFP_KERNEL);
> + if (err < 0) {
> + printk(KERN_WARNING "virtio-balloon: failed to send host message (%d)\n", err);
> + goto out;
> + }
> + virtqueue_kick(vb->message_vq);
> +
> + wait_event(vb->message_acked, virtqueue_get_buf(vb->message_vq, &len));
> +
> +out:
> + atomic_set(&vb->guest_pressure, 0);
> +}
> +
> static u32 page_to_balloon_pfn(struct page *page)
> {
> unsigned long pfn = page_to_pfn(page);
> @@ -100,6 +145,13 @@ static void balloon_ack(struct virtqueue *vq)
> wake_up(&vb->acked);
> }
>
> +static void message_ack(struct virtqueue *vq)
> +{
> + struct virtio_balloon *vb = vq->vdev->priv;
> +
> + wake_up(&vb->message_acked);
> +}
> +
> static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> {
> struct scatterlist sg;
> @@ -300,6 +352,7 @@ static int balloon(void *_vballoon)
> try_to_freeze();
> wait_event_interruptible(vb->config_change,
> (diff = towards_target(vb)) != 0
> + || guest_under_pressure(vb)
> || vb->need_stats_update
> || kthread_should_stop()
> || freezing(current));
> @@ -310,31 +363,38 @@ static int balloon(void *_vballoon)
> else if (diff < 0)
> leak_balloon(vb, -diff);
> update_balloon_size(vb);
> +
> + if (guest_under_pressure(vb))
> + tell_host_pressure(vb);
> +
> }
> return 0;
> }
>
> static int init_vqs(struct virtio_balloon *vb)
> {
> - struct virtqueue *vqs[3];
> - vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
> - const char *names[] = { "inflate", "deflate", "stats" };
> - int err, nvqs;
> + struct virtqueue *vqs[4];
> + vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
> + stats_request, message_ack };
> + const char *names[] = { "inflate", "deflate", "stats", "pressure" };
> + int err, nvqs, idx;
>
> /*
> * We expect two virtqueues: inflate and deflate, and
> - * optionally stat.
> + * optionally stat and message.
> */
> - nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> + nvqs = 2 + virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) +
> + virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ);
> err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
> if (err)
> return err;
>
> - vb->inflate_vq = vqs[0];
> - vb->deflate_vq = vqs[1];
> + idx = 0;
> + vb->inflate_vq = vqs[idx++];
> + vb->deflate_vq = vqs[idx++];
> if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
> struct scatterlist sg;
> - vb->stats_vq = vqs[2];
> + vb->stats_vq = vqs[idx++];
>
> /*
> * Prime this virtqueue with one buffer so the hypervisor can
> @@ -346,6 +406,9 @@ static int init_vqs(struct virtio_balloon *vb)
> BUG();
> virtqueue_kick(vb->stats_vq);
> }
> + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ))
> + vb->message_vq = vqs[idx];
> +
> return 0;
> }
>
> @@ -438,9 +501,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
> vb->num_pages = 0;
> mutex_init(&vb->balloon_lock);
> init_waitqueue_head(&vb->config_change);
> + init_waitqueue_head(&vb->message_acked);
> init_waitqueue_head(&vb->acked);
> vb->vdev = vdev;
> vb->need_stats_update = 0;
> + atomic_set(&vb->guest_pressure, 0);
>
> vb_devinfo = balloon_devinfo_alloc(vb);
> if (IS_ERR(vb_devinfo)) {
> @@ -467,6 +532,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
> if (err)
> goto out_free_vb_mapping;
>
> + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ)) {
> + err = vmpressure_register_kernel_event(NULL, vmpressure_event_handler, vb);
> + if (err)
> + goto out_free_vb_mapping;
> + }
> +
> vb->thread = kthread_run(balloon, vb, "vballoon");
> if (IS_ERR(vb->thread)) {
> err = PTR_ERR(vb->thread);
> @@ -476,6 +547,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
> return 0;
>
> out_del_vqs:
> + /* FIXME: add vmpressure_deregister_kernel_event() */
> vdev->config->del_vqs(vdev);
> out_free_vb_mapping:
> balloon_mapping_free(vb_mapping);
> @@ -543,6 +615,7 @@ static int virtballoon_restore(struct virtio_device *vdev)
> static unsigned int features[] = {
> VIRTIO_BALLOON_F_MUST_TELL_HOST,
> VIRTIO_BALLOON_F_STATS_VQ,
> + VIRTIO_BALLOON_F_MESSAGE_VQ,
> };
>
> static struct virtio_driver virtio_balloon_driver = {
> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> index 5e26f61..846e46c 100644
> --- a/include/uapi/linux/virtio_balloon.h
> +++ b/include/uapi/linux/virtio_balloon.h
> @@ -31,6 +31,7 @@
> /* The feature bitmap for virtio balloon */
> #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
> #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */
> +#define VIRTIO_BALLOON_F_MESSAGE_VQ 2 /* Message virtqueue */
>
> /* Size of a PFN in the balloon interface. */
> #define VIRTIO_BALLOON_PFN_SHIFT 12
> --
> 1.8.1.4

2014-01-17 01:38:28

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue

On Fri, 17 Jan 2014 09:10:47 +1030
Rusty Russell <[email protected]> wrote:

> Luiz Capitulino <[email protected]> writes:
> > From: Luiz capitulino <[email protected]>
> >
> > This commit adds support to a new virtqueue called message virtqueue.
>
> OK, this needs a lot of thought (especially since reworking the virtio
> balloon is on the TODO list for the OASIS virtio technical committee...)
>
> But AFAICT this is a way of explicitly saying "no" to the host's target
> (vs the implicit method of just not meeting the target). I'm not sure
> that gives enough information to the host. On the other hand, I'm not
> sure what information we *can* give.
>
> Should we instead be counter-proposing a target?

The problem is how to estimate a target value. I found it simpler
to just try to obey what the host is asking for (and fail if not
possible) than trying to make the guest negotiate with the host.

> What does qemu do with this information?

There are two possible scenarios:

1. The balloon driver is currently inflating when it gets under
pressure

QEMU resets "num_pages" to the current balloon size. This
cancels the on-going inflate

2. The balloon driver is not inflating, eg. it's possibly sleeping

QEMU issues a deflate

But note that those scenarios are not supposed to be used with the
current device, they are part of the automatic ballooning feature.
I CC'ed you on the QEMU patch, you can find it here case you didn't
see it:

http://marc.info/?l=kvm&m=138988966315690&w=2

>
> Thanks,
> Rusty.
>
> > The message virtqueue can be used by guests to notify the host about
> > important memory-related state changes in the guest. Currently, the
> > only implemented notification is the "guest is under pressure" one,
> > which informs the host that the guest is under memory pressure.
> >
> > This notification can be used to implement automatic memory ballooning
> > in the host. For example, once learning that the guest is under pressure,
> > the host could cancel an on-going inflate and/or start a deflate
> > operation.
> >
> > Doing this through a virtqueue might seem like overkill, as all
> > we're doing is to transfer an integer between guest and host. However,
> > using a virtqueue offers the following advantages:
> >
> > 1. We can realibly synchronize host and guest. That is, the guest
> > will only continue once the host acknowledges the notification.
> > This is important, because if the guest gets under pressure while
> > inflating the balloon, it has to stop to give the host a chance
> > to reset "num_pages" (see next commit)
> >
> > 2. It's extensible. We may (or may not) want to tell the host
> > which pressure level the guest finds itself in (ie. low,
> > medium or critical)
> >
> > The lightweight alternative is to use a configuration space parameter.
> > For this to work though, the guest would have to wait the for host
> > to acknowloedge the receipt of a configuration change update. I could
> > try this if the virtqueue is too overkill.
> >
> > Finally, the guest learns it's under pressure by registering a
> > callback with the in-kernel vmpressure API.
> >
> > FIXMEs:
> >
> > - vmpressure API is missing an de-registration routine
> > - not completely sure my changes in virtballoon_probe() are correct
> >
> > Signed-off-by: Luiz capitulino <[email protected]>
> > ---
> > drivers/virtio/virtio_balloon.c | 93 +++++++++++++++++++++++++++++++++----
> > include/uapi/linux/virtio_balloon.h | 1 +
> > 2 files changed, 84 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > index 5c4a95b..1c3ee71 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -29,6 +29,9 @@
> > #include <linux/module.h>
> > #include <linux/balloon_compaction.h>
> >
> > +#include <linux/cgroup.h>
> > +#include <linux/vmpressure.h>
> > +
> > /*
> > * Balloon device works in 4K page units. So each page is pointed to by
> > * multiple balloon pages. All memory counters in this driver are in balloon
> > @@ -37,10 +40,12 @@
> > #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
> > #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
> >
> > +#define VIRTIO_BALLOON_MSG_PRESSURE 1
> > +
> > struct virtio_balloon
> > {
> > struct virtio_device *vdev;
> > - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> > + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *message_vq;
> >
> > /* Where the ballooning thread waits for config to change. */
> > wait_queue_head_t config_change;
> > @@ -51,6 +56,8 @@ struct virtio_balloon
> > /* Waiting for host to ack the pages we released. */
> > wait_queue_head_t acked;
> >
> > + wait_queue_head_t message_acked;
> > +
> > /* Number of balloon pages we've told the Host we're not using. */
> > unsigned int num_pages;
> > /*
> > @@ -71,6 +78,9 @@ struct virtio_balloon
> > /* Memory statistics */
> > int need_stats_update;
> > struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
> > +
> > + /* Message virtqueue */
> > + atomic_t guest_pressure;
> > };
> >
> > static struct virtio_device_id id_table[] = {
> > @@ -78,6 +88,41 @@ static struct virtio_device_id id_table[] = {
> > { 0 },
> > };
> >
> > +static inline bool guest_under_pressure(const struct virtio_balloon *vb)
> > +{
> > + return atomic_read(&vb->guest_pressure) == 1;
> > +}
> > +
> > +static void vmpressure_event_handler(void *data, int level)
> > +{
> > + struct virtio_balloon *vb = data;
> > +
> > + atomic_set(&vb->guest_pressure, 1);
> > + wake_up(&vb->config_change);
> > +}
> > +
> > +static void tell_host_pressure(struct virtio_balloon *vb)
> > +{
> > + const uint32_t msg = VIRTIO_BALLOON_MSG_PRESSURE;
> > + struct scatterlist sg;
> > + unsigned int len;
> > + int err;
> > +
> > + sg_init_one(&sg, &msg, sizeof(msg));
> > +
> > + err = virtqueue_add_outbuf(vb->message_vq, &sg, 1, vb, GFP_KERNEL);
> > + if (err < 0) {
> > + printk(KERN_WARNING "virtio-balloon: failed to send host message (%d)\n", err);
> > + goto out;
> > + }
> > + virtqueue_kick(vb->message_vq);
> > +
> > + wait_event(vb->message_acked, virtqueue_get_buf(vb->message_vq, &len));
> > +
> > +out:
> > + atomic_set(&vb->guest_pressure, 0);
> > +}
> > +
> > static u32 page_to_balloon_pfn(struct page *page)
> > {
> > unsigned long pfn = page_to_pfn(page);
> > @@ -100,6 +145,13 @@ static void balloon_ack(struct virtqueue *vq)
> > wake_up(&vb->acked);
> > }
> >
> > +static void message_ack(struct virtqueue *vq)
> > +{
> > + struct virtio_balloon *vb = vq->vdev->priv;
> > +
> > + wake_up(&vb->message_acked);
> > +}
> > +
> > static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> > {
> > struct scatterlist sg;
> > @@ -300,6 +352,7 @@ static int balloon(void *_vballoon)
> > try_to_freeze();
> > wait_event_interruptible(vb->config_change,
> > (diff = towards_target(vb)) != 0
> > + || guest_under_pressure(vb)
> > || vb->need_stats_update
> > || kthread_should_stop()
> > || freezing(current));
> > @@ -310,31 +363,38 @@ static int balloon(void *_vballoon)
> > else if (diff < 0)
> > leak_balloon(vb, -diff);
> > update_balloon_size(vb);
> > +
> > + if (guest_under_pressure(vb))
> > + tell_host_pressure(vb);
> > +
> > }
> > return 0;
> > }
> >
> > static int init_vqs(struct virtio_balloon *vb)
> > {
> > - struct virtqueue *vqs[3];
> > - vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
> > - const char *names[] = { "inflate", "deflate", "stats" };
> > - int err, nvqs;
> > + struct virtqueue *vqs[4];
> > + vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
> > + stats_request, message_ack };
> > + const char *names[] = { "inflate", "deflate", "stats", "pressure" };
> > + int err, nvqs, idx;
> >
> > /*
> > * We expect two virtqueues: inflate and deflate, and
> > - * optionally stat.
> > + * optionally stat and message.
> > */
> > - nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> > + nvqs = 2 + virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) +
> > + virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ);
> > err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
> > if (err)
> > return err;
> >
> > - vb->inflate_vq = vqs[0];
> > - vb->deflate_vq = vqs[1];
> > + idx = 0;
> > + vb->inflate_vq = vqs[idx++];
> > + vb->deflate_vq = vqs[idx++];
> > if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
> > struct scatterlist sg;
> > - vb->stats_vq = vqs[2];
> > + vb->stats_vq = vqs[idx++];
> >
> > /*
> > * Prime this virtqueue with one buffer so the hypervisor can
> > @@ -346,6 +406,9 @@ static int init_vqs(struct virtio_balloon *vb)
> > BUG();
> > virtqueue_kick(vb->stats_vq);
> > }
> > + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ))
> > + vb->message_vq = vqs[idx];
> > +
> > return 0;
> > }
> >
> > @@ -438,9 +501,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
> > vb->num_pages = 0;
> > mutex_init(&vb->balloon_lock);
> > init_waitqueue_head(&vb->config_change);
> > + init_waitqueue_head(&vb->message_acked);
> > init_waitqueue_head(&vb->acked);
> > vb->vdev = vdev;
> > vb->need_stats_update = 0;
> > + atomic_set(&vb->guest_pressure, 0);
> >
> > vb_devinfo = balloon_devinfo_alloc(vb);
> > if (IS_ERR(vb_devinfo)) {
> > @@ -467,6 +532,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
> > if (err)
> > goto out_free_vb_mapping;
> >
> > + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ)) {
> > + err = vmpressure_register_kernel_event(NULL, vmpressure_event_handler, vb);
> > + if (err)
> > + goto out_free_vb_mapping;
> > + }
> > +
> > vb->thread = kthread_run(balloon, vb, "vballoon");
> > if (IS_ERR(vb->thread)) {
> > err = PTR_ERR(vb->thread);
> > @@ -476,6 +547,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
> > return 0;
> >
> > out_del_vqs:
> > + /* FIXME: add vmpressure_deregister_kernel_event() */
> > vdev->config->del_vqs(vdev);
> > out_free_vb_mapping:
> > balloon_mapping_free(vb_mapping);
> > @@ -543,6 +615,7 @@ static int virtballoon_restore(struct virtio_device *vdev)
> > static unsigned int features[] = {
> > VIRTIO_BALLOON_F_MUST_TELL_HOST,
> > VIRTIO_BALLOON_F_STATS_VQ,
> > + VIRTIO_BALLOON_F_MESSAGE_VQ,
> > };
> >
> > static struct virtio_driver virtio_balloon_driver = {
> > diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> > index 5e26f61..846e46c 100644
> > --- a/include/uapi/linux/virtio_balloon.h
> > +++ b/include/uapi/linux/virtio_balloon.h
> > @@ -31,6 +31,7 @@
> > /* The feature bitmap for virtio balloon */
> > #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
> > #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */
> > +#define VIRTIO_BALLOON_F_MESSAGE_VQ 2 /* Message virtqueue */
> >
> > /* Size of a PFN in the balloon interface. */
> > #define VIRTIO_BALLOON_PFN_SHIFT 12
> > --
> > 1.8.1.4
>

2014-01-17 01:43:57

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue

On Thu, 16 Jan 2014 20:38:19 -0500
Luiz Capitulino <[email protected]> wrote:

> > What does qemu do with this information?
>
> There are two possible scenarios:
>
> 1. The balloon driver is currently inflating when it gets under
> pressure
>
> QEMU resets "num_pages" to the current balloon size. This
> cancels the on-going inflate
>
> 2. The balloon driver is not inflating, eg. it's possibly sleeping
>
> QEMU issues a deflate
>
> But note that those scenarios are not supposed to be used with the
> current device, they are part of the automatic ballooning feature.
> I CC'ed you on the QEMU patch, you can find it here case you didn't
> see it:
>
> http://marc.info/?l=kvm&m=138988966315690&w=2

By "current device" I meant "outside of automatic ballooning scope".

2014-01-20 02:55:48

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue

Luiz Capitulino <[email protected]> writes:
> On Fri, 17 Jan 2014 09:10:47 +1030
> Rusty Russell <[email protected]> wrote:
>
>> Luiz Capitulino <[email protected]> writes:
>> > From: Luiz capitulino <[email protected]>
>> >
>> > This commit adds support to a new virtqueue called message virtqueue.
>>
>> OK, this needs a lot of thought (especially since reworking the virtio
>> balloon is on the TODO list for the OASIS virtio technical committee...)
>>
>> But AFAICT this is a way of explicitly saying "no" to the host's target
>> (vs the implicit method of just not meeting the target). I'm not sure
>> that gives enough information to the host. On the other hand, I'm not
>> sure what information we *can* give.
>>
>> Should we instead be counter-proposing a target?
>
> The problem is how to estimate a target value. I found it simpler
> to just try to obey what the host is asking for (and fail if not
> possible) than trying to make the guest negotiate with the host.

Understood, but we already do this the other way where the host tells
the guest how much memory to give up.

And is a guest expected to automatically inflate the balloon even if the
host doesn't want the memory, or wait to be asked?

> >> What does qemu do with this information?
> >
> > There are two possible scenarios:
> >
> > 1. The balloon driver is currently inflating when it gets under
> > pressure
> >
> > QEMU resets "num_pages" to the current balloon size. This
> > cancels the on-going inflate
> >
> > 2. The balloon driver is not inflating, eg. it's possibly sleeping
> >
> > QEMU issues a deflate
> >
> > But note that those scenarios are not supposed to be used with the
> > current device, they are part of the automatic ballooning feature.
> > I CC'ed you on the QEMU patch, you can find it here case you didn't
> > see it:
> >
> > http://marc.info/?l=kvm&m=138988966315690&w=2

Yes, caught that after I replied; I should have checked first.

It seems like we are still figuring out how to do ballooning. The
best approach in cases like this is to make it simple, so I don't hate
this.

But note that Daniel Kiper and I have been discussing a new virtio
balloon for draft 2 of the standard. I'll CC you when I post that to
one of the OASIS virtio mailing lists.

Cheers,
Rusty.

2014-01-20 14:59:30

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue

On Mon, 20 Jan 2014 12:43:45 +1030
Rusty Russell <[email protected]> wrote:

> Luiz Capitulino <[email protected]> writes:
> > On Fri, 17 Jan 2014 09:10:47 +1030
> > Rusty Russell <[email protected]> wrote:
> >
> >> Luiz Capitulino <[email protected]> writes:
> >> > From: Luiz capitulino <[email protected]>
> >> >
> >> > This commit adds support to a new virtqueue called message virtqueue.
> >>
> >> OK, this needs a lot of thought (especially since reworking the virtio
> >> balloon is on the TODO list for the OASIS virtio technical committee...)
> >>
> >> But AFAICT this is a way of explicitly saying "no" to the host's target
> >> (vs the implicit method of just not meeting the target). I'm not sure
> >> that gives enough information to the host. On the other hand, I'm not
> >> sure what information we *can* give.
> >>
> >> Should we instead be counter-proposing a target?
> >
> > The problem is how to estimate a target value. I found it simpler
> > to just try to obey what the host is asking for (and fail if not
> > possible) than trying to make the guest negotiate with the host.
>
> Understood, but we already do this the other way where the host tells
> the guest how much memory to give up.
>
> And is a guest expected to automatically inflate the balloon even if the
> host doesn't want the memory, or wait to be asked?

In my current design the guest always waits to be asked. Actually, all
automatic inflates and deflates are started by the host. An advantage
of this approach is that all the logic stays on the host, which makes
things simple as it matches with current balloon design.

Btw, you asked about what information we could provide to the host and
I forgot something important. The vmpressure notification (take a look
at patch 1/4 and patch 2/4) does provide an information that could be
interesting to have in the host: it provides a pressure level ("low",
"medium" or "critical") which is part of kernel's ABI with user-space.

I'm thinking about using this pressure level information to implement
different policies for automatic ballooning, which could be set by
the user or a management tool.

> > >> What does qemu do with this information?
> > >
> > > There are two possible scenarios:
> > >
> > > 1. The balloon driver is currently inflating when it gets under
> > > pressure
> > >
> > > QEMU resets "num_pages" to the current balloon size. This
> > > cancels the on-going inflate
> > >
> > > 2. The balloon driver is not inflating, eg. it's possibly sleeping
> > >
> > > QEMU issues a deflate
> > >
> > > But note that those scenarios are not supposed to be used with the
> > > current device, they are part of the automatic ballooning feature.
> > > I CC'ed you on the QEMU patch, you can find it here case you didn't
> > > see it:
> > >
> > > http://marc.info/?l=kvm&m=138988966315690&w=2
>
> Yes, caught that after I replied; I should have checked first.
>
> It seems like we are still figuring out how to do ballooning. The
> best approach in cases like this is to make it simple, so I don't hate
> this.

Good news to me :)

> But note that Daniel Kiper and I have been discussing a new virtio
> balloon for draft 2 of the standard. I'll CC you when I post that to
> one of the OASIS virtio mailing lists.

Thank you, I appreciate that.