Hey,
These patches were posted way back in November for issues discovered with Xen 4.4.
I posted the patches and then dropped them to focus on PVH. Now dusting off
my patch queue.
The
[PATCH 1/2] xen/xenbus: Avoid synchronous wait on XenBus stalling
fixes an issue where the initial domain would hang if it had a guest running.
Usually the toolstack is in charge of such things but in case that had died
- well we still want to reboot but we couldn't because we were stuck waiting
for a reply that we would never get.
The
[PATCH 2/2] xen/manage: Poweroff forcefully if user-space is not yet
fixes a bit of how Linux handles the 'xl shutdown' command. In theory if
the first 'xl shutdown' didn't do it - send another, and then another.
Except that the kernel would gate on the first and ignore the rest. Even
if it failed to shut down. This meant the user had to destroy the guest.
This patch fixes that and also makes the shutdown work when we are in the
booting stage.
Please pull these in stable/for-linus-3.15 after rc1. Thanks!
drivers/xen/manage.c | 32 ++++++++++++++++++++++++++++--
drivers/xen/xenbus/xenbus_xs.c | 44 +++++++++++++++++++++++++++++++++++++++---
2 files changed, 71 insertions(+), 5 deletions(-)
Konrad Rzeszutek Wilk (2):
xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
xen/manage: Poweroff forcefully if user-space is not yet up.
The 'read_reply' works with 'process_msg' to read a reply in XenBus.
'process_msg' is running from within the 'xenbus' thread. Whenever
a message shows up in XenBus it is put on a xs_state.reply_list list
and 'read_reply' picks it up.
The problem is if the backend domain or the xenstored process is killed.
In that case 'xenbus' is still waiting - and 'read_reply', if called, is
stuck forever waiting for the reply_list to have some contents.
This is normally not a problem - as the backend domain can come back
or the xenstored process can be restarted. However if the domain
is in the process of being powered off/restarted/halted - there is no
point in waiting for it to come back - as we are effectively being
terminated and should not impede the progress.
This patch solves this problem by checking whether the guest is
the right domain. If it is an initial domain and hurtling towards
death - there is no point in continuing the wait. All other types
of guests continue with their behavior.
mechanism a bit more asynchronous.
Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
[v2: Fixed it up per David's suggestions]
Reviewed-by: David Vrabel <[email protected]>
---
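Not part of the commit: a minimal userspace sketch of the decision that
xenbus_ok() makes, for anyone who wants to poke at the logic outside the
kernel. The M_-prefixed enums and the xenbus_ok_model() and
xenbus_ok_model_selfcheck() names are stand-ins invented for illustration
(the enum values are not the kernel's); only the shape of the switch
mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for xen_store_domain_type and system_state. */
enum store_type_model { M_XS_UNKNOWN, M_XS_PV, M_XS_HVM, M_XS_LOCAL };
enum sys_state_model {
	M_SYSTEM_BOOTING,
	M_SYSTEM_RUNNING,
	M_SYSTEM_HALT,
	M_SYSTEM_POWER_OFF,
	M_SYSTEM_RESTART,
};

/* Returns true when it still makes sense to wait for a XenBus reply. */
bool xenbus_ok_model(enum store_type_model type, enum sys_state_model state)
{
	switch (type) {
	case M_XS_LOCAL:
		/* Xenstore lives in this domain: stop waiting once we are going down. */
		switch (state) {
		case M_SYSTEM_POWER_OFF:
		case M_SYSTEM_RESTART:
		case M_SYSTEM_HALT:
			return false;
		default:
			return true;
		}
	case M_XS_PV:
	case M_XS_HVM:
		/* Xenstore lives elsewhere and is assumed to still be running. */
		return true;
	default:
		return false;
	}
}

/* Small self-check showing the expected outcomes. */
void xenbus_ok_model_selfcheck(void)
{
	assert(xenbus_ok_model(M_XS_LOCAL, M_SYSTEM_RUNNING));
	assert(!xenbus_ok_model(M_XS_LOCAL, M_SYSTEM_RESTART));
	assert(xenbus_ok_model(M_XS_HVM, M_SYSTEM_POWER_OFF));
	assert(!xenbus_ok_model(M_XS_UNKNOWN, M_SYSTEM_RUNNING));
}
```

In the patch itself a false result makes read_reply return ERR_PTR(-EIO)
instead of sleeping again; the sketch only covers the yes/no decision.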
drivers/xen/xenbus/xenbus_xs.c | 44 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 41 insertions(+), 3 deletions(-)
diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index b6d5fff..ba804f3 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -50,6 +50,7 @@
#include <xen/xenbus.h>
#include <xen/xen.h>
#include "xenbus_comms.h"
+#include "xenbus_probe.h"
struct xs_stored_msg {
struct list_head list;
@@ -139,6 +140,29 @@ static int get_error(const char *errorstring)
return xsd_errors[i].errnum;
}
+static bool xenbus_ok(void)
+{
+ switch (xen_store_domain_type) {
+ case XS_LOCAL:
+ switch (system_state) {
+ case SYSTEM_POWER_OFF:
+ case SYSTEM_RESTART:
+ case SYSTEM_HALT:
+ return false;
+ default:
+ break;
+ }
+ return true;
+ case XS_PV:
+ case XS_HVM:
+ /* FIXME: Could check that the remote domain is alive,
+ * but it is normally initial domain. */
+ return true;
+ default:
+ break;
+ }
+ return false;
+}
static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
{
struct xs_stored_msg *msg;
@@ -148,9 +172,20 @@ static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
while (list_empty(&xs_state.reply_list)) {
spin_unlock(&xs_state.reply_lock);
- /* XXX FIXME: Avoid synchronous wait for response here. */
- wait_event(xs_state.reply_waitq,
- !list_empty(&xs_state.reply_list));
+ if (xenbus_ok())
+ /* XXX FIXME: Avoid synchronous wait for response here. */
+ wait_event_timeout(xs_state.reply_waitq,
+ !list_empty(&xs_state.reply_list),
+ msecs_to_jiffies(500));
+ else {
+ /*
+ * If we are in the process of being shut-down there is
+ * no point of trying to contact XenBus - it is either
+ * killed (xenstored application) or the other domain
+ * has been killed or is unreachable.
+ */
+ return ERR_PTR(-EIO);
+ }
spin_lock(&xs_state.reply_lock);
}
@@ -215,6 +250,9 @@ void *xenbus_dev_request_and_reply(struct xsd_sockmsg *msg)
mutex_unlock(&xs_state.request_mutex);
+ if (IS_ERR(ret))
+ return ret;
+
if ((msg->type == XS_TRANSACTION_END) ||
((req_msg.type == XS_TRANSACTION_START) &&
(msg->type == XS_ERROR)))
--
1.8.5.3
The user can launch the guest in this sequence:
xl create -p /vm.cfg [launch, but pause it]
xl shutdown latest [sets control/shutdown=poweroff]
xl unpause latest
xl console latest [and see that the guest has completely
ignored the shutdown request]
In reality the guest hasn't ignored it. It registers a watch
and gets a notification that there is a value. It then calls
the shutdown_handler which ends up calling orderly_poweroff.
Unfortunately that is so early in the bootup that there
is no user-space, which means that orderly_poweroff fails.
But since the force flag was set to false it continues on without
reporting.
What we really want is to use the force when we are in the
SYSTEM_BOOTING state and not use the 'force' when SYSTEM_RUNNING.
However, if we are in the running state - and the shutdown command
has been given before the user-space has been setup, there is nothing
we can do. Worse yet, we start ignoring further 'xl shutdown' requests!
As such, the other part of this patch is to only start ignoring
'xl shutdown' requests when we are truly in the power off sequence.
That means the user can do multiple 'xl shutdown' and we will try
to act on them instead of ignoring them.
Fixes-Bug: http://bugs.xenproject.org/xen/bug/6
Reported-by: Alex Bligh <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
[v2: Add switch statement]
[v3: Add a reboot notifier]
---
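Not part of the commit: a rough userspace model of the behaviour described
above, in case it helps review. The guest_model struct, the M_-prefixed
enums and the handle_xl_shutdown() and reboot_notifier_model() names are
hypothetical and exist only in this sketch. It also assumes that the
shutdown handler bails out early once the shutting_down flag is set, which
is the gating the description refers to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for system_state and the outcome of do_poweroff(). */
enum sys_state_model { M_SYSTEM_BOOTING, M_SYSTEM_RUNNING, M_SYSTEM_POWER_OFF };
enum action_model { ACT_FORCED_POWEROFF, ACT_ORDERLY_POWEROFF, ACT_IGNORED };

struct guest_model {
	enum sys_state_model state;
	bool shutting_down;	/* set only by the reboot notifier in this model */
};

/* Models one 'xl shutdown' request reaching the guest. */
enum action_model handle_xl_shutdown(const struct guest_model *g)
{
	if (g->shutting_down)
		return ACT_IGNORED;		/* already in the power off sequence */

	switch (g->state) {
	case M_SYSTEM_BOOTING:
		return ACT_FORCED_POWEROFF;	/* no user-space yet, so force it */
	case M_SYSTEM_RUNNING:
		return ACT_ORDERLY_POWEROFF;	/* let user-space do the work */
	default:
		return ACT_IGNORED;		/* halting/rebooting already */
	}
}

/* Models the reboot notifier firing once the kernel really goes down. */
void reboot_notifier_model(struct guest_model *g)
{
	g->shutting_down = true;
	g->state = M_SYSTEM_POWER_OFF;
}
```

The point of the model is that a request no longer sets the flag itself,
so a second or third 'xl shutdown' is still acted on until the notifier
has run.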
drivers/xen/manage.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 624e8dc..0cf7fe1 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -182,10 +182,32 @@ struct shutdown_handler {
void (*cb)(void);
};
+static int poweroff_nb(struct notifier_block *cb, unsigned long code, void *unused)
+{
+ switch (code) {
+ case SYS_DOWN:
+ case SYS_HALT:
+ case SYS_POWER_OFF:
+ shutting_down = SHUTDOWN_POWEROFF;
+ default:
+ break;
+ }
+ return NOTIFY_DONE;
+}
static void do_poweroff(void)
{
- shutting_down = SHUTDOWN_POWEROFF;
- orderly_poweroff(false);
+ switch (system_state) {
+ case SYSTEM_BOOTING:
+ orderly_poweroff(true);
+ break;
+ case SYSTEM_RUNNING:
+ orderly_poweroff(false);
+ break;
+ default:
+ /* Don't do it when we are halting/rebooting. */
+ pr_info("Ignoring Xen toolstack shutdown.\n");
+ break;
+ }
}
static void do_reboot(void)
@@ -291,6 +313,10 @@ static struct xenbus_watch shutdown_watch = {
.callback = shutdown_handler
};
+static struct notifier_block xen_reboot_nb = {
+ .notifier_call = poweroff_nb,
+};
+
static int setup_shutdown_watcher(void)
{
int err;
@@ -301,6 +327,7 @@ static int setup_shutdown_watcher(void)
return err;
}
+
#ifdef CONFIG_MAGIC_SYSRQ
err = register_xenbus_watch(&sysrq_watch);
if (err) {
@@ -329,6 +356,7 @@ int xen_setup_shutdown_event(void)
if (!xen_domain())
return -ENODEV;
register_xenstore_notifier(&xenstore_notifier);
+ register_reboot_notifier(&xen_reboot_nb);
return 0;
}
--
1.8.5.3
On 04/04/2014 02:53 PM, Konrad Rzeszutek Wilk wrote:
> The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> 'process_msg' is running from within the 'xenbus' thread. Whenever
> a message shows up in XenBus it is put on a xs_state.reply_list list
> and 'read_reply' picks it up.
>
> The problem is if the backend domain or the xenstored process is killed.
> In which case 'xenbus' is still awaiting - and 'read_reply' if called -
> stuck forever waiting for the reply_list to have some contents.
>
> This is normally not a problem - as the backend domain can come back
> or the xenstored process can be restarted. However if the domain
> is in process of being powered off/restarted/halted - there is no
> point of waiting on it coming back - as we are effectively being
> terminated and should not impede the progress.
>
> This patch solves this problem by checking whether the guest is
> the right domain. If it is an initial domain and hurtling towards
> death - there is no point of continuing the wait. All other type
> of guests continue with their behavior.
> mechanism a bit more asynchronous.
This looks like a runaway sentence.
Other than that
Reviewed-by: Boris Ostrovsky <[email protected]>
>
> Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> [v2: Fixed it up per David's suggestions]
> Reviewed-by: David Vrabel <[email protected]>
> ---
> drivers/xen/xenbus/xenbus_xs.c | 44 +++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 41 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
> index b6d5fff..ba804f3 100644
> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -50,6 +50,7 @@
> #include <xen/xenbus.h>
> #include <xen/xen.h>
> #include "xenbus_comms.h"
> +#include "xenbus_probe.h"
>
> struct xs_stored_msg {
> struct list_head list;
> @@ -139,6 +140,29 @@ static int get_error(const char *errorstring)
> return xsd_errors[i].errnum;
> }
>
> +static bool xenbus_ok(void)
> +{
> + switch (xen_store_domain_type) {
> + case XS_LOCAL:
> + switch (system_state) {
> + case SYSTEM_POWER_OFF:
> + case SYSTEM_RESTART:
> + case SYSTEM_HALT:
> + return false;
> + default:
> + break;
> + }
> + return true;
> + case XS_PV:
> + case XS_HVM:
> + /* FIXME: Could check that the remote domain is alive,
> + * but it is normally initial domain. */
> + return true;
> + default:
> + break;
> + }
> + return false;
> +}
> static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
> {
> struct xs_stored_msg *msg;
> @@ -148,9 +172,20 @@ static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
>
> while (list_empty(&xs_state.reply_list)) {
> spin_unlock(&xs_state.reply_lock);
> - /* XXX FIXME: Avoid synchronous wait for response here. */
> - wait_event(xs_state.reply_waitq,
> - !list_empty(&xs_state.reply_list));
> + if (xenbus_ok())
> + /* XXX FIXME: Avoid synchronous wait for response here. */
> + wait_event_timeout(xs_state.reply_waitq,
> + !list_empty(&xs_state.reply_list),
> + msecs_to_jiffies(500));
> + else {
> + /*
> + * If we are in the process of being shut-down there is
> + * no point of trying to contact XenBus - it is either
> + * killed (xenstored application) or the other domain
> + * has been killed or is unreachable.
> + */
> + return ERR_PTR(-EIO);
> + }
> spin_lock(&xs_state.reply_lock);
> }
>
> @@ -215,6 +250,9 @@ void *xenbus_dev_request_and_reply(struct xsd_sockmsg *msg)
>
> mutex_unlock(&xs_state.request_mutex);
>
> + if (IS_ERR(ret))
> + return ret;
> +
> if ((msg->type == XS_TRANSACTION_END) ||
> ((req_msg.type == XS_TRANSACTION_START) &&
> (msg->type == XS_ERROR)))
On Fri, Apr 04, 2014 at 04:35:27PM -0400, Boris Ostrovsky wrote:
> On 04/04/2014 02:53 PM, Konrad Rzeszutek Wilk wrote:
> >The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> >'process_msg' is running from within the 'xenbus' thread. Whenever
> >a message shows up in XenBus it is put on a xs_state.reply_list list
> >and 'read_reply' picks it up.
> >
> >The problem is if the backend domain or the xenstored process is killed.
> >In which case 'xenbus' is still awaiting - and 'read_reply' if called -
> >stuck forever waiting for the reply_list to have some contents.
> >
> >This is normally not a problem - as the backend domain can come back
> >or the xenstored process can be restarted. However if the domain
> >is in process of being powered off/restarted/halted - there is no
> >point of waiting on it coming back - as we are effectively being
> >terminated and should not impede the progress.
> >
> >This patch solves this problem by checking whether the guest is
> >the right domain. If it is an initial domain and hurtling towards
> >death - there is no point of continuing the wait. All other type
> >of guests continue with their behavior.
> >mechanism a bit more asynchronous.
>
> This looks like a runaway sentence.
I am not sure what I thought when I wrote that. It should just
go away. Thanks for noticing it!
>
> Other than that
> Reviewed-by: Boris Ostrovsky <[email protected]>
>
> >
> >Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
> >Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> >[v2: Fixed it up per David's suggestions]
> >Reviewed-by: David Vrabel <[email protected]>
> >---
> > drivers/xen/xenbus/xenbus_xs.c | 44 +++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 41 insertions(+), 3 deletions(-)
> >
> >diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
> >index b6d5fff..ba804f3 100644
> >--- a/drivers/xen/xenbus/xenbus_xs.c
> >+++ b/drivers/xen/xenbus/xenbus_xs.c
> >@@ -50,6 +50,7 @@
> > #include <xen/xenbus.h>
> > #include <xen/xen.h>
> > #include "xenbus_comms.h"
> >+#include "xenbus_probe.h"
> > struct xs_stored_msg {
> > struct list_head list;
> >@@ -139,6 +140,29 @@ static int get_error(const char *errorstring)
> > return xsd_errors[i].errnum;
> > }
> >+static bool xenbus_ok(void)
> >+{
> >+ switch (xen_store_domain_type) {
> >+ case XS_LOCAL:
> >+ switch (system_state) {
> >+ case SYSTEM_POWER_OFF:
> >+ case SYSTEM_RESTART:
> >+ case SYSTEM_HALT:
> >+ return false;
> >+ default:
> >+ break;
> >+ }
> >+ return true;
> >+ case XS_PV:
> >+ case XS_HVM:
> >+ /* FIXME: Could check that the remote domain is alive,
> >+ * but it is normally initial domain. */
> >+ return true;
> >+ default:
> >+ break;
> >+ }
> >+ return false;
> >+}
> > static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
> > {
> > struct xs_stored_msg *msg;
> >@@ -148,9 +172,20 @@ static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
> > while (list_empty(&xs_state.reply_list)) {
> > spin_unlock(&xs_state.reply_lock);
> >- /* XXX FIXME: Avoid synchronous wait for response here. */
> >- wait_event(xs_state.reply_waitq,
> >- !list_empty(&xs_state.reply_list));
> >+ if (xenbus_ok())
> >+ /* XXX FIXME: Avoid synchronous wait for response here. */
> >+ wait_event_timeout(xs_state.reply_waitq,
> >+ !list_empty(&xs_state.reply_list),
> >+ msecs_to_jiffies(500));
> >+ else {
> >+ /*
> >+ * If we are in the process of being shut-down there is
> >+ * no point of trying to contact XenBus - it is either
> >+ * killed (xenstored application) or the other domain
> >+ * has been killed or is unreachable.
> >+ */
> >+ return ERR_PTR(-EIO);
> >+ }
> > spin_lock(&xs_state.reply_lock);
> > }
> >@@ -215,6 +250,9 @@ void *xenbus_dev_request_and_reply(struct xsd_sockmsg *msg)
> > mutex_unlock(&xs_state.request_mutex);
> >+ if (IS_ERR(ret))
> >+ return ret;
> >+
> > if ((msg->type == XS_TRANSACTION_END) ||
> > ((req_msg.type == XS_TRANSACTION_START) &&
> > (msg->type == XS_ERROR)))
>
On 04/04/14 19:53, Konrad Rzeszutek Wilk wrote:
> The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> 'process_msg' is running from within the 'xenbus' thread. Whenever
> a message shows up in XenBus it is put on a xs_state.reply_list list
> and 'read_reply' picks it up.
>
> The problem is if the backend domain or the xenstored process is killed.
> In which case 'xenbus' is still awaiting - and 'read_reply' if called -
> stuck forever waiting for the reply_list to have some contents.
>
> This is normally not a problem - as the backend domain can come back
> or the xenstored process can be restarted. However if the domain
> is in process of being powered off/restarted/halted - there is no
> point of waiting on it coming back - as we are effectively being
> terminated and should not impede the progress.
>
> This patch solves this problem by checking whether the guest is
> the right domain. If it is an initial domain and hurtling towards
> death - there is no point of continuing the wait. All other type
> of guests continue with their behavior.
> mechanism a bit more asynchronous.
Applied to devel/for-linus-3.15.
I rewrote this last paragraph to:
"This patch solves this problem by checking whether the guest is the
right domain. If it is an initial domain and hurtling towards death -
there is no point of continuing the wait. All other type of guests
continue with their behavior (as Xenstore is expected to still be
running in another domain)."
David
On 04/04/14 19:53, Konrad Rzeszutek Wilk wrote:
> The user can launch the guest in this sequence:
>
> xl create -p /vm.cfg [launch, but pause it]
> xl shutdown latest [sets control/shutdown=poweroff]
> xl unpause latest
> xl console latest [and see that the guest has completely
> ignored the shutdown request]
>
> In reality the guest hasn't ignored it. It registers a watch
> and gets a notification that there is value. It then calls
> the shutdown_handler which ends up calling orderly_shutdown.
>
> Unfortunately that is so early in the bootup that there
> are no user-space. Which means that the orderly_shutdown fails.
> But since the force flag was set to false it continues on without
> reporting.
>
> What we really want to is to use the force when we are in the
> SYSTEM_BOOTING state and not use the 'force' when SYSTEM_RUNNING.
>
> However, if we are in the running state - and the shutdown command
> has been given before the user-space has been setup, there is nothing
> we can do. Worst yet, we stop ignoring the 'xl shutdown' requests!
>
> As such, the other part of this patch is to only stop ignoring
> the 'xl shutdown' when we are truly in the power off sequence.
>
> That means the user can do multiple 'xl shutdown' and we will try
> to act on them instead of ignoring them.
Applied to devel/for-linus-3.15
> Fixes-Bug: http://bugs.xenproject.org/xen/bug/6
> Reported-by: Alex Bligh <[email protected]>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> [v2: Add switch statement]
> [v3: Add a reboot notifier]
Can you put this version information after the --- in future?
Thanks.
David
On Mon, Apr 07, 2014 at 06:19:32PM +0100, David Vrabel wrote:
> On 04/04/14 19:53, Konrad Rzeszutek Wilk wrote:
> > The user can launch the guest in this sequence:
> >
> > xl create -p /vm.cfg [launch, but pause it]
> > xl shutdown latest [sets control/shutdown=poweroff]
> > xl unpause latest
> > xl console latest [and see that the guest has completely
> > ignored the shutdown request]
> >
> > In reality the guest hasn't ignored it. It registers a watch
> > and gets a notification that there is value. It then calls
> > the shutdown_handler which ends up calling orderly_shutdown.
> >
> > Unfortunately that is so early in the bootup that there
> > are no user-space. Which means that the orderly_shutdown fails.
> > But since the force flag was set to false it continues on without
> > reporting.
> >
> > What we really want to is to use the force when we are in the
> > SYSTEM_BOOTING state and not use the 'force' when SYSTEM_RUNNING.
> >
> > However, if we are in the running state - and the shutdown command
> > has been given before the user-space has been setup, there is nothing
> > we can do. Worst yet, we stop ignoring the 'xl shutdown' requests!
> >
> > As such, the other part of this patch is to only stop ignoring
> > the 'xl shutdown' when we are truly in the power off sequence.
> >
> > That means the user can do multiple 'xl shutdown' and we will try
> > to act on them instead of ignoring them.
>
> Applied to devel/for-linus-3.15
>
> > Fixes-Bug: http://bugs.xenproject.org/xen/bug/6
> > Reported-by: Alex Bligh <[email protected]>
> > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> > [v2: Add switch statement]
> > [v3: Add a reboot notifier]
>
> Can you put this version information after the --- in future?
Aye.
>
> Thanks.
>
> David