2009-07-03 13:13:00

by Paolo Bonzini

Subject: [PATCH] xen: wait up to 5 minutes for device connection

Increases the device timeout from 10s to 5 minutes, giving the user a visual
indication during that time in case there are problems. The patch is a
backport of changesets 144, 146, 150 and 909 in the Xenbits tree.

Cc: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
drivers/xen/xenbus/xenbus_probe.c | 42 +++++++++++++++++++++++++-----------
1 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index d42e25d..4f69159 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -843,7 +843,7 @@ postcore_initcall(xenbus_probe_init);

MODULE_LICENSE("GPL");

-static int is_disconnected_device(struct device *dev, void *data)
+static int is_device_connecting(struct device *dev, void *data)
{
struct xenbus_device *xendev = to_xenbus_device(dev);
struct device_driver *drv = data;
@@ -861,14 +861,15 @@ static int is_disconnected_device(struct device *dev, void *data)
return 0;

xendrv = to_xenbus_driver(dev->driver);
- return (xendev->state != XenbusStateConnected ||
- (xendrv->is_ready && !xendrv->is_ready(xendev)));
+ return (xendev->state < XenbusStateConnected ||
+ (xendev->state == XenbusStateConnected &&
+ xendrv->is_ready && !xendrv->is_ready(xendev)));
}

-static int exists_disconnected_device(struct device_driver *drv)
+static int exists_connecting_device(struct device_driver *drv)
{
return bus_for_each_dev(&xenbus_frontend.bus, NULL, drv,
- is_disconnected_device);
+ is_device_connecting);
}

static int print_device_status(struct device *dev, void *data)
@@ -884,10 +885,13 @@ static int print_device_status(struct device *dev, void *data)
/* Information only: is this too noisy? */
printk(KERN_INFO "XENBUS: Device with no driver: %s\n",
xendev->nodename);
- } else if (xendev->state != XenbusStateConnected) {
+ } else if (xendev->state < XenbusStateConnected) {
+ enum xenbus_state rstate = XenbusStateUnknown;
+ if (xendev->otherend)
+ rstate = xenbus_read_driver_state(xendev->otherend);
printk(KERN_WARNING "XENBUS: Timeout connecting "
- "to device: %s (state %d)\n",
- xendev->nodename, xendev->state);
+ "to device: %s (local state %d, remote state %d)\n",
+ xendev->nodename, xendev->state, rstate);
}

return 0;
@@ -897,7 +901,7 @@ static int print_device_status(struct device *dev, void *data)
static int ready_to_wait_for_devices;

/*
- * On a 10 second timeout, wait for all devices currently configured. We need
+ * On a 5-minute timeout, wait for all devices currently configured. We need
* to do this to guarantee that the filesystems and / or network devices
* needed for boot are available, before we can allow the boot to proceed.
*
@@ -912,18 +916,30 @@ static int ready_to_wait_for_devices;
*/
static void wait_for_devices(struct xenbus_driver *xendrv)
{
- unsigned long timeout = jiffies + 10*HZ;
+ unsigned long start = jiffies;
struct device_driver *drv = xendrv ? &xendrv->driver : NULL;
+ unsigned int seconds_waited = 0;

if (!ready_to_wait_for_devices || !xen_domain())
return;

- while (exists_disconnected_device(drv)) {
- if (time_after(jiffies, timeout))
- break;
+ while (exists_connecting_device(drv)) {
+ if (time_after(jiffies, start + (seconds_waited+5)*HZ)) {
+ if (!seconds_waited)
+ printk(KERN_WARNING "XENBUS: Waiting for "
+ "devices to initialise: ");
+ seconds_waited += 5;
+ printk("%us...", 300 - seconds_waited);
+ if (seconds_waited == 300)
+ break;
+ }
+
schedule_timeout_interruptible(HZ/10);
}

+ if (seconds_waited)
+ printk("\n");
+
bus_for_each_dev(&xenbus_frontend.bus, NULL, drv,
print_device_status);
}
--
1.6.2.5
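The new loop decides when to print the next countdown step with `time_after(jiffies, start + (seconds_waited+5)*HZ)`. The sketch below is a user-space illustration, not kernel code. It re-creates `time_after()` locally (the kernel macro in `<linux/jiffies.h>` additionally type-checks its arguments). The names `step_elapsed()` and `step_elapsed_naive()`, and the `hz` parameter standing in for `HZ`, are hypothetical. The point is to show why the deadline test is written with `time_after()` rather than a plain `>`.

```c
#include <limits.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's time_after(a, b): true when a is
 * later than b. Looking at the sign of the difference keeps the test
 * correct when the tick counter wraps. Converting an out-of-range
 * unsigned value to long is implementation-defined in ISO C; like the
 * kernel, this assumes two's-complement wrap-around (GCC, Clang). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Mirrors the check in the patched wait_for_devices(): has the next
 * 5-second step after 'seconds_waited' elapsed since 'start'?
 * 'hz' is a hypothetical stand-in for the kernel's HZ constant. */
static bool step_elapsed(unsigned long now, unsigned long start,
                         unsigned int seconds_waited, unsigned long hz)
{
    return time_after(now, start + (seconds_waited + 5) * hz);
}

/* The same check written as a plain comparison, for contrast; it gives
 * the wrong answer once 'now' has wrapped but the deadline has not. */
static bool step_elapsed_naive(unsigned long now, unsigned long start,
                               unsigned int seconds_waited, unsigned long hz)
{
    return now > start + (seconds_waited + 5) * hz;
}
```

For example, with `hz` = 100 and `start` = `ULONG_MAX - 550`, the first deadline is `ULONG_MAX - 50`. After 600 ticks, the counter has wrapped to 49, so `step_elapsed_naive()` reports that the deadline has not passed, while `step_elapsed()` correctly reports that it has.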


2009-07-03 13:46:17

by Ian Campbell

Subject: Re: [PATCH] xen: wait up to 5 minutes for device connection

On Fri, 2009-07-03 at 15:12 +0200, Paolo Bonzini wrote:
> Increases the device timeout from 10s to 5 minutes, giving the user a visual
> indication during that time in case there are problems. The patch is a
> backport of changesets 144, 146, 150 and 909 in the Xenbits tree.
>
> Cc: Jeremy Fitzhardinge <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>

Acked-by: Ian Campbell <[email protected]>

Although with the recent printk changes you might need to sprinkle some
KERN_CONT around the place.

Ian.

--
Ian Campbell

"Cable is not a luxury, since many areas have poor TV reception."
-- The mayor of Tucson, Arizona, 1989

2009-07-03 16:00:53

by Christoph Hellwig

Subject: Re: [PATCH] xen: wait up to 5 minutes for device connection

On Fri, Jul 03, 2009 at 03:12:51PM +0200, Paolo Bonzini wrote:
> Increases the device timeout from 10s to 5 minutes, giving the user a visual
> indication during that time in case there are problems. The patch is a
> backport of changesets 144, 146, 150 and 909 in the Xenbits tree.

Why would it take 5 minutes to wait for a device?

2009-07-03 21:21:56

by Gerd Hoffmann

Subject: Re: [PATCH] xen: wait up to 5 minutes for device connection

On 07/03/09 18:00, Christoph Hellwig wrote:
> On Fri, Jul 03, 2009 at 03:12:51PM +0200, Paolo Bonzini wrote:
>> Increases the device timeout from 10s to 5 minutes, giving the user a visual
>> indication during that time in case there are problems. The patch is a
>> backport of changesets 144, 146, 150 and 909 in the Xenbits tree.
>
> Why would it take 5 minutes to wait for a device?

Usually it is *much* faster, but when the host is quite loaded it can
take an unusually long time. With 10 seconds, it happens in practice now
and then that a virtual machine fails to boot just because the virtual
root disk didn't show up fast enough.

cheers,
Gerd

2009-07-04 15:15:59

by Valdis Klētnieks

Subject: Re: [PATCH] xen: wait up to 5 minutes for device connection

On Fri, 03 Jul 2009 23:21:35 +0200, Gerd Hoffmann said:

> Usually it is *much* faster, but when the host is quite loaded it can
> take an unusually long time. With 10 seconds, it happens in practice now
> and then that a virtual machine fails to boot just because the virtual
> root disk didn't show up fast enough.

Are people actually trying to boot a guest on a host machine so loaded that
disks take that long to show up - and expect things to work in any sane
manner?

I'm tempted to suggest that booting under conditions like that is almost
deserving of its own kernel Tainted flag. If devices aren't showing up in
a timely manner, we probably need to be leery of any other kernel timeout
values as well...


2009-07-06 07:43:39

by Gerd Hoffmann

Subject: Re: [PATCH] xen: wait up to 5 minutes for device connection

On 07/04/09 17:15, [email protected] wrote:
> On Fri, 03 Jul 2009 23:21:35 +0200, Gerd Hoffmann said:
>
>> Usually it is *much* faster, but when the host is quite loaded it can
>> take an unusually long time. With 10 seconds, it happens in practice now
>> and then that a virtual machine fails to boot just because the virtual
>> root disk didn't show up fast enough.
>
> Are people actually trying to boot a guest on a host machine so loaded that
> disks take that long to show up - and expect things to work in any sane
> manner?

Even on machines which handle their load just fine under normal
circumstances, you may see that behavior during load peaks. Booting
a bunch of virtual machines at the same time can trigger it, for example.

> I'm tempted to suggest that booting under conditions like that is almost
> deserving of its own kernel Tainted flag. If devices aren't showing up in
> a timely manner, we probably need to be leery of any other kernel timeout
> values as well...

It is pointless to taint just because of an unusual delay at some random
point in time. A virtual machine can see higher delays due to load
peaks on the host at any time. The flag wouldn't carry any useful information.

You could add TAINT_VIRT to flag *all* VM guests, but as the boot log
already gives you that information, it is pointless too IMHO.

cheers,
Gerd

2009-07-06 22:04:20

by Jeremy Fitzhardinge

Subject: Re: [PATCH] xen: wait up to 5 minutes for device connection

On 07/03/09 06:12, Paolo Bonzini wrote:
> Increases the device timeout from 10s to 5 minutes, giving the user a visual
> indication during that time in case there are problems. The patch is a
> backport of changesets 144, 146, 150 and 909 in the Xenbits tree.
>

This patch does a lot more than change the timeout. Please split it
into logically distinct patches (one for each of the changesets, if that
makes sense) with proper descriptions and a note linking it back to the
source changeset.

Thanks,
J

2009-07-08 10:27:54

by Paolo Bonzini

Subject: [PATCH 1/3 v2] xen: fix is_disconnected_device/exists_disconnected_device

The logic of is_disconnected_device/exists_disconnected_device is wrong:
the functions are used to test whether a device is still trying to
connect (i.e. connecting), but they also return true for devices in the
Closing or Closed state. This patch fixes them so that a Closing or
Closed device is not considered to be connecting. At the same time, the
patch renames the functions to match what they really do: you could say
a closed device is "disconnected" (the old name), but not "connecting"
(the new name).

This patch is a backport of changeset 909 from the Xenbits tree.
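To make the difference between the old and new predicates concrete, here is a small user-space sketch. It copies the first seven `xenbus_state` values from Xen's public `io/xenbus.h` header, because the `<` comparison in the patch depends on their ordering. The driver's optional `is_ready()` hook is modelled with two booleans (`has_is_ready`, `ready`). The function names `old_is_disconnected()` and `new_is_connecting()` are hypothetical, chosen for this illustration only.

```c
#include <stdbool.h>

/* Local copy of the first seven xenbus_state values from Xen's public
 * io/xenbus.h header. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6,
};

/* Old predicate: any state other than Connected counted, so Closing and
 * Closed devices kept the boot waiting until the timeout. */
static bool old_is_disconnected(enum xenbus_state state,
                                bool has_is_ready, bool ready)
{
    return state != XenbusStateConnected || (has_is_ready && !ready);
}

/* New predicate: only states before Connected, or Connected but not yet
 * reported ready by the driver's is_ready() hook, count as connecting. */
static bool new_is_connecting(enum xenbus_state state,
                              bool has_is_ready, bool ready)
{
    return state < XenbusStateConnected ||
           (state == XenbusStateConnected && has_is_ready && !ready);
}
```

With these definitions, a device in `XenbusStateClosed` makes `old_is_disconnected()` return true, so the old code would keep waiting for it. By contrast, `new_is_connecting()` returns false for that device.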

Cc: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
The code for this series is exactly the same as for v1, just
split into three separate bisectable patches.

drivers/xen/xenbus/xenbus_probe.c | 13 +++++++------
1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index d42e25d..c543766 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -843,7 +843,7 @@ postcore_initcall(xenbus_probe_init);

MODULE_LICENSE("GPL");

-static int is_disconnected_device(struct device *dev, void *data)
+static int is_device_connecting(struct device *dev, void *data)
{
struct xenbus_device *xendev = to_xenbus_device(dev);
struct device_driver *drv = data;
@@ -861,14 +861,15 @@ static int is_disconnected_device(struct device *dev, void *data)
return 0;

xendrv = to_xenbus_driver(dev->driver);
- return (xendev->state != XenbusStateConnected ||
- (xendrv->is_ready && !xendrv->is_ready(xendev)));
+ return (xendev->state < XenbusStateConnected ||
+ (xendev->state == XenbusStateConnected &&
+ xendrv->is_ready && !xendrv->is_ready(xendev)));
}

-static int exists_disconnected_device(struct device_driver *drv)
+static int exists_connecting_device(struct device_driver *drv)
{
return bus_for_each_dev(&xenbus_frontend.bus, NULL, drv,
- is_disconnected_device);
+ is_device_connecting);
}

static int print_device_status(struct device *dev, void *data)
@@ -918,7 +919,7 @@ static void wait_for_devices(struct xenbus_driver *xendrv)
if (!ready_to_wait_for_devices || !xen_domain())
return;

- while (exists_disconnected_device(drv)) {
+ while (exists_connecting_device(drv)) {
if (time_after(jiffies, timeout))
break;
schedule_timeout_interruptible(HZ/10);
--
1.6.2.5

2009-07-08 10:28:08

by Paolo Bonzini

Subject: [PATCH 2/3 v2] xen: improvement to wait_for_devices()

When printing a warning about a timed-out device, print the
current state of both ends of the device connection (i.e., backend as
well as frontend). This backports half of changeset 146 from the
Xenbits tree.
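A user-space sketch of the new warning follows. `format_timeout_warning()` and `stub_read_state()` are hypothetical helpers. The function-pointer parameter stands in for `xenbus_read_driver_state()`, and the stub simply pretends that any xenstore path containing `/backend/` is in InitWait (2). As in the patch, the remote state falls back to `XenbusStateUnknown` (0) when the device has no other end recorded. The `KERN_WARNING` level prefix is omitted.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum { XenbusStateUnknown = 0 };   /* value from Xen's io/xenbus.h */

/* Hypothetical stand-in for xenbus_read_driver_state(): returns the
 * state stored under a xenstore path. */
typedef int (*read_state_fn)(const char *path);

/* Hypothetical stub for illustration: any backend path reads as
 * InitWait (2), anything else as Unknown. */
static int stub_read_state(const char *path)
{
    return strstr(path, "/backend/") != NULL ? 2 /* InitWait */
                                             : XenbusStateUnknown;
}

/* Formats the message the patched print_device_status() emits for a
 * device still short of Connected. The remote state defaults to Unknown
 * when 'otherend' is NULL, matching the patch. */
static int format_timeout_warning(char *buf, size_t len,
                                  const char *nodename, int local_state,
                                  const char *otherend,
                                  read_state_fn read_state)
{
    int rstate = XenbusStateUnknown;

    if (otherend)
        rstate = read_state(otherend);
    return snprintf(buf, len,
                    "XENBUS: Timeout connecting to device: %s "
                    "(local state %d, remote state %d)\n",
                    nodename, local_state, rstate);
}
```

For a frontend `device/vbd/768` stuck in Initialising (1) with a (hypothetical) backend path `/local/domain/0/backend/vbd/1/768`, the helper produces `XENBUS: Timeout connecting to device: device/vbd/768 (local state 1, remote state 2)`. Without an other end, it reports `remote state 0`.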

Cc: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
The other half of changeset 146 is buggy and hence superseded
by patch 1/3.

drivers/xen/xenbus/xenbus_probe.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index c543766..3a867a5 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -885,10 +885,13 @@ static int print_device_status(struct device *dev, void *data)
/* Information only: is this too noisy? */
printk(KERN_INFO "XENBUS: Device with no driver: %s\n",
xendev->nodename);
- } else if (xendev->state != XenbusStateConnected) {
+ } else if (xendev->state < XenbusStateConnected) {
+ enum xenbus_state rstate = XenbusStateUnknown;
+ if (xendev->otherend)
+ rstate = xenbus_read_driver_state(xendev->otherend);
printk(KERN_WARNING "XENBUS: Timeout connecting "
- "to device: %s (state %d)\n",
- xendev->nodename, xendev->state);
+ "to device: %s (local state %d, remote state %d)\n",
+ xendev->nodename, xendev->state, rstate);
}

return 0;
--
1.6.2.5

2009-07-08 10:28:25

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH 3/3 v2] xen: wait up to 5 minutes for device connection

Increases the device timeout from 10s to 5 minutes, giving the user a
visual indication during that time in case there are problems. The patch
is a backport of changesets 144 and 150 in the Xenbits tree.

Cc: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
Changeset 144 introduced this logic but chose to use only 30
seconds for the timeout. Changeset 150 simply bumped the
timeout to 5 minutes. Separating them does not make sense.
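The countdown logic can be exercised outside the kernel with the user-space simulation below. It is a sketch under simplifying assumptions: time advances in whole simulated seconds instead of jiffies, `connect_time` is a hypothetical moment at which the last device finishes connecting, and `append()` and `simulate_wait()` are illustrative names. Otherwise the control flow follows the patched `wait_for_devices()`: a header is printed on the first 5-second step, one `%us...` entry is printed per step, and the loop breaks once `seconds_waited` reaches 300.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Appends formatted text to the NUL-terminated string in 'out'
 * (capacity 'len'); vsnprintf() truncates instead of overflowing. */
static void append(char *out, size_t len, const char *fmt, ...)
{
    size_t used = strlen(out);
    va_list ap;

    if (used + 1 >= len)
        return;                        /* buffer already full */
    va_start(ap, fmt);
    vsnprintf(out + used, len - used, fmt, ap);
    va_end(ap);
}

/* Simulates the countdown loop of the patched wait_for_devices().
 * The loop runs while the (simulated) device is still connecting and
 * returns seconds_waited as the loop leaves it. The countdown text,
 * without the KERN_WARNING prefix, is written to 'out'. */
static unsigned int simulate_wait(unsigned int connect_time,
                                  char *out, size_t len)
{
    unsigned int seconds_waited = 0;

    assert(len > 0);
    out[0] = '\0';

    for (unsigned int now = 0; now < connect_time; now++) {
        /* Stands in for time_after(jiffies, start + (seconds_waited+5)*HZ). */
        if (now > seconds_waited + 5) {
            if (!seconds_waited)
                append(out, len,
                       "XENBUS: Waiting for devices to initialise: ");
            seconds_waited += 5;
            append(out, len, "%us...", 300 - seconds_waited);
            if (seconds_waited == 300)
                break;                 /* give up after 5 minutes */
        }
    }
    return seconds_waited;
}
```

For example, `simulate_wait(12, buf, sizeof buf)` returns 10 and leaves `XENBUS: Waiting for devices to initialise: 295s...290s...` in `buf`. A device that never connects makes it return 300 after the final `0s...` entry.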

drivers/xen/xenbus/xenbus_probe.c | 20 ++++++++++++++++----
1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3a867a5..4f69159 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -901,7 +901,7 @@ static int print_device_status(struct device *dev, void *data)
static int ready_to_wait_for_devices;

/*
- * On a 10 second timeout, wait for all devices currently configured. We need
+ * On a 5-minute timeout, wait for all devices currently configured. We need
* to do this to guarantee that the filesystems and / or network devices
* needed for boot are available, before we can allow the boot to proceed.
*
@@ -916,18 +916,30 @@ static int ready_to_wait_for_devices;
*/
static void wait_for_devices(struct xenbus_driver *xendrv)
{
- unsigned long timeout = jiffies + 10*HZ;
+ unsigned long start = jiffies;
struct device_driver *drv = xendrv ? &xendrv->driver : NULL;
+ unsigned int seconds_waited = 0;

if (!ready_to_wait_for_devices || !xen_domain())
return;

while (exists_connecting_device(drv)) {
- if (time_after(jiffies, timeout))
- break;
+ if (time_after(jiffies, start + (seconds_waited+5)*HZ)) {
+ if (!seconds_waited)
+ printk(KERN_WARNING "XENBUS: Waiting for "
+ "devices to initialise: ");
+ seconds_waited += 5;
+ printk("%us...", 300 - seconds_waited);
+ if (seconds_waited == 300)
+ break;
+ }
+
schedule_timeout_interruptible(HZ/10);
}

+ if (seconds_waited)
+ printk("\n");
+
bus_for_each_dev(&xenbus_frontend.bus, NULL, drv,
print_device_status);
}
--
1.6.2.5