2015-04-08 18:44:59

by Boris Ostrovsky

[permalink] [raw]
Subject: [PATCH] xen: Suspend ticks on all CPUs during suspend

Commit 77e32c89a711 ("clockevents: Manage device's state separately for
the core") decouples clockevent device's modes from states. With this
change when a Xen guest tries to resume, it won't be calling its
set_mode op which needs to be done on each VCPU in order to make the
hypervisor aware that we are in oneshot mode.

This happens because clockevents_tick_resume() (which is an intermediate
step of resuming ticks on a processor) no longer calls clockevents_set_state()
and because during suspend clockevent devices on all VCPUs (except for the
one doing the suspend) are left in ONESHOT state. As result, during resume
the clockevents state machine will assume that device is already where it
should be and doesn't need to be updated.

To avoid this problem we should suspend ticks on all VCPUs during
suspend.

Signed-off-by: Boris Ostrovsky <[email protected]>
---

Note that this patch is against tip. The commit mentioned above as well as
tick_suspend_local() referenced below don't exist in mainline yet.

arch/x86/xen/suspend.c | 10 ++++++++++
drivers/xen/manage.c | 2 ++
include/xen/xen-ops.h | 1 +
3 files changed, 13 insertions(+)

diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index d949769..53b4c08 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -88,7 +88,17 @@ static void xen_vcpu_notify_restore(void *data)
tick_resume_local();
}

+static void xen_vcpu_notify_suspend(void *data)
+{
+ tick_suspend_local();
+}
+
void xen_arch_resume(void)
{
on_each_cpu(xen_vcpu_notify_restore, NULL, 1);
}
+
+void xen_arch_suspend(void)
+{
+ on_each_cpu(xen_vcpu_notify_suspend, NULL, 1);
+}
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index bf19407..2fd9fe8 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -131,6 +131,8 @@ static void do_suspend(void)
goto out_resume;
}

+ xen_arch_suspend();
+
si.cancelled = 1;

err = stop_machine(xen_suspend, &si, cpumask_of(0));
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 8333821..f47bfe7 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -13,6 +13,7 @@ void xen_arch_post_suspend(int suspend_cancelled);

void xen_timer_resume(void);
void xen_arch_resume(void);
+void xen_arch_suspend(void);

void xen_resume_notifier_register(struct notifier_block *nb);
void xen_resume_notifier_unregister(struct notifier_block *nb);
--
1.9.3


2015-04-27 10:33:26

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: Suspend ticks on all CPUs during suspend

On 08/04/15 19:53, Boris Ostrovsky wrote:
> Commit 77e32c89a711 ("clockevents: Manage device's state separately for
> the core") decouples clockevent device's modes from states. With this
> change when a Xen guest tries to resume, it won't be calling its
> set_mode op which needs to be done on each VCPU in order to make the
> hypervisor aware that we are in oneshot mode.
>
> This happens because clockevents_tick_resume() (which is an intermediate
> step of resuming ticks on a processor) no longer calls clockevents_set_state()
> and because during suspend clockevent devices on all VCPUs (except for the
> one doing the suspend) are left in ONESHOT state. As result, during resume
> the clockevents state machine will assume that device is already where it
> should be and doesn't need to be updated.
>
> To avoid this problem we should suspend ticks on all VCPUs during
> suspend.

Sorry for the delay in reviewing this.

> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
> index bf19407..2fd9fe8 100644
> --- a/drivers/xen/manage.c
> +++ b/drivers/xen/manage.c
> @@ -131,6 +131,8 @@ static void do_suspend(void)
> goto out_resume;
> }
>
> + xen_arch_suspend();
> +
> si.cancelled = 1;

xen_arch_resume() is only called when !si.cancelled but you call
xen_arch_suspend() unconditionally.

David

2015-04-27 14:31:34

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: Suspend ticks on all CPUs during suspend

On 04/27/2015 06:33 AM, David Vrabel wrote:
> On 08/04/15 19:53, Boris Ostrovsky wrote:
>> Commit 77e32c89a711 ("clockevents: Manage device's state separately for
>> the core") decouples clockevent device's modes from states. With this
>> change when a Xen guest tries to resume, it won't be calling its
>> set_mode op which needs to be done on each VCPU in order to make the
>> hypervisor aware that we are in oneshot mode.
>>
>> This happens because clockevents_tick_resume() (which is an intermediate
>> step of resuming ticks on a processor) no longer calls clockevents_set_state()
>> and because during suspend clockevent devices on all VCPUs (except for the
>> one doing the suspend) are left in ONESHOT state. As result, during resume
>> the clockevents state machine will assume that device is already where it
>> should be and doesn't need to be updated.
>>
>> To avoid this problem we should suspend ticks on all VCPUs during
>> suspend.
> Sorry for the delay in reviewing this.
>
>> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
>> index bf19407..2fd9fe8 100644
>> --- a/drivers/xen/manage.c
>> +++ b/drivers/xen/manage.c
>> @@ -131,6 +131,8 @@ static void do_suspend(void)
>> goto out_resume;
>> }
>>
>> + xen_arch_suspend();
>> +
>> si.cancelled = 1;
> xen_arch_resume() is only called when !si.cancelled but you call
> xen_arch_suspend() unconditionally.


Good point. Let me see if I can move this to xen_arch_post_suspend, when
we know whether the suspend has been canceled.

-boris