2013-05-01 19:27:17

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 0/3] xen/arm: account for stolen ticks

Hi all,
this small patch series introduces stolen ticks accounting for Xen on
ARM. Stolen ticks are clocksource ticks that have been "stolen" from the
cpu, typically because Linux is running in a virtual machine and the
vcpu has been descheduled.


Stefano Stabellini (3):
arm_arch_timer: introduce arch_timer_stolen_ticks
xen: move do_stolen_accounting to drivers/xen/time.c
xen/arm: account for stolen ticks

arch/arm/include/asm/arch_timer.h | 5 +
arch/arm/xen/enlighten.c | 4 +
arch/x86/xen/time.c | 127 +------------------------------
drivers/clocksource/arm_arch_timer.c | 6 ++
drivers/xen/Makefile | 2 +-
drivers/xen/time.c | 142 ++++++++++++++++++++++++++++++++++
include/xen/xen-ops.h | 4 +
7 files changed, 163 insertions(+), 127 deletions(-)
create mode 100644 drivers/xen/time.c


git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git lost_ticks_1


Cheers,

Stefano


2013-05-01 19:27:59

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

Introduce a function, called arch_timer_stolen_ticks, called from the
arch_timer interrupt handler to account for stolen ticks.

Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
---
arch/arm/include/asm/arch_timer.h | 5 +++++
drivers/clocksource/arm_arch_timer.c | 6 ++++++
2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
index 7ade91d..30db413 100644
--- a/arch/arm/include/asm/arch_timer.h
+++ b/arch/arm/include/asm/arch_timer.h
@@ -13,6 +13,11 @@
int arch_timer_of_register(void);
int arch_timer_sched_clock_init(void);

+/* per-platform function to calculate stolen ticks (clock cycles stolen
+ * to the vcpu by the hypervisor).
+ */
+extern void (*arch_timer_stolen_ticks)(void);
+
/*
* These register accessors are marked inline so the compiler can
* nicely work out which register we want, and chuck away the rest of
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index d7ad425..d1ddf0b 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -37,6 +37,8 @@ static int arch_timer_ppi[MAX_TIMER_PPI];

static struct clock_event_device __percpu *arch_timer_evt;

+void (*arch_timer_stolen_ticks)(void);
+
static bool arch_timer_use_virtual = true;

/*
@@ -52,6 +54,10 @@ static inline irqreturn_t timer_handler(const int access,
ctrl |= ARCH_TIMER_CTRL_IT_MASK;
arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl);
evt->event_handler(evt);
+
+ if ( arch_timer_stolen_ticks != NULL )
+ arch_timer_stolen_ticks();
+
return IRQ_HANDLED;
}

--
1.7.2.5

2013-05-01 19:28:05

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 3/3] xen/arm: account for stolen ticks

Register the runstate_memory_area with the hypervisor.
Use arch_timer_stolen_ticks and xen_stolen_accounting to account for
stolen ticks.

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/arm/xen/enlighten.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index d30042e..8f040a0 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -14,6 +14,7 @@
#include <xen/xen-ops.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
+#include <asm/arch_timer.h>
#include <asm/system_misc.h>
#include <linux/interrupt.h>
#include <linux/irqreturn.h>
@@ -172,6 +173,8 @@ static int __init xen_secondary_init(unsigned int cpu)
later ones fail to. */
per_cpu(xen_vcpu, cpu) = vcpup;
}
+
+ xen_setup_runstate_info(cpu);
return 0;
}

@@ -272,6 +275,7 @@ static int __init xen_guest_init(void)
if (!xen_initial_domain())
xenbus_probe(NULL);

+ arch_timer_stolen_ticks = xen_stolen_accounting;
pm_power_off = xen_power_off;
arm_pm_restart = xen_restart;

--
1.7.2.5

2013-05-01 19:28:25

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 2/3] xen: move do_stolen_accounting to drivers/xen/time.c

Move do_stolen_accounting, xen_vcpu_stolen and related functions and
static variables to common code (drivers/xen/time.c).

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/x86/xen/time.c | 127 +-------------------------------------------
drivers/xen/Makefile | 2 +-
drivers/xen/time.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++
include/xen/xen-ops.h | 4 ++
4 files changed, 148 insertions(+), 127 deletions(-)
create mode 100644 drivers/xen/time.c

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 0296a95..278b7c2 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -28,131 +28,6 @@

/* Xen may fire a timer up to this many ns early */
#define TIMER_SLOP 100000
-#define NS_PER_TICK (1000000000LL / HZ)
-
-/* runstate info updated by Xen */
-static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);
-
-/* snapshots of runstate info */
-static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate_snapshot);
-
-/* unused ns of stolen and blocked time */
-static DEFINE_PER_CPU(u64, xen_residual_stolen);
-static DEFINE_PER_CPU(u64, xen_residual_blocked);
-
-/* return an consistent snapshot of 64-bit time/counter value */
-static u64 get64(const u64 *p)
-{
- u64 ret;
-
- if (BITS_PER_LONG < 64) {
- u32 *p32 = (u32 *)p;
- u32 h, l;
-
- /*
- * Read high then low, and then make sure high is
- * still the same; this will only loop if low wraps
- * and carries into high.
- * XXX some clean way to make this endian-proof?
- */
- do {
- h = p32[1];
- barrier();
- l = p32[0];
- barrier();
- } while (p32[1] != h);
-
- ret = (((u64)h) << 32) | l;
- } else
- ret = *p;
-
- return ret;
-}
-
-/*
- * Runstate accounting
- */
-static void get_runstate_snapshot(struct vcpu_runstate_info *res)
-{
- u64 state_time;
- struct vcpu_runstate_info *state;
-
- BUG_ON(preemptible());
-
- state = &__get_cpu_var(xen_runstate);
-
- /*
- * The runstate info is always updated by the hypervisor on
- * the current CPU, so there's no need to use anything
- * stronger than a compiler barrier when fetching it.
- */
- do {
- state_time = get64(&state->state_entry_time);
- barrier();
- *res = *state;
- barrier();
- } while (get64(&state->state_entry_time) != state_time);
-}
-
-/* return true when a vcpu could run but has no real cpu to run on */
-bool xen_vcpu_stolen(int vcpu)
-{
- return per_cpu(xen_runstate, vcpu).state == RUNSTATE_runnable;
-}
-
-void xen_setup_runstate_info(int cpu)
-{
- struct vcpu_register_runstate_memory_area area;
-
- area.addr.v = &per_cpu(xen_runstate, cpu);
-
- if (HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
- cpu, &area))
- BUG();
-}
-
-static void do_stolen_accounting(void)
-{
- struct vcpu_runstate_info state;
- struct vcpu_runstate_info *snap;
- s64 blocked, runnable, offline, stolen;
- cputime_t ticks;
-
- get_runstate_snapshot(&state);
-
- WARN_ON(state.state != RUNSTATE_running);
-
- snap = &__get_cpu_var(xen_runstate_snapshot);
-
- /* work out how much time the VCPU has not been runn*ing* */
- blocked = state.time[RUNSTATE_blocked] - snap->time[RUNSTATE_blocked];
- runnable = state.time[RUNSTATE_runnable] - snap->time[RUNSTATE_runnable];
- offline = state.time[RUNSTATE_offline] - snap->time[RUNSTATE_offline];
-
- *snap = state;
-
- /* Add the appropriate number of ticks of stolen time,
- including any left-overs from last time. */
- stolen = runnable + offline + __this_cpu_read(xen_residual_stolen);
-
- if (stolen < 0)
- stolen = 0;
-
- ticks = iter_div_u64_rem(stolen, NS_PER_TICK, &stolen);
- __this_cpu_write(xen_residual_stolen, stolen);
- account_steal_ticks(ticks);
-
- /* Add the appropriate number of ticks of blocked time,
- including any left-overs from last time. */
- blocked += __this_cpu_read(xen_residual_blocked);
-
- if (blocked < 0)
- blocked = 0;
-
- ticks = iter_div_u64_rem(blocked, NS_PER_TICK, &blocked);
- __this_cpu_write(xen_residual_blocked, blocked);
- account_idle_ticks(ticks);
-}

/* Get the TSC speed from Xen */
static unsigned long xen_tsc_khz(void)
@@ -390,7 +265,7 @@ static irqreturn_t xen_timer_interrupt(int irq, void *dev_id)
ret = IRQ_HANDLED;
}

- do_stolen_accounting();
+ xen_stolen_accounting();

return ret;
}
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index eabd0ee..2bf461a 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -3,7 +3,7 @@ obj-y += manage.o
obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
endif
obj-$(CONFIG_X86) += fallback.o
-obj-y += grant-table.o features.o events.o balloon.o
+obj-y += grant-table.o features.o events.o balloon.o time.o
obj-y += xenbus/

nostackp := $(call cc-option, -fno-stack-protector)
diff --git a/drivers/xen/time.c b/drivers/xen/time.c
new file mode 100644
index 0000000..d00ed24
--- /dev/null
+++ b/drivers/xen/time.c
@@ -0,0 +1,142 @@
+/*
+ * Xen stolen ticks accounting.
+ */
+#include <linux/kernel.h>
+#include <linux/kernel_stat.h>
+#include <linux/math64.h>
+#include <linux/gfp.h>
+
+#include <asm/xen/hypervisor.h>
+#include <asm/xen/hypercall.h>
+
+#include <xen/events.h>
+#include <xen/features.h>
+#include <xen/interface/xen.h>
+#include <xen/interface/vcpu.h>
+#include <xen/xen-ops.h>
+
+#define NS_PER_TICK (1000000000LL / HZ)
+
+/* runstate info updated by Xen */
+static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);
+
+/* snapshots of runstate info */
+static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate_snapshot);
+
+/* unused ns of stolen and blocked time */
+static DEFINE_PER_CPU(u64, xen_residual_stolen);
+static DEFINE_PER_CPU(u64, xen_residual_blocked);
+
+/* return an consistent snapshot of 64-bit time/counter value */
+static u64 get64(const u64 *p)
+{
+ u64 ret;
+
+ if (BITS_PER_LONG < 64) {
+ u32 *p32 = (u32 *)p;
+ u32 h, l;
+
+ /*
+ * Read high then low, and then make sure high is
+ * still the same; this will only loop if low wraps
+ * and carries into high.
+ * XXX some clean way to make this endian-proof?
+ */
+ do {
+ h = p32[1];
+ barrier();
+ l = p32[0];
+ barrier();
+ } while (p32[1] != h);
+
+ ret = (((u64)h) << 32) | l;
+ } else
+ ret = *p;
+
+ return ret;
+}
+
+/*
+ * Runstate accounting
+ */
+static void get_runstate_snapshot(struct vcpu_runstate_info *res)
+{
+ u64 state_time;
+ struct vcpu_runstate_info *state;
+
+ BUG_ON(preemptible());
+
+ state = &__get_cpu_var(xen_runstate);
+
+ /*
+ * The runstate info is always updated by the hypervisor on
+ * the current CPU, so there's no need to use anything
+ * stronger than a compiler barrier when fetching it.
+ */
+ do {
+ state_time = get64(&state->state_entry_time);
+ barrier();
+ *res = *state;
+ barrier();
+ } while (get64(&state->state_entry_time) != state_time);
+}
+
+/* return true when a vcpu could run but has no real cpu to run on */
+bool xen_vcpu_stolen(int vcpu)
+{
+ return per_cpu(xen_runstate, vcpu).state == RUNSTATE_runnable;
+}
+
+void xen_setup_runstate_info(int cpu)
+{
+ struct vcpu_register_runstate_memory_area area;
+
+ area.addr.v = &per_cpu(xen_runstate, cpu);
+
+ if (HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
+ cpu, &area))
+ BUG();
+}
+
+void xen_stolen_accounting(void)
+{
+ struct vcpu_runstate_info state;
+ struct vcpu_runstate_info *snap;
+ s64 blocked, runnable, offline, stolen;
+ cputime_t ticks;
+
+ get_runstate_snapshot(&state);
+
+ WARN_ON(state.state != RUNSTATE_running);
+
+ snap = &__get_cpu_var(xen_runstate_snapshot);
+
+ /* work out how much time the VCPU has not been runn*ing* */
+ blocked = state.time[RUNSTATE_blocked] - snap->time[RUNSTATE_blocked];
+ runnable = state.time[RUNSTATE_runnable] - snap->time[RUNSTATE_runnable];
+ offline = state.time[RUNSTATE_offline] - snap->time[RUNSTATE_offline];
+
+ *snap = state;
+
+ /* Add the appropriate number of ticks of stolen time,
+ including any left-overs from last time. */
+ stolen = runnable + offline + __this_cpu_read(xen_residual_stolen);
+
+ if (stolen < 0)
+ stolen = 0;
+
+ ticks = iter_div_u64_rem(stolen, NS_PER_TICK, &stolen);
+ __this_cpu_write(xen_residual_stolen, stolen);
+ account_steal_ticks(ticks);
+
+ /* Add the appropriate number of ticks of blocked time,
+ including any left-overs from last time. */
+ blocked += __this_cpu_read(xen_residual_blocked);
+
+ if (blocked < 0)
+ blocked = 0;
+
+ ticks = iter_div_u64_rem(blocked, NS_PER_TICK, &blocked);
+ __this_cpu_write(xen_residual_blocked, blocked);
+ account_idle_ticks(ticks);
+}
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index d6fe062..85bb673 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -16,6 +16,10 @@ void xen_mm_unpin_all(void);
void xen_timer_resume(void);
void xen_arch_resume(void);

+bool xen_vcpu_stolen(int vcpu);
+void xen_setup_runstate_info(int cpu);
+void xen_stolen_accounting(void);
+
int xen_setup_shutdown_event(void);

extern unsigned long *xen_contiguous_bitmap;
--
1.7.2.5

2013-05-01 20:36:19

by Christopher Covington

[permalink] [raw]
Subject: Re: [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

Hi Stefano,

On 05/01/2013 03:27 PM, Stefano Stabellini wrote:
> Introduce a function, called arch_timer_stolen_ticks, called from the
> arch_timer interrupt handler to account for stolen ticks.

[...]

> diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
> index 7ade91d..30db413 100644
> --- a/arch/arm/include/asm/arch_timer.h
> +++ b/arch/arm/include/asm/arch_timer.h
> @@ -13,6 +13,11 @@
> int arch_timer_of_register(void);
> int arch_timer_sched_clock_init(void);
>
> +/* per-platform function to calculate stolen ticks (clock cycles stolen
> + * to the vcpu by the hypervisor).

Stolen from the vcpu by the hypervisor?

Is the hypervisor adjusting the Virtual Offset Register?

[...]

Thanks,
Christopher

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

2013-05-02 08:19:19

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 2/3] xen: move do_stolen_accounting to drivers/xen/time.c

On Wed, 2013-05-01 at 20:27 +0100, Stefano Stabellini wrote:
> Move do_stolen_accounting, xen_vcpu_stolen and related functions and
> static variables to common code (drivers/xen/time.c).
>
> Signed-off-by: Stefano Stabellini <[email protected]>

Acked-by: Ian Campbell <[email protected]>

> + /* work out how much time the VCPU has not been runn*ing* */

I wonder what this emphasis on runn*ing* means?

Ian.

2013-05-02 08:19:47

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

On Wed, 2013-05-01 at 21:36 +0100, Christopher Covington wrote:
> Hi Stefano,
>
> On 05/01/2013 03:27 PM, Stefano Stabellini wrote:
> > Introduce a function, called arch_timer_stolen_ticks, called from the
> > arch_timer interrupt handler to account for stolen ticks.
>
> [...]
>
> > diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
> > index 7ade91d..30db413 100644
> > --- a/arch/arm/include/asm/arch_timer.h
> > +++ b/arch/arm/include/asm/arch_timer.h
> > @@ -13,6 +13,11 @@
> > int arch_timer_of_register(void);
> > int arch_timer_sched_clock_init(void);
> >
> > +/* per-platform function to calculate stolen ticks (clock cycles stolen
> > + * to the vcpu by the hypervisor).
>
> Stolen from the vcpu by the hypervisor?

Stolen is time where the VCPU wants to be running bit isn't because the
hypervisor has descheduled it, e.g. because another VCPU is being run.
So yes, Stefano meant "from".

> Is the hypervisor adjusting the Virtual Offset Register?

The virtual offset register is useful when a VCPU is migrated to another
system to account for the differences in physical time on the two hosts
but isn't useful for accounting for stolen time while running on a
single host.

e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
where the timer goes off. Nor does it make sense to require that the
guest actually be running for 5 before injecting the timer because that
would mean real time elapsed time for the timer would be 5+3 in the case
where 3 are stolen.

So the virtual timer should appear to have been running even while time
is being stolen and therefore stolen time needs to be accounted via some
other means.

Ian.

2013-05-02 08:21:07

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 3/3] xen/arm: account for stolen ticks

On Wed, 2013-05-01 at 20:27 +0100, Stefano Stabellini wrote:
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index d30042e..8f040a0 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -14,6 +14,7 @@
> #include <xen/xen-ops.h>
> #include <asm/xen/hypervisor.h>
> #include <asm/xen/hypercall.h>
> +#include <asm/arch_timer.h>
> #include <asm/system_misc.h>
> #include <linux/interrupt.h>
> #include <linux/irqreturn.h>
> @@ -172,6 +173,8 @@ static int __init xen_secondary_init(unsigned int cpu)
> later ones fail to. */
> per_cpu(xen_vcpu, cpu) = vcpup;
> }
> +
> + xen_setup_runstate_info(cpu);

Is this called for the boot processor too?

(do you have a linux-next branch where I can easily find the baseline
for this series?)

Ian.

2013-05-02 10:38:19

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 3/3] xen/arm: account for stolen ticks

On Thu, 2 May 2013, Ian Campbell wrote:
> On Wed, 2013-05-01 at 20:27 +0100, Stefano Stabellini wrote:
> > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > index d30042e..8f040a0 100644
> > --- a/arch/arm/xen/enlighten.c
> > +++ b/arch/arm/xen/enlighten.c
> > @@ -14,6 +14,7 @@
> > #include <xen/xen-ops.h>
> > #include <asm/xen/hypervisor.h>
> > #include <asm/xen/hypercall.h>
> > +#include <asm/arch_timer.h>
> > #include <asm/system_misc.h>
> > #include <linux/interrupt.h>
> > #include <linux/irqreturn.h>
> > @@ -172,6 +173,8 @@ static int __init xen_secondary_init(unsigned int cpu)
> > later ones fail to. */
> > per_cpu(xen_vcpu, cpu) = vcpup;
> > }
> > +
> > + xen_setup_runstate_info(cpu);
>
> Is this called for the boot processor too?

Yep.

> (do you have a linux-next branch where I can easily find the baseline
> for this series?)

git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git lost_ticks_1

It's based on my 3.9-rc3-smp-6 branch currently in linux-next (that is
based on 3.9-rc3).
However lost_ticks_1 is not in linux-next yet.

2013-05-02 10:48:32

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 3/3] xen/arm: account for stolen ticks

On Thu, 2013-05-02 at 11:38 +0100, Stefano Stabellini wrote:
> On Thu, 2 May 2013, Ian Campbell wrote:
> > On Wed, 2013-05-01 at 20:27 +0100, Stefano Stabellini wrote:
> > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > index d30042e..8f040a0 100644
> > > --- a/arch/arm/xen/enlighten.c
> > > +++ b/arch/arm/xen/enlighten.c
> > > @@ -14,6 +14,7 @@
> > > #include <xen/xen-ops.h>
> > > #include <asm/xen/hypervisor.h>
> > > #include <asm/xen/hypercall.h>
> > > +#include <asm/arch_timer.h>
> > > #include <asm/system_misc.h>
> > > #include <linux/interrupt.h>
> > > #include <linux/irqreturn.h>
> > > @@ -172,6 +173,8 @@ static int __init xen_secondary_init(unsigned int cpu)
> > > later ones fail to. */
> > > per_cpu(xen_vcpu, cpu) = vcpup;
> > > }
> > > +
> > > + xen_setup_runstate_info(cpu);
> >
> > Is this called for the boot processor too?
>
> Yep.
>
> > (do you have a linux-next branch where I can easily find the baseline
> > for this series?)
>
> git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git lost_ticks_1

Is that your Xen tree, I was after the Linux one...

> It's based on my 3.9-rc3-smp-6 branch currently in linux-next (that is
> based on 3.9-rc3).
> However lost_ticks_1 is not in linux-next yet.

Which branch is in Linux next? It doesn't seem to be your linux-next
(last commit 2011).

Ian.

2013-05-02 10:54:25

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 3/3] xen/arm: account for stolen ticks

On Thu, 2 May 2013, Ian Campbell wrote:
> On Thu, 2013-05-02 at 11:38 +0100, Stefano Stabellini wrote:
> > On Thu, 2 May 2013, Ian Campbell wrote:
> > > On Wed, 2013-05-01 at 20:27 +0100, Stefano Stabellini wrote:
> > > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > > index d30042e..8f040a0 100644
> > > > --- a/arch/arm/xen/enlighten.c
> > > > +++ b/arch/arm/xen/enlighten.c
> > > > @@ -14,6 +14,7 @@
> > > > #include <xen/xen-ops.h>
> > > > #include <asm/xen/hypervisor.h>
> > > > #include <asm/xen/hypercall.h>
> > > > +#include <asm/arch_timer.h>
> > > > #include <asm/system_misc.h>
> > > > #include <linux/interrupt.h>
> > > > #include <linux/irqreturn.h>
> > > > @@ -172,6 +173,8 @@ static int __init xen_secondary_init(unsigned int cpu)
> > > > later ones fail to. */
> > > > per_cpu(xen_vcpu, cpu) = vcpup;
> > > > }
> > > > +
> > > > + xen_setup_runstate_info(cpu);
> > >
> > > Is this called for the boot processor too?
> >
> > Yep.
> >
> > > (do you have a linux-next branch where I can easily find the baseline
> > > for this series?)
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git lost_ticks_1
>
> Is that your Xen tree, I was after the Linux one...

nope, that's my Linux tree disguised as a Xen tree to confuse the
maintainers ;-)

> > It's based on my 3.9-rc3-smp-6 branch currently in linux-next (that is
> > based on 3.9-rc3).
> > However lost_ticks_1 is not in linux-next yet.
>
> Which branch is in Linux next? It doesn't seem to be your linux-next
> (last commit 2011).

that would be linux-next from
git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git

2013-05-02 11:02:16

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 3/3] xen/arm: account for stolen ticks

On Thu, 2013-05-02 at 11:54 +0100, Stefano Stabellini wrote:
> On Thu, 2 May 2013, Ian Campbell wrote:
> > On Thu, 2013-05-02 at 11:38 +0100, Stefano Stabellini wrote:
> > > On Thu, 2 May 2013, Ian Campbell wrote:
> > > > On Wed, 2013-05-01 at 20:27 +0100, Stefano Stabellini wrote:
> > > > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > > > index d30042e..8f040a0 100644
> > > > > --- a/arch/arm/xen/enlighten.c
> > > > > +++ b/arch/arm/xen/enlighten.c
> > > > > @@ -14,6 +14,7 @@
> > > > > #include <xen/xen-ops.h>
> > > > > #include <asm/xen/hypervisor.h>
> > > > > #include <asm/xen/hypercall.h>
> > > > > +#include <asm/arch_timer.h>
> > > > > #include <asm/system_misc.h>
> > > > > #include <linux/interrupt.h>
> > > > > #include <linux/irqreturn.h>
> > > > > @@ -172,6 +173,8 @@ static int __init xen_secondary_init(unsigned int cpu)
> > > > > later ones fail to. */
> > > > > per_cpu(xen_vcpu, cpu) = vcpup;
> > > > > }
> > > > > +
> > > > > + xen_setup_runstate_info(cpu);
> > > >
> > > > Is this called for the boot processor too?
> > >
> > > Yep.
> > >
> > > > (do you have a linux-next branch where I can easily find the baseline
> > > > for this series?)
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git lost_ticks_1
> >
> > Is that your Xen tree, I was after the Linux one...
>
> nope, that's my Linux tree disguised as a Xen tree to confuse the
> maintainers ;-)

;-)

> > > It's based on my 3.9-rc3-smp-6 branch currently in linux-next (that is
> > > based on 3.9-rc3).
> > > However lost_ticks_1 is not in linux-next yet.
> >
> > Which branch is in Linux next? It doesn't seem to be your linux-next
> > (last commit 2011).
>
> that would be linux-next from
> git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git

Ah, I was looking in your xenbits tree -- is that deprecated?

Ian.

2013-05-02 11:05:21

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 3/3] xen/arm: account for stolen ticks

On Thu, 2 May 2013, Ian Campbell wrote:
> > > > It's based on my 3.9-rc3-smp-6 branch currently in linux-next (that is
> > > > based on 3.9-rc3).
> > > > However lost_ticks_1 is not in linux-next yet.
> > >
> > > Which branch is in Linux next? It doesn't seem to be your linux-next
> > > (last commit 2011).
> >
> > that would be linux-next from
> > git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git
>
> Ah, I was looking in your xenbits tree -- is that deprecated?

I guess it is, now that I have a kernel.org account.
I might as well get rid of it.

2013-05-02 18:49:25

by Christopher Covington

[permalink] [raw]
Subject: Re: [PATCH 2/3] xen: move do_stolen_accounting to drivers/xen/time.c

Hi Stefano,

On 05/01/2013 03:27 PM, Stefano Stabellini wrote:
> Move do_stolen_accounting, xen_vcpu_stolen and related functions and
> static variables to common code (drivers/xen/time.c).
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> arch/x86/xen/time.c | 127 +-------------------------------------------

In going through some of the code, it looked like there might be a few little
bits that you might be interested in pulling out of arch/ia64/xen/time.c as well.

[...]

Regards,
Christopher

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

2013-05-02 21:33:33

by Christopher Covington

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

Hi Ian,

On 05/02/2013 04:19 AM, Ian Campbell wrote:
> On Wed, 2013-05-01 at 21:36 +0100, Christopher Covington wrote:
>> Hi Stefano,
>>
>> On 05/01/2013 03:27 PM, Stefano Stabellini wrote:
>>> Introduce a function, called arch_timer_stolen_ticks, called from the
>>> arch_timer interrupt handler to account for stolen ticks.
>>
>> [...]
>>
>>> diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h
>>> index 7ade91d..30db413 100644
>>> --- a/arch/arm/include/asm/arch_timer.h
>>> +++ b/arch/arm/include/asm/arch_timer.h
>>> @@ -13,6 +13,11 @@
>>> int arch_timer_of_register(void);
>>> int arch_timer_sched_clock_init(void);
>>>
>>> +/* per-platform function to calculate stolen ticks (clock cycles stolen
>>> + * to the vcpu by the hypervisor).

[...]

>> Is the hypervisor adjusting the Virtual Offset Register?
>
> The virtual offset register is useful when a VCPU is migrated to another
> system to account for the differences in physical time on the two hosts
> but isn't useful for accounting for stolen time while running on a
> single host.
>
> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
> where the timer goes off. Nor does it make sense to require that the
> guest actually be running for 5 before injecting the timer because that
> would mean real time elapsed time for the timer would be 5+3 in the case
> where 3 are stolen.

This is a bit of an aside, but I think that hiding time spent at higher
privilege levels can be a quite sensible approach to timekeeping in a
virtualized environment, but I understand that it's not the approach taken
with Xen, and as you pointed out above, adjusting the Virtual Offset Register
by itself isn't enough to implement that approach.

> So the virtual timer should appear to have been running even while time
> is being stolen and therefore stolen time needs to be accounted via some
> other means.

Something that's not currently obvious to me is that given that the stolen
cycle accounting should be done, what makes the architected timer interrupt
handler the ideal place to do it?

Thanks,
Christopher

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

2013-05-03 08:26:51

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 2/3] xen: move do_stolen_accounting to drivers/xen/time.c

On Thu, 2013-05-02 at 19:49 +0100, Christopher Covington wrote:
> Hi Stefano,
>
> On 05/01/2013 03:27 PM, Stefano Stabellini wrote:
> > Move do_stolen_accounting, xen_vcpu_stolen and related functions and
> > static variables to common code (drivers/xen/time.c).
> >
> > Signed-off-by: Stefano Stabellini <[email protected]>
> > ---
> > arch/x86/xen/time.c | 127 +-------------------------------------------
>
> In going through some of the code, it looked like there might be a few little
> bits that you might be interested in pulling out of arch/ia64/xen/time.c as well.

ia64 in the hypervisor was deprecated and with the 4.3 release has been
removed.

AFAIK ia64/Xen support has been broken in the mainline kernel for year.

Ian

2013-05-03 10:29:23

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 2/3] xen: move do_stolen_accounting to drivers/xen/time.c

On Fri, 3 May 2013, Ian Campbell wrote:
> On Thu, 2013-05-02 at 19:49 +0100, Christopher Covington wrote:
> > Hi Stefano,
> >
> > On 05/01/2013 03:27 PM, Stefano Stabellini wrote:
> > > Move do_stolen_accounting, xen_vcpu_stolen and related functions and
> > > static variables to common code (drivers/xen/time.c).
> > >
> > > Signed-off-by: Stefano Stabellini <[email protected]>
> > > ---
> > > arch/x86/xen/time.c | 127 +-------------------------------------------
> >
> > In going through some of the code, it looked like there might be a few little
> > bits that you might be interested in pulling out of arch/ia64/xen/time.c as well.
>
> ia64 in the hypervisor was deprecated and with the 4.3 release has been
> removed.
>
> AFAIK ia64/Xen support has been broken in the mainline kernel for year.

Yes, that's why it didn't occur to me to do anything about it.

Also, the ia64 code differs from the x86 code, I can't just replace it
with the newly introduced common code unfortunately.

2013-05-03 10:43:33

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

On Thu, 2 May 2013, Christopher Covington wrote:
> > So the virtual timer should appear to have been running even while time
> > is being stolen and therefore stolen time needs to be accounted via some
> > other means.
>
> Something that's not currently obvious to me is that given that the stolen
> cycle accounting should be done, what makes the architected timer interrupt
> handler the ideal place to do it?

That is a good question and I would appreciate suggestions to improve
the patch.

Given that Xen x86 and ia64 does stolen time accounting from the timer
interrupt handler:

arch/x86/xen/time.c:xen_timer_interrupt
arch/ia64/kernel/time.c:timer_interrupt

and given that the arch_timer is the only timer used by Xen on ARM and
that it includes a virt_timer that is made on purpose to be used by
virtual machines, I thought that it might be a good place for it.

I also thought that doing it this way, KVM should be able to reuse the
same hook.

2013-05-03 10:54:41

by Marc Zyngier

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

On 03/05/13 11:43, Stefano Stabellini wrote:
> On Thu, 2 May 2013, Christopher Covington wrote:
>>> So the virtual timer should appear to have been running even while time
>>> is being stolen and therefore stolen time needs to be accounted via some
>>> other means.
>>
>> Something that's not currently obvious to me is that given that the stolen
>> cycle accounting should be done, what makes the architected timer interrupt
>> handler the ideal place to do it?
>
> That is a good question and I would appreciate suggestions to improve
> the patch.
>
> Given that Xen x86 and ia64 does stolen time accounting from the timer
> interrupt handler:
>
> arch/x86/xen/time.c:xen_timer_interrupt
> arch/ia64/kernel/time.c:timer_interrupt
>
> and given that the arch_timer is the only timer used by Xen on ARM and
> that it includes a virt_timer that is made on purpose to be used by
> virtual machines, I thought that it might be a good place for it.
>
> I also thought that doing it this way, KVM should be able to reuse the
> same hook.

Indeed. I just need to understand how time stealing works there ;-).

Now, KVM is not necessarily limited to arch_timers, and we've run KVM
using a QEMU-provided timer in the past. Can you think of a more generic
location for this hook? Possibly something that would satisfy the
requirements of other architectures while we're at it?

M.
--
Jazz is not dead. It just smells funny...

2013-05-05 16:47:21

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

On Fri, 3 May 2013, Marc Zyngier wrote:
> On 03/05/13 11:43, Stefano Stabellini wrote:
> > On Thu, 2 May 2013, Christopher Covington wrote:
> >>> So the virtual timer should appear to have been running even while time
> >>> is being stolen and therefore stolen time needs to be accounted via some
> >>> other means.
> >>
> >> Something that's not currently obvious to me is that given that the stolen
> >> cycle accounting should be done, what makes the architected timer interrupt
> >> handler the ideal place to do it?
> >
> > That is a good question and I would appreciate suggestions to improve
> > the patch.
> >
> > Given that Xen x86 and ia64 does stolen time accounting from the timer
> > interrupt handler:
> >
> > arch/x86/xen/time.c:xen_timer_interrupt
> > arch/ia64/kernel/time.c:timer_interrupt
> >
> > and given that the arch_timer is the only timer used by Xen on ARM and
> > that it includes a virt_timer that is made on purpose to be used by
> > virtual machines, I thought that it might be a good place for it.
> >
> > I also thought that doing it this way, KVM should be able to reuse the
> > same hook.
>
> Indeed. I just need to understand how time stealing works there ;-).
>
> Now, KVM is not necessarily limited to arch_timers, and we've run KVM
> using a QEMU-provided timer in the past. Can you think of a more generic
> location for this hook? Possibly something that would satisfy the
> requirements of other architectures while we're at it?

Probably the best option would be to reuse

kernel/sched/cputime.c:steal_account_process_tick

but that also means introducing CONFIG_PARAVIRT on arm.
I am up for that, what do you think?

2013-05-06 14:36:26

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

> > e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
> > would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
> > where the timer goes off. Nor does it make sense to require that the
> > guest actually be running for 5 before injecting the timer because that
> > would mean real time elapsed time for the timer would be 5+3 in the case
> > where 3 are stolen.
>
> This is a bit of an aside, but I think that hiding time spent at higher
> privilege levels can be a quite sensible approach to timekeeping in a
> virtualized environment, but I understand that it's not the approach taken
> with Xen, and as you pointed out above, adjusting the Virtual Offset Register
> by itself isn't enough to implement that approach.

This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for
implementation. In the user-space, the entry in 'top' of "stolen" (%st)
is for this exact value.

2013-05-06 14:55:42

by Christopher Covington

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

On 05/05/2013 12:47 PM, Stefano Stabellini wrote:
> On Fri, 3 May 2013, Marc Zyngier wrote:
>> On 03/05/13 11:43, Stefano Stabellini wrote:
>>> On Thu, 2 May 2013, Christopher Covington wrote:
>>>>> So the virtual timer should appear to have been running even while time
>>>>> is being stolen and therefore stolen time needs to be accounted via some
>>>>> other means.
>>>>
>>>> Something that's not currently obvious to me is that given that the stolen
>>>> cycle accounting should be done, what makes the architected timer interrupt
>>>> handler the ideal place to do it?
>>>
>>> That is a good question and I would appreciate suggestions to improve
>>> the patch.
>>>
>>> Given that Xen x86 and ia64 does stolen time accounting from the timer
>>> interrupt handler:
>>>
>>> arch/x86/xen/time.c:xen_timer_interrupt
>>> arch/ia64/kernel/time.c:timer_interrupt
>>>
>>> and given that the arch_timer is the only timer used by Xen on ARM and
>>> that it includes a virt_timer that is made on purpose to be used by
>>> virtual machines, I thought that it might be a good place for it.
>>>
>>> I also thought that doing it this way, KVM should be able to reuse the
>>> same hook.
>>
>> Indeed. I just need to understand how time stealing works there ;-).
>>
>> Now, KVM is not necessarily limited to arch_timers, and we've run KVM
>> using a QEMU-provided timer in the past. Can you think of a more generic
>> location for this hook? Possibly something that would satisfy the
>> requirements of other architectures while we're at it?
>
> Probably the best option would be to reuse
>
> kernel/sched/cputime.c:steal_account_process_tick
>
> but that also means introducing CONFIG_PARAVIRT on arm.
> I am up for that, what do you think?

That sounds like a good move to me.

Christopher

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

2013-05-07 16:17:58

by Christopher Covington

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

Hi Konrad,

On 05/06/2013 10:35 AM, Konrad Rzeszutek Wilk wrote:
>>> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
>>> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
>>> where the timer goes off. Nor does it make sense to require that the
>>> guest actually be running for 5 before injecting the timer because that
>>> would mean real time elapsed time for the timer would be 5+3 in the case
>>> where 3 are stolen.
>>
>> This is a bit of an aside, but I think that hiding time spent at higher
>> privilege levels can be a quite sensible approach to timekeeping in a
>> virtualized environment, but I understand that it's not the approach taken
>> with Xen, and as you pointed out above, adjusting the Virtual Offset Register
>> by itself isn't enough to implement that approach.
>
> This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for
> implementation. In the user-space, the entry in 'top' of "stolen" (%st)
> is for this exact value.

I may have been unclear with my terms, sorry. When I refer to time being
"hidden", I mean that kernel level software (supervisor mode, EL1) cannot
detect the passage of that time at all. I don't know whether this would really
work, but I imagine one might be able to get close with the current
virtualization facilities for ARM.

Am I correct in interpreting that what you're referring to is the deployment
of paravirtualization code that ensures (observable) "stolen" time is factored
into kernel decision-making?

Thanks,
Christopher

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

2013-05-08 11:19:56

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

On Tue, 7 May 2013, Christopher Covington wrote:
> Hi Konrad,
>
> On 05/06/2013 10:35 AM, Konrad Rzeszutek Wilk wrote:
> >>> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
> >>> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
> >>> where the timer goes off. Nor does it make sense to require that the
> >>> guest actually be running for 5 before injecting the timer because that
> >>> would mean real time elapsed time for the timer would be 5+3 in the case
> >>> where 3 are stolen.
> >>
> >> This is a bit of an aside, but I think that hiding time spent at higher
> >> privilege levels can be a quite sensible approach to timekeeping in a
> >> virtualized environment, but I understand that it's not the approach taken
> >> with Xen, and as you pointed out above, adjusting the Virtual Offset Register
> >> by itself isn't enough to implement that approach.
> >
> > This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for
> > implementation. In the user-space, the entry in 'top' of "stolen" (%st)
> > is for this exact value.
>
> I may have been unclear with my terms, sorry. When I refer to time being
> "hidden", I mean that kernel level software (supervisor mode, EL1) cannot
> detect the passage of that time at all. I don't know whether this would really
> work, but I imagine one might be able to get close with the current
> virtualization facilities for ARM.
>
> Am I correct in interpreting that what you're referring to is the deployment
> of paravirtualization code that ensures (observable) "stolen" time is factored
> into kernel decision-making?

Although it might be possible to hide the real time flow from the VM, it
is inadvisable: what would happen when the VM needs to deal with a real
hardware device? Or just send packets over the network? This is why it
is much safer and more reliable to expose the stolen ticks to the VM.

2013-05-08 11:48:22

by Christopher Covington

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 1/3] arm_arch_timer: introduce arch_timer_stolen_ticks

On 05/08/2013 07:19 AM, Stefano Stabellini wrote:
> On Tue, 7 May 2013, Christopher Covington wrote:
>> Hi Konrad,
>>
>> On 05/06/2013 10:35 AM, Konrad Rzeszutek Wilk wrote:
>>>>> e.g. if a VCPU sets a timer for NOW+5, but 3 are stolen in the middle it
>>>>> would not make sense (from the guests PoV) for NOW'==NOW+2 at the point
>>>>> where the timer goes off. Nor does it make sense to require that the
>>>>> guest actually be running for 5 before injecting the timer because that
>>>>> would mean real time elapsed time for the timer would be 5+3 in the case
>>>>> where 3 are stolen.
>>>>
>>>> This is a bit of an aside, but I think that hiding time spent at higher
>>>> privilege levels can be a quite sensible approach to timekeeping in a
>>>> virtualized environment, but I understand that it's not the approach taken
>>>> with Xen, and as you pointed out above, adjusting the Virtual Offset Register
>>>> by itself isn't enough to implement that approach.
>>>
>>> This is the approach taken by Xen and KVM. Look in CONFIG_PARAVIRT_CLOCK for
>>> implementation. In the user-space, the entry in 'top' of "stolen" (%st)
>>> is for this exact value.
>>
>> I may have been unclear with my terms, sorry. When I refer to time being
>> "hidden", I mean that kernel level software (supervisor mode, EL1) cannot
>> detect the passage of that time at all. I don't know whether this would really
>> work, but I imagine one might be able to get close with the current
>> virtualization facilities for ARM.
>>
>> Am I correct in interpreting that what you're referring to is the deployment
>> of paravirtualization code that ensures (observable) "stolen" time is factored
>> into kernel decision-making?
>
> Although it might be possible to hide the real time flow from the VM, it
> is inadvisable: what would happen when the VM needs to deal with a real
> hardware device? Or just send packets over the network? This is why it
> is much safer and more reliable to expose the stolen ticks to the VM.

One would probably have to emulate all the hardware. I don't mean to imply
that I think this is useful for everyday virtualization, but I speculate that
such an approach might enable the analysis of kernels infected with
VM-detecting malware or other niche use-cases.

Christopher

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.