On Mon, 2011-01-31 at 12:25 +0100, Peter Zijlstra wrote:
> On Fri, 2011-01-28 at 14:52 -0500, Glauber Costa wrote:
>
> > +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> > +static DEFINE_PER_CPU(u64, cpu_steal_time);
> > +
> > +#ifndef CONFIG_64BIT
> > +static DEFINE_PER_CPU(seqcount_t, steal_time_seq);
> > +
> > +static inline void steal_time_write_begin(void)
> > +{
> > + __this_cpu_inc(steal_time_seq.sequence);
> > + smp_wmb();
> > +}
> > +
> > +static inline void steal_time_write_end(void)
> > +{
> > + smp_wmb();
> > + __this_cpu_inc(steal_time_seq.sequence);
> > +}
> > +
> > +static inline u64 steal_time_read(int cpu)
> > +{
> > + u64 steal_time;
> > + unsigned seq;
> > +
> > + do {
> > + seq = read_seqcount_begin(&per_cpu(steal_time_seq, cpu));
> > + steal_time = per_cpu(cpu_steal_time, cpu);
> > + } while (read_seqcount_retry(&per_cpu(steal_time_seq, cpu), seq));
> > +
> > + return steal_time;
> > +}
> > +#else /* CONFIG_64BIT */
> > +static inline void steal_time_write_begin(void)
> > +{
> > +}
> > +
> > +static inline void steal_time_write_end(void)
> > +{
> > +}
> > +
> > +static inline u64 steal_time_read(int cpu)
> > +{
> > + return per_cpu(cpu_steal_time, cpu);
> > +}
>
>
> > @@ -3536,6 +3592,11 @@ static int touch_steal_time(int is_idle)
> >
> > if (st) {
> > account_steal_time(st);
> > +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> > + steal_time_write_begin();
> > + __this_cpu_add(cpu_steal_time, steal);
> > + steal_time_write_end();
> > +#endif
> > return 1;
> > }
> > return 0;
>
>
> Why replicate all logic you've already got in patch 4? That too is
> reading steal time in a loop in kvm_account_steal_time(), why not extend
> that interface to take a cpu argument and be done with it?
Because that part is kvm-specific, and this is scheduler general.
It seemed cleaner to me to do it this way. But I can do it differently,
certainly.
>
On Tue, 2011-02-01 at 13:59 -0200, Glauber Costa wrote:
>
> Because that part is kvm-specific, and this is scheduler general.
> It seemed cleaner to me to do it this way. But I can do it differently,
> certainly.
Well, any steal time clock will be hypervisor specific, but if we agree
that anything that enables CONFIG_PARAVIRT_TIME_ACCOUNTING provides a
u64 steal_time_clock(int cpu) function then all should be well, right?
The bit you have in kvm is almost that, except it assumes cpu ==
this_cpu.
You simply cannot rely on the silly tick accounting to drive any clock,
its archaic.
On Tue, 2011-02-01 at 17:19 +0100, Peter Zijlstra wrote:
> On Tue, 2011-02-01 at 13:59 -0200, Glauber Costa wrote:
> >
> > Because that part is kvm-specific, and this is scheduler general.
> > It seemed cleaner to me to do it this way. But I can do it differently,
> > certainly.
>
> Well, any steal time clock will be hypervisor specific, but if we agree
> that anything that enables CONFIG_PARAVIRT_TIME_ACCOUNTING provides a
> u64 steal_time_clock(int cpu) function then all should be well, right?
Once the hypervisor provided the data, it can all be generic, and have
large parts of it that are generic, living in sched.c.
> The bit you have in kvm is almost that, except it assumes cpu ==
> this_cpu.
>
> You simply cannot rely on the silly tick accounting to drive any clock,
> its archaic.
Which tick accounting? In your other e-mail , you pointed that this only
runs in touch_steal_time, which is fine, will change. But all the rest
here, that is behind the hypervisor specific vs generic code has nothing
to do with ticks at all.
>
>
On Tue, 2011-02-01 at 14:22 -0200, Glauber Costa wrote:
>
>
> Which tick accounting? In your other e-mail , you pointed that this only
> runs in touch_steal_time, which is fine, will change.
That tick ;-), all the account_foo muck is per tick.
> But all the rest
> here, that is behind the hypervisor specific vs generic code has nothing
> to do with ticks at all.
But I don't get it, there is no generic code needed, all that's needed
is u64 steal_time_clock(int cpu), and the first part of your
kvm_account_steal_time() function is exactly that if you add the cpu
argument.
+static u64 steal_time_clock(int cpu)
+{
+ u64 steal_time;
+ struct kvm_steal_time *src;
+ int version;
+
+ preempt_disable();
+ src = &per_cpu_ptr(steal_time, cpu);
+ do {
+ version = src->version;
+ rmb();
+ steal_time = src->steal;
+ rmb();
+ } while ((src->version & 1) || (version != src->version));
+ preempt_enable();
+
+ return steal_time
+}
And you're done.. no need to for any of that steal_time_{read,write} business.
On Tue, 2011-02-01 at 19:59 +0100, Peter Zijlstra wrote:
> On Tue, 2011-02-01 at 14:22 -0200, Glauber Costa wrote:
> >
> >
> > Which tick accounting? In your other e-mail , you pointed that this only
> > runs in touch_steal_time, which is fine, will change.
>
> That tick ;-), all the account_foo muck is per tick.
>
> > But all the rest
> > here, that is behind the hypervisor specific vs generic code has nothing
> > to do with ticks at all.
>
> But I don't get it, there is no generic code needed, all that's needed
> is u64 steal_time_clock(int cpu), and the first part of your
> kvm_account_steal_time() function is exactly that if you add the cpu
> argument.
>
> +static u64 steal_time_clock(int cpu)
> +{
> + u64 steal_time;
> + struct kvm_steal_time *src;
> + int version;
> +
> + preempt_disable();
> + src = &per_cpu_ptr(steal_time, cpu);
> + do {
> + version = src->version;
> + rmb();
> + steal_time = src->steal;
> + rmb();
> + } while ((src->version & 1) || (version != src->version));
> + preempt_enable();
> +
> + return steal_time
> +}
>
> And you're done.. no need to for any of that steal_time_{read,write} business.
update_rq_clock_task still have to keep track of what was the last steal
time value we saw, in the same way it does for irq. One option is to
call update_rq_clock_task from inside kvm-code, but I don't really like
it very much.
But okay, there are many ways to work around it, I'll cook something.
On Tue, 2011-02-01 at 17:55 -0200, Glauber Costa wrote:
>
> update_rq_clock_task still have to keep track of what was the last steal
> time value we saw, in the same way it does for irq.
Right, the CONFIG_SCHED_PARAVIRT patch I sent earlier adds a
prev_steal_time member to struct rq for this purpose.
> One option is to
> call update_rq_clock_task from inside kvm-code, but I don't really like
> it very much.
Why would you need to call anything from the kvm code?
Simply make u64 steal_time_clock(int cpu) a paravirt function with u64
native_steal_time_clock(int cpu) { return 0ULL; }. Possibly avoid the
whole CONFIG_SCHED_PARAVIRT block in update_rq_clock_task() for !
paravirt guests.