From: Glauber de Oliveira Costa <[email protected]>
tsc is very good time source (when it does not have drifts, does not
change it's frequency, i.e. when it works), so it should have its rating
raised to a value greater than, or equal 400.
Since it's being a tendency among paravirt clocksources to use values
around 400, we should declare tsc as even better: So we use 500.
This patch also touches the comments on clocksource.h, which suggests
that 499 would be a limit on the rating values.
Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
arch/x86/kernel/tsc_32.c | 2 +-
arch/x86/kernel/tsc_64.c | 2 +-
include/linux/clocksource.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
index 9ebc0da..4d91e59 100644
--- a/arch/x86/kernel/tsc_32.c
+++ b/arch/x86/kernel/tsc_32.c
@@ -280,7 +280,7 @@ static cycle_t read_tsc(void)
static struct clocksource clocksource_tsc = {
.name = "tsc",
- .rating = 300,
+ .rating = 500,
.read = read_tsc,
.mask = CLOCKSOURCE_MASK(64),
.mult = 0, /* to be set */
diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
index 9c70af4..4fd5b1b 100644
--- a/arch/x86/kernel/tsc_64.c
+++ b/arch/x86/kernel/tsc_64.c
@@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void)
static struct clocksource clocksource_tsc = {
.name = "tsc",
- .rating = 300,
+ .rating = 500,
.read = read_tsc,
.mask = CLOCKSOURCE_MASK(64),
.shift = 22,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 107787a..5b0aadd 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -39,7 +39,7 @@ struct clocksource;
* A correct and usable clocksource.
* 300-399: Desired.
* A reasonably fast and accurate clocksource.
- * 400-499: Perfect
+ * >= 400 : Perfect
* The ideal clocksource. A must-use where
* available.
* @read: returns a cycle value
--
1.5.0.6
On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:
CC'ed John and removed [email protected] :)
> From: Glauber de Oliveira Costa <[email protected]>
>
> tsc is very good time source (when it does not have drifts, does not
> change it's frequency, i.e. when it works), so it should have its rating
> raised to a value greater than, or equal 400.
>
> Since it's being a tendency among paravirt clocksources to use values
> around 400, we should declare tsc as even better: So we use 500.
>
> This patch also touches the comments on clocksource.h, which suggests
> that 499 would be a limit on the rating values.
>
> Signed-off-by: Glauber de Oliveira Costa <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
> ---
> arch/x86/kernel/tsc_32.c | 2 +-
> arch/x86/kernel/tsc_64.c | 2 +-
> include/linux/clocksource.h | 2 +-
> 3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
> index 9ebc0da..4d91e59 100644
> --- a/arch/x86/kernel/tsc_32.c
> +++ b/arch/x86/kernel/tsc_32.c
> @@ -280,7 +280,7 @@ static cycle_t read_tsc(void)
>
> static struct clocksource clocksource_tsc = {
> .name = "tsc",
> - .rating = 300,
> + .rating = 500,
> .read = read_tsc,
> .mask = CLOCKSOURCE_MASK(64),
> .mult = 0, /* to be set */
> diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
> index 9c70af4..4fd5b1b 100644
> --- a/arch/x86/kernel/tsc_64.c
> +++ b/arch/x86/kernel/tsc_64.c
> @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void)
>
> static struct clocksource clocksource_tsc = {
> .name = "tsc",
> - .rating = 300,
> + .rating = 500,
> .read = read_tsc,
> .mask = CLOCKSOURCE_MASK(64),
> .shift = 22,
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 107787a..5b0aadd 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -39,7 +39,7 @@ struct clocksource;
> * A correct and usable clocksource.
> * 300-399: Desired.
> * A reasonably fast and accurate clocksource.
> - * 400-499: Perfect
> + * >= 400 : Perfect
> * The ideal clocksource. A must-use where
> * available.
> * @read: returns a cycle value
> --
> 1.5.0.6
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
* Thomas Gleixner <[email protected]> wrote:
> > Since it's being a tendency among paravirt clocksources to use
> > values around 400, we should declare tsc as even better: So we use
> > 500.
> >
> > This patch also touches the comments on clocksource.h, which
> > suggests that 499 would be a limit on the rating values.
> >
> > Signed-off-by: Glauber de Oliveira Costa <[email protected]>
>
> Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Ingo
On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
> From: Glauber de Oliveira Costa <[email protected]>
>
> tsc is very good time source (when it does not have drifts, does not
> change it's frequency, i.e. when it works), so it should have its rating
> raised to a value greater than, or equal 400.
>
> Since it's being a tendency among paravirt clocksources to use values
> around 400, we should declare tsc as even better: So we use 500.
Why is the TSC better than a paravirt clocksource? In our case this is
definitely inaccurate. Paravirt clocksources should be preferred to
TSC, and both must be made available in hardware for platforms which do
not support paravirt.
Also, please cc all the paravirt developers on things related to
paravirt, especially things with such broad effect. I think 400 is a
good value for a perfect native clocksource. >400 should be reserved
for super-real (i.e. paravirt) sources that should always be chosen over
a hardware realistic implementation in a virtual environment.
Zach
Zachary Amsden wrote:
> On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
>
>> From: Glauber de Oliveira Costa <[email protected]>
>>
>> tsc is very good time source (when it does not have drifts, does not
>> change it's frequency, i.e. when it works), so it should have its rating
>> raised to a value greater than, or equal 400.
>>
>> Since it's being a tendency among paravirt clocksources to use values
>> around 400, we should declare tsc as even better: So we use 500.
>>
>
> Why is the TSC better than a paravirt clocksource? In our case this is
> definitely inaccurate. Paravirt clocksources should be preferred to
> TSC, and both must be made available in hardware for platforms which do
> not support paravirt.
>
> Also, please cc all the paravirt developers on things related to
> paravirt, especially things with such broad effect. I think 400 is a
> good value for a perfect native clocksource. >400 should be reserved
> for super-real (i.e. paravirt) sources that should always be chosen over
> a hardware realistic implementation in a virtual environment.
>
Yes, agreed. The tsc is never the right thing to use if there's a
paravirt clocksource available.
What's wrong with rating it 300? What inferior clocksource does it lose
out to? Shouldn't that clocksource be lowered? (Why don't we just use
1 to 10?)
J
* Zachary Amsden <[email protected]> wrote:
> On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
> > From: Glauber de Oliveira Costa <[email protected]>
> >
> > tsc is very good time source (when it does not have drifts, does not
> > change it's frequency, i.e. when it works), so it should have its rating
> > raised to a value greater than, or equal 400.
> >
> > Since it's being a tendency among paravirt clocksources to use values
> > around 400, we should declare tsc as even better: So we use 500.
>
> Why is the TSC better than a paravirt clocksource? In our case this
> is definitely inaccurate. Paravirt clocksources should be preferred
> to TSC, and both must be made available in hardware for platforms
> which do not support paravirt.
if it's inaccurate why are you exposing it to the guest then? Native
only uses the TSC if it's safe and accurate to do so.
Ingo
Ingo Molnar wrote:
> * Zachary Amsden <[email protected]> wrote:
>
>
>> On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
>>
>>> From: Glauber de Oliveira Costa <[email protected]>
>>>
>>> tsc is very good time source (when it does not have drifts, does not
>>> change it's frequency, i.e. when it works), so it should have its rating
>>> raised to a value greater than, or equal 400.
>>>
>>> Since it's being a tendency among paravirt clocksources to use values
>>> around 400, we should declare tsc as even better: So we use 500.
>>>
>> Why is the TSC better than a paravirt clocksource? In our case this
>> is definitely inaccurate. Paravirt clocksources should be preferred
>> to TSC, and both must be made available in hardware for platforms
>> which do not support paravirt.
>>
>
> if it's inaccurate why are you exposing it to the guest then? Native
> only uses the TSC if it's safe and accurate to do so.
>
It is used as part of the Xen clocksource as a short term extrapolator,
with correction parameters supplied by the hypervisor. It should never
be used directly.
J
On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
> * Zachary Amsden <[email protected]> wrote:
> if it's inaccurate why are you exposing it to the guest then? Native
> only uses the TSC if it's safe and accurate to do so.
Not every guest support paravirt, but for correctness, all guests
require TSC, which must be exposed all the way up to userspace, no
matter what the efficiency or accuracy may be.
Zach
* Jeremy Fitzhardinge <[email protected]> wrote:
> > if it's inaccurate why are you exposing it to the guest then? Native
> > only uses the TSC if it's safe and accurate to do so.
>
> It is used as part of the Xen clocksource as a short term
> extrapolator, with correction parameters supplied by the hypervisor.
> It should never be used directly.
that's totally broken then. You cannot create an SMP-safe monotonic
clocksource via interpolation - native does not do it either. Good thing
this problem got exposed, it needs to be fixed.
Ingo
* Zachary Amsden <[email protected]> wrote:
> On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
> > * Zachary Amsden <[email protected]> wrote:
>
> > if it's inaccurate why are you exposing it to the guest then? Native
> > only uses the TSC if it's safe and accurate to do so.
>
> Not every guest support paravirt, but for correctness, all guests
> require TSC, which must be exposed all the way up to userspace, no
> matter what the efficiency or accuracy may be.
but if there's a perfect TSC available (there is such hardware) then the
TSC _is_ the best clocksource. Paravirt now turns it off unconditionally
in essence.
anyway, that's at most an optimization issue. No strong feelings here,
and we can certainly delay this patch until this gets all sorted out.
Ingo
On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
> * Zachary Amsden <[email protected]> wrote:
> > Not every guest support paravirt, but for correctness, all guests
> > require TSC, which must be exposed all the way up to userspace, no
> > matter what the efficiency or accuracy may be.
>
> but if there's a perfect TSC available (there is such hardware) then the
> TSC _is_ the best clocksource. Paravirt now turns it off unconditionally
> in essence.
No, if no paravirt clocksource is detected, nothing can override the
perfect TSC hardware clocksource rating of 400. And if a paravirt
clocksource is detected, it is always better than TSC.
> anyway, that's at most an optimization issue. No strong feelings here,
> and we can certainly delay this patch until this gets all sorted out.
This patch should be nacked, since it is just wrong. This is not an
optimization issue. It is an accuracy issue for all virtualization
environments that affects long term kernel clock stability, which is
important to fix, and the best way to do that is to use a paravirt
clocksource.
Zach
* Zachary Amsden <[email protected]> wrote:
> On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
> > * Zachary Amsden <[email protected]> wrote:
>
> > > Not every guest support paravirt, but for correctness, all guests
> > > require TSC, which must be exposed all the way up to userspace, no
> > > matter what the efficiency or accuracy may be.
> >
> > but if there's a perfect TSC available (there is such hardware) then the
> > TSC _is_ the best clocksource. Paravirt now turns it off unconditionally
> > in essence.
>
> No, if no paravirt clocksource is detected, nothing can override the
> perfect TSC hardware clocksource rating of 400. And if a paravirt
> clocksource is detected, it is always better than TSC.
>
> > anyway, that's at most an optimization issue. No strong feelings here,
> > and we can certainly delay this patch until this gets all sorted out.
>
> This patch should be nacked, since it is just wrong. This is not an
> optimization issue. It is an accuracy issue for all virtualization
> environments that affects long term kernel clock stability, which is
> important to fix, and the best way to do that is to use a paravirt
> clocksource.
i know it's not an optimization issue. Your current pessimisation of
even perfect TSCs _is_ an optimization issue.
(and note that if the TSC is unstable the guest will/should notice it
and will fall back to the hyper clocksource)
Ingo
Ingo Molnar wrote:
> that's totally broken then. You cannot create an SMP-safe monotonic
> clocksource via interpolation - native does not do it either. Good thing
> this problem got exposed, it needs to be fixed.
>
Sigh, I don't really want to have this fight again.
I don't really see what point there is in raising the tsc's rating (why
is 300 insufficient again?), but regardless of the value, the Xen
clocksource rating needs to be higher.
J
* Jeremy Fitzhardinge <[email protected]> wrote:
> Ingo Molnar wrote:
> > that's totally broken then. You cannot create an SMP-safe monotonic
> > clocksource via interpolation - native does not do it either. Good thing
> > this problem got exposed, it needs to be fixed.
> >
>
> Sigh, I don't really want to have this fight again.
i dont remember us having discussed this before, ever. If there's any
"fight" about monotonicity and SMP then it would be a pretty onesided
affair, with you being beaten up seriously ;-)
> I don't really see what point there is in raising the tsc's rating
> (why is 300 insufficient again?), but regardless of the value, the Xen
> clocksource rating needs to be higher.
anyway, i agree that this patch cannot go in in its current form.
Ingo
On 10/29/2007 04:02 PM, Ingo Molnar wrote:
> * Zachary Amsden <[email protected]> wrote:
>
>> On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
>>> * Zachary Amsden <[email protected]> wrote:
>>> if it's inaccurate why are you exposing it to the guest then? Native
>>> only uses the TSC if it's safe and accurate to do so.
>> Not every guest support paravirt, but for correctness, all guests
>> require TSC, which must be exposed all the way up to userspace, no
>> matter what the efficiency or accuracy may be.
>
> but if there's a perfect TSC available (there is such hardware) then the
> TSC _is_ the best clocksource. Paravirt now turns it off unconditionally
> in essence.
>
Not really. In the case hardware TSC is perfect, the paravirt time
counter can be implemented directly in terms of hardware TSC; there is
no loss in optimization. This is done transparently. And virtual TSC
can be implemented this way too.
The real improvement that a paravirt clocksource offers over the TSC
clocksource is that the guest does not need to measure the TSC frequency
itself against some other constant frequency source (which is
problematic on a virtual machine). Instead, the paravirt clocksource
queries the hypervisor for the frequency of the counter. As you know,
with clocksource style kernels, it's important to get this frequency
correct, or else the guest will have long-term time drift.
Ingo Molnar wrote:
> i dont remember us having discussed this before, ever. If there's any
> "fight" about monotonicity and SMP then it would be a pretty onesided
> affair, with you being beaten up seriously ;-)
>
This is part of Xen's ABI, so it isn't easily changed. You're right
that getting monotonicity is a pretty ugly affair of doing something
like "if (now < previous_return) return ++previous_return;", but its
better than using the tsc.
(Last time around, it was "Xen can't use the tsc because it can't ever
be used as a stable timebase" - though I don't think that was you
specifically.)
J
Glauber de Oliveira Costa wrote:
> From: Glauber de Oliveira Costa <[email protected]>
>
> tsc is very good time source (when it does not have drifts, does not
> change it's frequency, i.e. when it works), so it should have its rating
> raised to a value greater than, or equal 400.
>
> Since it's being a tendency among paravirt clocksources to use values
> around 400, we should declare tsc as even better: So we use 500.
>
> This patch also touches the comments on clocksource.h, which suggests
> that 499 would be a limit on the rating values.
>
> Signed-off-by: Glauber de Oliveira Costa <[email protected]>
Are you sure these paravirt sources don't intentionally trump the TSC?
-hpa
On Mon, 2007-10-29 at 23:17 +0100, Thomas Gleixner wrote:
> On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:
>
> CC'ed John and removed [email protected] :)
>
> > From: Glauber de Oliveira Costa <[email protected]>
> >
> > tsc is very good time source (when it does not have drifts, does not
> > change it's frequency, i.e. when it works), so it should have its rating
> > raised to a value greater than, or equal 400.
But all of those qualities (and more) make it less then ideal.
What issue exactly does this patch resolve?
> > Since it's being a tendency among paravirt clocksources to use values
> > around 400, we should declare tsc as even better: So we use 500.
This is the rating inflation I was initially worried about and created
the advisory scale to avoid. If paravirt clocksources are rated too
high, they should be dropped.
I strongly disagree that the TSC is a "perfect" clocksource, and I think
its rating is appropriate.
> > This patch also touches the comments on clocksource.h, which suggests
> > that 499 would be a limit on the rating values.
> >
> > Signed-off-by: Glauber de Oliveira Costa <[email protected]>
>
> Acked-by: Thomas Gleixner <[email protected]>
If a better justification can be provided, I might reconsider, but right
now I can't ack this.
thanks
-john
> > ---
> > arch/x86/kernel/tsc_32.c | 2 +-
> > arch/x86/kernel/tsc_64.c | 2 +-
> > include/linux/clocksource.h | 2 +-
> > 3 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
> > index 9ebc0da..4d91e59 100644
> > --- a/arch/x86/kernel/tsc_32.c
> > +++ b/arch/x86/kernel/tsc_32.c
> > @@ -280,7 +280,7 @@ static cycle_t read_tsc(void)
> >
> > static struct clocksource clocksource_tsc = {
> > .name = "tsc",
> > - .rating = 300,
> > + .rating = 500,
> > .read = read_tsc,
> > .mask = CLOCKSOURCE_MASK(64),
> > .mult = 0, /* to be set */
> > diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
> > index 9c70af4..4fd5b1b 100644
> > --- a/arch/x86/kernel/tsc_64.c
> > +++ b/arch/x86/kernel/tsc_64.c
> > @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void)
> >
> > static struct clocksource clocksource_tsc = {
> > .name = "tsc",
> > - .rating = 300,
> > + .rating = 500,
> > .read = read_tsc,
> > .mask = CLOCKSOURCE_MASK(64),
> > .shift = 22,
> > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > index 107787a..5b0aadd 100644
> > --- a/include/linux/clocksource.h
> > +++ b/include/linux/clocksource.h
> > @@ -39,7 +39,7 @@ struct clocksource;
> > * A correct and usable clocksource.
> > * 300-399: Desired.
> > * A reasonably fast and accurate clocksource.
> > - * 400-499: Perfect
> > + * >= 400 : Perfect
> > * The ideal clocksource. A must-use where
> > * available.
> > * @read: returns a cycle value
> > --
> > 1.5.0.6
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
> > Sigh, I don't really want to have this fight again.
>
> i dont remember us having discussed this before, ever. If there's any
> "fight" about monotonicity and SMP then it would be a pretty onesided
> affair, with you being beaten up seriously ;-)
Actually, it is possible, even for NUMA systems with CPUs running off
completely different oscillators, and in the presence of CPU frequency
changes, power management, and even in the presence of thermal
throttling (though the latter introduces temporary inaccuracies it
doesn't affect monotonicity or rate).
Take a look at the Xen code to see how each physical CPU is
independently calibrated on an ongoing basis, how movement of VCPUs
between physical CPUs is tracked, and how shared variables are used to
ensure montonicity if a guest requires it.
The fixed-rate TSCs on newer CPUs make some of this stuff easier, but
you still need to cope with different source oscillators and some power
management states.
Ian
> > I don't really see what point there is in raising the tsc's rating
> > (why is 300 insufficient again?), but regardless of the value, the
Xen
> > clocksource rating needs to be higher.
>
> anyway, i agree that this patch cannot go in in its current form.
>
> Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe
linux-kernel"
> in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Tuesday 30 October 2007 09:17:38 Thomas Gleixner wrote:
> On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:
>
> CC'ed John and removed [email protected] :)
>
> > From: Glauber de Oliveira Costa <[email protected]>
> >
> > tsc is very good time source (when it does not have drifts, does not
> > change it's frequency, i.e. when it works), so it should have its rating
> > raised to a value greater than, or equal 400.
> >
> > Since it's being a tendency among paravirt clocksources to use values
> > around 400, we should declare tsc as even better: So we use 500.
> >
> > This patch also touches the comments on clocksource.h, which suggests
> > that 499 would be a limit on the rating values.
> >
> > Signed-off-by: Glauber de Oliveira Costa <[email protected]>
>
> Acked-by: Thomas Gleixner <[email protected]>
No. tsc is very good, it's not perfect. If a paravirt clock registers 400 it
really means "pick me over the tsc".
That's *why* they use > 400: it's in the documentation.
Rusty.
Dan Hecht wrote:
> Not really. In the case hardware TSC is perfect, the paravirt time
> counter can be implemented directly in terms of hardware TSC; there is
> no loss in optimization. This is done transparently. And virtual TSC
> can be implemented this way too.
>
> The real improvement that a paravirt clocksource offers over the TSC
> clocksource is that the guest does not need to measure the TSC frequency
> itself against some other constant frequency source (which is
> problematic on a virtual machine). Instead, the paravirt clocksource
> queries the hypervisor for the frequency of the counter. As you know,
> with clocksource style kernels, it's important to get this frequency
> correct, or else the guest will have long-term time drift.
>
>
In addition, a paravirt clocksource can compensate for events like vcpu
migration to another host cpu. So I agree: a paravirt clocksource is
always better than or equal to the tsc.
--
Any sufficiently difficult bug is indistinguishable from a feature.
* Dan Hecht <[email protected]> wrote:
>> but if there's a perfect TSC available (there is such hardware) then
>> the TSC _is_ the best clocksource. Paravirt now turns it off
>> unconditionally in essence.
>
> Not really. In the case hardware TSC is perfect, the paravirt time
> counter can be implemented directly in terms of hardware TSC; there is
> no loss in optimization. This is done transparently. And virtual TSC
> can be implemented this way too.
Of course if you duplicate all (or part) of the TSC clocksource driver
in the paravirt guest code then the "paravirt clocksource" is at least
as good as the TSC. But that argument is playing word-games, _of course_
if you use the same (or similar) code it's at least as good. The real
question are clocksources that communicate out to the hypervisor, and
hence have higher overhead than a native, TSC based clocksource - and
clocksources that use the TSC in a broken way.
> The real improvement that a paravirt clocksource offers over the TSC
> clocksource is that the guest does not need to measure the TSC
> frequency itself against some other constant frequency source (which
> is problematic on a virtual machine). [...]
hey, you need not tell me, i've implemented a hyper-clocksource driver
myself. But calibration is a boot only issue and there's no reason why
calibration _has_ to be fragile. For example we could easily extend the
TSC clocksource driver to not calibrate in the guest but take
calibration information from the host. It's in essence a trivial and
obvious extension to calibration. That way we get the highest possible
performance _and_ we share much of the clocksource driver with the host.
also, the way the TSC is used by guests like Xen is fundamentally
fragile on SMP. So i have a good reason to distrust the approach of
hypervisors to timekeeping. The maintenance problem to me is that
everyone in the paravirt space is busy coding away in their own (often
broken) direction, replicating the essence of the TSC clocksource driver
4 times over again and again, with subtle bugs in each variant, even in
cases where the TSC readout can be trusted perfectly well.
"Consolidation" and "sharing code" is not a particularly strong point of
the paravirt projects ;-) (ok, KVM is a notable exception there.)
anyway, i do agree that this patch is wrong currently, mainly due to TSC
calibration not being reliable in guest-space at the moment - but the
whole concept of putting a separate clocksource driver into each
paravirt guest, even in the case where the TSC is perfect, is madness.
That code, once the hardware gets sane (and there are good signs for
that), and once calibration can be passed from host to guest reliably,
_will_ be consolidated, because it makes perfect technical sense.
Ingo
* Ian Pratt <[email protected]> wrote:
> > > Sigh, I don't really want to have this fight again.
> >
> > i dont remember us having discussed this before, ever. If there's any
> > "fight" about monotonicity and SMP then it would be a pretty onesided
> > affair, with you being beaten up seriously ;-)
>
> Actually, it is possible, even for NUMA systems with CPUs running off
> completely different oscillators, and in the presence of CPU frequency
> changes, power management, and even in the presence of thermal
> throttling (though the latter introduces temporary inaccuracies it
> doesn't affect monotonicity or rate).
>
> Take a look at the Xen code to see how each physical CPU is
> independently calibrated on an ongoing basis, how movement of VCPUs
> between physical CPUs is tracked, and how shared variables are used to
> ensure montonicity if a guest requires it.
I think that's wishful thinking. Check out:
http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
change TEST_TSC to 0, run it on an SMP guest (on a reasonably fast
machine) and let me know whether you can make SMP guests not come up
with monotonicity violations in the CLOCK tests.
Ingo
* Rusty Russell <[email protected]> wrote:
> No. tsc is very good, it's not perfect. If a paravirt clock
> registers 400 it really means "pick me over the tsc".
often the TSC is not perfect, but _IF_ it's perfect, using the paravirt
driver is a pessimisation in performance.
the main problem at the moment is that there's no mechanism at the
moment to convey to the guest the information that the TSC is "perfect",
and to convey the calibration values.
> That's *why* they use > 400: it's in the documentation.
static values do not capture conditional quality like that of the TSC.
and just in case it's not obvious: i am not arguing for the inclusion of
the patch, i'm just pointing out the plain fact that in the case where
the TSC _is_ reliable, 5 different clocksource drivers for has obvious
disadvantages. Anyone arguing against that simple point needs his head
examined :) Once we can pass around calibration information from the
host to the guest (which we dont do at the moment) there will be reason
not to use the native clocksource driver in the guest.
in the long run, the paravirt clocksource drivers must become _fallback_
drivers, for the case when the TSC is not perfect. Instead of the
current "lets try to make it reliable but also nearly as fast as the
TSC", which leads to inevitable breakage on SMP.
Ingo
On Tuesday 30 October 2007 18:37:36 Ingo Molnar wrote:
> * Rusty Russell <[email protected]> wrote:
> > No. tsc is very good, it's not perfect. If a paravirt clock
> > registers 400 it really means "pick me over the tsc".
>
> often the TSC is not perfect, but _IF_ it's perfect, using the paravirt
> driver is a pessimisation in performance.
>
> the main problem at the moment is that there's no mechanism at the
> moment to convey to the guest the information that the TSC is "perfect",
> and to convey the calibration values.
The host can communicate to the guest what clock to use: the guest can decide
to register a paravirt clock or not depending on whether it wants to leave it
to the TSC.
For a while we couldn't remove the TSC cpuid capability in the guest, because
if you configured your kernel in some ways it was hardcoded on. I think
the "all 686+ have a tsc" assumption has now been fixed, so I should change
the lguest clock to do as you said: register its clock at lower prio to the
TSC and then the host can simply remove the TSC cpuid if it isn't suitable
for the guest to use.
ISTR the core TSC timing code (which lguest could use) and various hardware
manipulations (which lguest couldn't) were intertwined, but I'll have to go
back and check exactly what the issue was.
> and just in case it's not obvious: i am not arguing for the inclusion of
> the patch
Unfortunately, you and Thomas both acked the patch. This says v bad things
about how much review kernel patches get.
Cheers,
Rusty.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Zachary Amsden escreveu:
> On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
>> * Zachary Amsden <[email protected]> wrote:
>
>> if it's inaccurate why are you exposing it to the guest then? Native
>> only uses the TSC if it's safe and accurate to do so.
>
> Not every guest support paravirt, but for correctness, all guests
> require TSC, which must be exposed all the way up to userspace, no
> matter what the efficiency or accuracy may be.
>
> Zach
>
However, with such accuracy, it will fail the tsc clocksources tests. If
it passes, it'a good clocksource to use.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org
iD8DBQFHJxyujYI8LaFUWXMRAhI5AKCDl/O7iA3TdtpvElVg8NoSDVfKpgCeNvz3
ZEZDMxg8T7FA2R+5cgH/uKI=
=8qSH
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Zachary Amsden escreveu:
> On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
>> * Zachary Amsden <[email protected]> wrote:
>
>>> Not every guest support paravirt, but for correctness, all guests
>>> require TSC, which must be exposed all the way up to userspace, no
>>> matter what the efficiency or accuracy may be.
>> but if there's a perfect TSC available (there is such hardware) then the
>> TSC _is_ the best clocksource. Paravirt now turns it off unconditionally
>> in essence.
>
> No, if no paravirt clocksource is detected, nothing can override the
> perfect TSC hardware clocksource rating of 400. And if a paravirt
> clocksource is detected, it is always better than TSC.
Why always? tsc is the best possible alternative the _platform_ can
provide. So the paravirt clocksource will be at best, as good as tsc.
And if it is the case: why bother with communication mechanisms of all
kinds, calculations, etc, if we can just read the tsc?
Noting again: If the tsc does not arrive accurate to the guest, it will
fail the tsc clocksource tests. So it will be rated zero anyway
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org
iD8DBQFHJx1KjYI8LaFUWXMRAv0hAJ4sj0Z1FraYrgbU5Mj0pWOJGk6jtwCfc5xL
jpTC273X0oqPTCR7NcVHJOI=
=WPzV
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ingo Molnar escreveu:
> * Rusty Russell <[email protected]> wrote:
>
> and just in case it's not obvious: i am not arguing for the inclusion of
> the patch, i'm just pointing out the plain fact that in the case where
> the TSC _is_ reliable, 5 different clocksource drivers for has obvious
> disadvantages. Anyone arguing against that simple point needs his head
> examined :) Once we can pass around calibration information from the
> host to the guest (which we dont do at the moment) there will be reason
> not to use the native clocksource driver in the guest.
If you sustain that we cannot have a reliable synchronization test
mechanism, neither do I. All this is based in the assumption that a bad
tsc will fail such tests.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org
iD8DBQFHJyAHjYI8LaFUWXMRAtYoAJ9l5kNodRLfTsNHDvyioufM7SQzTACfarEy
XXq3sDq9uxZ/72hhA46YmgM=
=DwN/
-----END PGP SIGNATURE-----
On Tue, 2007-10-30 at 10:02 -0200, Glauber de Oliveira Costa wrote:
> > No, if no paravirt clocksource is detected, nothing can override the
> > perfect TSC hardware clocksource rating of 400. And if a paravirt
> > clocksource is detected, it is always better than TSC.
>
> Why always? tsc is the best possible alternative the _platform_ can
> provide. So the paravirt clocksource will be at best, as good as tsc.
What is the justification for this claim? The TSC provides no
information about frequency. The frequency must be measured from a
known fixed rate source. This measurement is error prone, both on real
hardware, and even more so in a VM.
The TSC can only provide a single measure of elapsed periods. It has no
concept to differentiate between elapsed periods of time spent in the
guest, which is relevant to the scheduler, and elapsed periods of time
spent in real time, which is relevant to timekeeping.
So a TSC can not drive both timekeeping and scheduling in a virtual
machine, or if it does, it does so inaccurately.
> And if it is the case: why bother with communication mechanisms of all
> kinds, calculations, etc, if we can just read the tsc?
>
> Noting again: If the tsc does not arrive accurate to the guest, it will
> fail the tsc clocksource tests. So it will be rated zero anyway
You always need to provide a TSC, and it will always initially appear to
be accurate. Long term, as time is reported via the TSC is an
approximation somewhere between real time and elapsed running time,
measurements made using it in a VM will drift, and it is impossible to
remove the inaccuracy from one end of the spectrum (by providing real
time exactly) without compromising it at the other end (by skewing
relative time measurement).
A paravirt clocksource is always better than a TSC because it
compensates for these effects.
Zach