2008-10-07 22:41:51

by Jeff Hansen

[permalink] [raw]
Subject: x86_32 tsc/pit and hrtimers

Linus, Ingo, All,

I've been struggling with hrtimer support in 2.6.26.5 on an older
x86_32/i386 system, and I'm wondering if there are any easy fixes that you
(or anyone else) would suggest.

Basically, this system does not print out the message:

"Switched to high resolution mode on CPU 0"

indicating that one-shot, hrtimers, etc. won't work, since high resolution
mode has not been enabled. I've verified that hrtimers started with
hrtimer_start do not have the expected resolution further than 1/HZ.

This system does not have LAPIC, ACPI, or HPET, so really the only
clocksources I can use are TSC and PIT. This should be fine (in theory,
unless it wasn't designed like that), but apparently the clocksource flags
are not initialized in such a way that one of them ever gets marked as
CLOCK_SOURCE_VALID_FOR_HRES.

The flow of the flags on each of these clocksources is as follows:

1) The flags on the TSC clocksource are CLOCK_SOURCE_IS_CONTINUOUS |
CLOCK_SOURCE_MUST_VERIFY, which causes PIT to be used as the watchdog
clocksource. (see kernel/time/clocksource.c:~171)
2) Around line 122 in kernel/time/clocksource.c, where most clocksources'
flags usually get ORed with CLOCK_SOURCE_VALID_FOR_HRES, the PIT's do
not because it is not CLOCK_SOURCE_IS_CONTINUOUS, and the TSC's do not
also because the PIT (as the watchdog) is not
CLOCK_SOURCE_IS_CONTINUOUS.

I get the same results on a new laptop booting into 32-bit Linux with hpet
and acpi disabled.

Can you please tell me if this is supposed to work, and I just have a
poorly configured kernel; or if TSC/PIT drivers were not designed to work
this way in the first place. If it wasn't designed to do this, do you
have any tips on implementing this, since I'll be needing to do that?

-Jeff Hansen

---------------------------------------------------
"If someone's gotta do it, it might as well be me."


2008-10-08 18:41:25

by Chris Snook

[permalink] [raw]
Subject: Re: x86_32 tsc/pit and hrtimers

Jeff Hansen wrote:
> Linus, Ingo, All,
>
> I've been struggling with hrtimer support in 2.6.26.5 on an older
> x86_32/i386 system, and I'm wondering if there are any easy fixes that you
> (or anyone else) would suggest.
>
> Basically, this system does not print out the message:
>
> "Switched to high resolution mode on CPU 0"
>
> indicating that one-shot, hrtimers, etc. won't work, since high resolution
> mode has not been enabled. I've verified that hrtimers started with
> hrtimer_start do not have the expected resolution further than 1/HZ.
>
> This system does not have LAPIC, ACPI, or HPET, so really the only
> clocksources I can use are TSC and PIT. This should be fine (in theory,
> unless it wasn't designed like that), but apparently the clocksource flags
> are not initialized in such a way that one of them ever gets marked as
> CLOCK_SOURCE_VALID_FOR_HRES.
>
> The flow of the flags on each of these clocksources is as follows:
>
> 1) The flags on the TSC clocksource are CLOCK_SOURCE_IS_CONTINUOUS |
> CLOCK_SOURCE_MUST_VERIFY, which causes PIT to be used as the watchdog
> clocksource. (see kernel/time/clocksource.c:~171)
> 2) Around line 122 in kernel/time/clocksource.c, where most clocksources'
> flags usually get ORed with CLOCK_SOURCE_VALID_FOR_HRES, the PIT's do
> not because it is not CLOCK_SOURCE_IS_CONTINUOUS, and the TSC's do not
> also because the PIT (as the watchdog) is not
> CLOCK_SOURCE_IS_CONTINUOUS.
>
> I get the same results on a new laptop booting into 32-bit Linux with hpet
> and acpi disabled.
>
> Can you please tell me if this is supposed to work, and I just have a
> poorly configured kernel; or if TSC/PIT drivers were not designed to work
> this way in the first place. If it wasn't designed to do this, do you
> have any tips on implementing this, since I'll be needing to do that?
>
> -Jeff Hansen

This is not supposed to work, but it might be worthwhile to add a boot option to
force the kernel to trust the TSC, as hardware that lacks any high-res timers
also tends to be primitive enough that the TSC can be trusted, if it exists. If
you patch out the CLOCK_SOURCE_MUST_VERIFY flag on the TSC, do you get
correctly-functioning high-res timers on this system?

-- Chris

2008-10-08 19:46:33

by Jeff Hansen

[permalink] [raw]
Subject: Re: x86_32 tsc/pit and hrtimers

Chris,

This worked perfectly! I second adding a kernel option that forces
trusting the TSC. I can make a patch if you'd like. Should the option be
something like "trusttsc" or "tsc=noverify"?

-Jeff

---------------------------------------------------
"If someone's gotta do it, it might as well be me."

On Wed, 8 Oct 2008, Chris Snook wrote:

> Jeff Hansen wrote:
>> Linus, Ingo, All,
>>
>> I've been struggling with hrtimer support in 2.6.26.5 on an older
>> x86_32/i386 system, and I'm wondering if there are any easy fixes that you
>> (or anyone else) would suggest.
>>
>> Basically, this system does not print out the message:
>>
>> "Switched to high resolution mode on CPU 0"
>>
>> indicating that one-shot, hrtimers, etc. won't work, since high resolution
>> mode has not been enabled. I've verified that hrtimers started with
>> hrtimer_start do not have the expected resolution further than 1/HZ.
>>
>> This system does not have LAPIC, ACPI, or HPET, so really the only
>> clocksources I can use are TSC and PIT. This should be fine (in theory,
>> unless it wasn't designed like that), but apparently the clocksource flags
>> are not initialized in such a way that one of them ever gets marked as
>> CLOCK_SOURCE_VALID_FOR_HRES.
>>
>> The flow of the flags on each of these clocksources is as follows:
>>
>> 1) The flags on the TSC clocksource are CLOCK_SOURCE_IS_CONTINUOUS |
>> CLOCK_SOURCE_MUST_VERIFY, which causes PIT to be used as the watchdog
>> clocksource. (see kernel/time/clocksource.c:~171)
>> 2) Around line 122 in kernel/time/clocksource.c, where most clocksources'
>> flags usually get ORed with CLOCK_SOURCE_VALID_FOR_HRES, the PIT's do
>> not because it is not CLOCK_SOURCE_IS_CONTINUOUS, and the TSC's do not
>> also because the PIT (as the watchdog) is not
>> CLOCK_SOURCE_IS_CONTINUOUS.
>>
>> I get the same results on a new laptop booting into 32-bit Linux with hpet
>> and acpi disabled.
>>
>> Can you please tell me if this is supposed to work, and I just have a
>> poorly configured kernel; or if TSC/PIT drivers were not designed to work
>> this way in the first place. If it wasn't designed to do this, do you
>> have any tips on implementing this, since I'll be needing to do that?
>>
>> -Jeff Hansen
>
> This is not supposed to work, but it might be worthwhile to add a boot option
> to force the kernel to trust the TSC, as hardware that lacks any high-res
> timers also tends to be primitive enough that the TSC can be trusted, if it
> exists. If you patch out the CLOCK_SOURCE_MUST_VERIFY flag on the TSC, do
> you get correctly-functioning high-res timers on this system?
>
> -- Chris
>
>
>

2008-10-08 20:25:15

by Chris Snook

[permalink] [raw]
Subject: Re: x86_32 tsc/pit and hrtimers

Jeff Hansen wrote:
> This worked perfectly! I second adding a kernel option that forces
> trusting the TSC. I can make a patch if you'd like. Should the option
> be something like "trusttsc" or "tsc=noverify"?

How about a "highres=noverify" flag, to disable all CLOCK_SOURCE_MUST_VERIFY
checking? People might want to use this with other timers too.

-- Chris

> On Wed, 8 Oct 2008, Chris Snook wrote:
>
>> Jeff Hansen wrote:
>>> Linus, Ingo, All,
>>>
>>> I've been struggling with hrtimer support in 2.6.26.5 on an older
>>> x86_32/i386 system, and I'm wondering if there are any easy fixes
>>> that you
>>> (or anyone else) would suggest.
>>>
>>> Basically, this system does not print out the message:
>>>
>>> "Switched to high resolution mode on CPU 0"
>>>
>>> indicating that one-shot, hrtimers, etc. won't work, since high
>>> resolution
>>> mode has not been enabled. I've verified that hrtimers started with
>>> hrtimer_start do not have the expected resolution further than 1/HZ.
>>>
>>> This system does not have LAPIC, ACPI, or HPET, so really the only
>>> clocksources I can use are TSC and PIT. This should be fine (in
>>> theory,
>>> unless it wasn't designed like that), but apparently the clocksource
>>> flags
>>> are not initialized in such a way that one of them ever gets marked as
>>> CLOCK_SOURCE_VALID_FOR_HRES.
>>>
>>> The flow of the flags on each of these clocksources is as follows:
>>>
>>> 1) The flags on the TSC clocksource are CLOCK_SOURCE_IS_CONTINUOUS |
>>> CLOCK_SOURCE_MUST_VERIFY, which causes PIT to be used as the
>>> watchdog
>>> clocksource. (see kernel/time/clocksource.c:~171)
>>> 2) Around line 122 in kernel/time/clocksource.c, where most
>>> clocksources'
>>> flags usually get ORed with CLOCK_SOURCE_VALID_FOR_HRES, the
>>> PIT's do
>>> not because it is not CLOCK_SOURCE_IS_CONTINUOUS, and the TSC's
>>> do not
>>> also because the PIT (as the watchdog) is not
>>> CLOCK_SOURCE_IS_CONTINUOUS.
>>>
>>> I get the same results on a new laptop booting into 32-bit Linux
>>> with hpet
>>> and acpi disabled.
>>>
>>> Can you please tell me if this is supposed to work, and I just have a
>>> poorly configured kernel; or if TSC/PIT drivers were not designed to
>>> work
>>> this way in the first place. If it wasn't designed to do this, do you
>>> have any tips on implementing this, since I'll be needing to do that?
>>>
>>> -Jeff Hansen
>>
>> This is not supposed to work, but it might be worthwhile to add a boot
>> option to force the kernel to trust the TSC, as hardware that lacks
>> any high-res timers also tends to be primitive enough that the TSC can
>> be trusted, if it exists. If you patch out the
>> CLOCK_SOURCE_MUST_VERIFY flag on the TSC, do you get
>> correctly-functioning high-res timers on this system?
>>
>> -- Chris
>>
>>
>>

2008-10-08 21:43:59

by Jeff Hansen

[permalink] [raw]
Subject: [PATCH] Re: x86_32 tsc/pit and hrtimers

[HRTIMER]: Add highres=noverify option to bypass clocksource verification

This disregards the CLOCK_SOURCE_MUST_VERIFY flag on all clocksources.
This is particularly useful on legacy x86_32 systems that have no ACPI,
LAPIC, or HPET timers, where only TSC and PIT are available.

Thanks to Chris Snook for suggesting this.
---
include/linux/clocksource.h | 2 ++
kernel/hrtimer.c | 2 ++
kernel/time/clocksource.c | 3 ++-
3 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 55e434f..90ae835 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -104,6 +104,8 @@ extern struct clocksource *clock; /* current clocksource */
#define CLOCK_SOURCE_WATCHDOG 0x10
#define CLOCK_SOURCE_VALID_FOR_HRES 0x20

+extern int clocksource_noverify;
+
/* simplify initialization of mask field */
#define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index ab80515..2fdf59f 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -476,6 +476,8 @@ static int __init setup_hrtimer_hres(char *str)
hrtimer_hres_enabled = 0;
else if (!strcmp(str, "on"))
hrtimer_hres_enabled = 1;
+ else if (!strcmp(str, "noverify"))
+ clocksource_noverify = 1;
else
return 0;
return 1;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index dadde53..d3c14c1 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -54,6 +54,7 @@ static LIST_HEAD(clocksource_list);
static DEFINE_SPINLOCK(clocksource_lock);
static char override_name[32];
static int finished_booting;
+int clocksource_noverify;

/* clocksource_done_booting - Called near the end of core bootup
*
@@ -165,7 +166,7 @@ static void clocksource_check_watchdog(struct clocksource *cs)
unsigned long flags;

spin_lock_irqsave(&watchdog_lock, flags);
- if (cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
+ if (!clocksource_noverify && cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
int started = !list_empty(&watchdog_list);

list_add(&cs->wd_list, &watchdog_list);
--
1.5.6.4

On Wed, 8 Oct 2008, Chris Snook wrote:

> Jeff Hansen wrote:
>> This worked perfectly! I second adding a kernel option that forces
>> trusting the TSC. I can make a patch if you'd like. Should the option be
>> something like "trusttsc" or "tsc=noverify"?
>
> How about a "highres=noverify" flag, to disable all CLOCK_SOURCE_MUST_VERIFY
> checking? People might want to use this with other timers too.
>
> -- Chris
>
>> On Wed, 8 Oct 2008, Chris Snook wrote:
>>
>> > Jeff Hansen wrote:
>> > > Linus, Ingo, All,
>> > >
>> > > I've been struggling with hrtimer support in 2.6.26.5 on an older
>> > > x86_32/i386 system, and I'm wondering if there are any easy fixes
>> > > that you
>> > > (or anyone else) would suggest.
>> > >
>> > > Basically, this system does not print out the message:
>> > >
>> > > "Switched to high resolution mode on CPU 0"
>> > >
>> > > indicating that one-shot, hrtimers, etc. won't work, since high
>> > > resolution
>> > > mode has not been enabled. I've verified that hrtimers started with
>> > > hrtimer_start do not have the expected resolution further than 1/HZ.
>> > >
>> > > This system does not have LAPIC, ACPI, or HPET, so really the only
>> > > clocksources I can use are TSC and PIT. This should be fine (in
>> > > theory,
>> > > unless it wasn't designed like that), but apparently the clocksource
>> > > flags
>> > > are not initialized in such a way that one of them ever gets marked
>> > > as
>> > > CLOCK_SOURCE_VALID_FOR_HRES.
>> > >
>> > > The flow of the flags on each of these clocksources is as follows:
>> > >
>> > > 1) The flags on the TSC clocksource are CLOCK_SOURCE_IS_CONTINUOUS |
>> > > CLOCK_SOURCE_MUST_VERIFY, which causes PIT to be used as the
>> > > watchdog
>> > > clocksource. (see kernel/time/clocksource.c:~171)
>> > > 2) Around line 122 in kernel/time/clocksource.c, where most
>> > > clocksources'
>> > > flags usually get ORed with CLOCK_SOURCE_VALID_FOR_HRES, the
>> > > PIT's do
>> > > not because it is not CLOCK_SOURCE_IS_CONTINUOUS, and the TSC's
>> > > do not
>> > > also because the PIT (as the watchdog) is not
>> > > CLOCK_SOURCE_IS_CONTINUOUS.
>> > >
>> > > I get the same results on a new laptop booting into 32-bit Linux with
>> > > hpet
>> > > and acpi disabled.
>> > >
>> > > Can you please tell me if this is supposed to work, and I just have a
>> > > poorly configured kernel; or if TSC/PIT drivers were not designed to
>> > > work
>> > > this way in the first place. If it wasn't designed to do this, do
>> > > you
>> > > have any tips on implementing this, since I'll be needing to do that?
>> > >
>> > > -Jeff Hansen
>> >
>> > This is not supposed to work, but it might be worthwhile to add a boot
>> > option to force the kernel to trust the TSC, as hardware that lacks any
>> > high-res timers also tends to be primitive enough that the TSC can be
>> > trusted, if it exists. If you patch out the CLOCK_SOURCE_MUST_VERIFY
>> > flag on the TSC, do you get correctly-functioning high-res timers on
>> > this system?
>> >
>> > -- Chris
>> >
>> >
>> >
>
>
>

2008-10-08 21:47:56

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

On Wed, 8 Oct 2008, Jeff Hansen wrote:

> [HRTIMER]: Add highres=noverify option to bypass clocksource verification
>
> This disregards the CLOCK_SOURCE_MUST_VERIFY flag on all clocksources.
> This is particularly useful on legacy x86_32 systems that have no ACPI,
> LAPIC, or HPET timers, where only TSC and PIT are available.
>
> Thanks to Chris Snook for suggesting this.
> ---
> include/linux/clocksource.h | 2 ++
> kernel/hrtimer.c | 2 ++
> kernel/time/clocksource.c | 3 ++-
> 3 files changed, 6 insertions(+), 1 deletions(-)

Oops, missing update to Documentation/kernel-parameters.txt.
Please see Documentation/SubmitChecklist :)
Thanks.

> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 55e434f..90ae835 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -104,6 +104,8 @@ extern struct clocksource *clock; /* current clocksource
> */
> #define CLOCK_SOURCE_WATCHDOG 0x10
> #define CLOCK_SOURCE_VALID_FOR_HRES 0x20
>
> +extern int clocksource_noverify;
> +
> /* simplify initialization of mask field */
> #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) :
> -1)
>
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index ab80515..2fdf59f 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -476,6 +476,8 @@ static int __init setup_hrtimer_hres(char *str)
> hrtimer_hres_enabled = 0;
> else if (!strcmp(str, "on"))
> hrtimer_hres_enabled = 1;
> + else if (!strcmp(str, "noverify"))
> + clocksource_noverify = 1;
> else
> return 0;
> return 1;
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index dadde53..d3c14c1 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -54,6 +54,7 @@ static LIST_HEAD(clocksource_list);
> static DEFINE_SPINLOCK(clocksource_lock);
> static char override_name[32];
> static int finished_booting;
> +int clocksource_noverify;
>
> /* clocksource_done_booting - Called near the end of core bootup
> *
> @@ -165,7 +166,7 @@ static void clocksource_check_watchdog(struct clocksource
> *cs)
> unsigned long flags;
>
> spin_lock_irqsave(&watchdog_lock, flags);
> - if (cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
> + if (!clocksource_noverify && cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
> int started = !list_empty(&watchdog_list);
>
> list_add(&cs->wd_list, &watchdog_list);
>

--
~Randy

2008-10-08 21:48:21

by Chris Snook

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

Jeff Hansen wrote:
> [HRTIMER]: Add highres=noverify option to bypass clocksource verification
>
> This disregards the CLOCK_SOURCE_MUST_VERIFY flag on all clocksources.
> This is particularly useful on legacy x86_32 systems that have no ACPI,
> LAPIC, or HPET timers, where only TSC and PIT are available.
>
> Thanks to Chris Snook for suggesting this.
> ---
> include/linux/clocksource.h | 2 ++
> kernel/hrtimer.c | 2 ++
> kernel/time/clocksource.c | 3 ++-
> 3 files changed, 6 insertions(+), 1 deletions(-)

Please also update Documentation/kernel-parameters.txt, and resubmit with your
Signed-off-by tag.

-- Chris

> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 55e434f..90ae835 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -104,6 +104,8 @@ extern struct clocksource *clock; /* current
> clocksource */
> #define CLOCK_SOURCE_WATCHDOG 0x10
> #define CLOCK_SOURCE_VALID_FOR_HRES 0x20
>
> +extern int clocksource_noverify;
> +
> /* simplify initialization of mask field */
> #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ?
> ((1ULL<<(bits))-1) : -1)
>
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index ab80515..2fdf59f 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -476,6 +476,8 @@ static int __init setup_hrtimer_hres(char *str)
> hrtimer_hres_enabled = 0;
> else if (!strcmp(str, "on"))
> hrtimer_hres_enabled = 1;
> + else if (!strcmp(str, "noverify"))
> + clocksource_noverify = 1;
> else
> return 0;
> return 1;
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index dadde53..d3c14c1 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -54,6 +54,7 @@ static LIST_HEAD(clocksource_list);
> static DEFINE_SPINLOCK(clocksource_lock);
> static char override_name[32];
> static int finished_booting;
> +int clocksource_noverify;
>
> /* clocksource_done_booting - Called near the end of core bootup
> *
> @@ -165,7 +166,7 @@ static void clocksource_check_watchdog(struct
> clocksource *cs)
> unsigned long flags;
>
> spin_lock_irqsave(&watchdog_lock, flags);
> - if (cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
> + if (!clocksource_noverify && cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
> int started = !list_empty(&watchdog_list);
>
> list_add(&cs->wd_list, &watchdog_list);

2008-10-08 21:56:19

by Jeff Hansen

[permalink] [raw]
Subject: [PATCH] Re: x86_32 tsc/pit and hrtimers

[HRTIMER]: Add highres=noverify option to bypass clocksource verification

This disregards the CLOCK_SOURCE_MUST_VERIFY flag on all clocksources.
This is particularly useful on legacy x86_32 systems that have no ACPI,
LAPIC, or HPET timers, where only TSC and PIT are available.

Thanks to Chris Snook for suggesting this.

Signed-off-by: Jeff Hansen <[email protected]>
---
Documentation/kernel-parameters.txt | 7 ++++---
include/linux/clocksource.h | 2 ++
kernel/hrtimer.c | 2 ++
kernel/time/clocksource.c | 3 ++-
4 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b52f47d..9611505 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -730,9 +730,10 @@ and is between 256 and 4096 characters. It is defined in the file
highmem otherwise. This also works to reduce highmem
size on bigger boxes.

- highres= [KNL] Enable/disable high resolution timer mode.
- Valid parameters: "on", "off"
- Default: "on"
+ highres= [KNL] Modify high resolution timer parameters.
+ highres=on: Enable high-resolution timer mode (default)
+ highres=off: Disable high-resolution timer mode
+ highres=noverify: Assume all clocksources are stable

hisax= [HW,ISDN]
See Documentation/isdn/README.HiSax.
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 55e434f..90ae835 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -104,6 +104,8 @@ extern struct clocksource *clock; /* current clocksource */
#define CLOCK_SOURCE_WATCHDOG 0x10
#define CLOCK_SOURCE_VALID_FOR_HRES 0x20

+extern int clocksource_noverify;
+
/* simplify initialization of mask field */
#define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index ab80515..2fdf59f 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -476,6 +476,8 @@ static int __init setup_hrtimer_hres(char *str)
hrtimer_hres_enabled = 0;
else if (!strcmp(str, "on"))
hrtimer_hres_enabled = 1;
+ else if (!strcmp(str, "noverify"))
+ clocksource_noverify = 1;
else
return 0;
return 1;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index dadde53..d3c14c1 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -54,6 +54,7 @@ static LIST_HEAD(clocksource_list);
static DEFINE_SPINLOCK(clocksource_lock);
static char override_name[32];
static int finished_booting;
+int clocksource_noverify;

/* clocksource_done_booting - Called near the end of core bootup
*
@@ -165,7 +166,7 @@ static void clocksource_check_watchdog(struct clocksource *cs)
unsigned long flags;

spin_lock_irqsave(&watchdog_lock, flags);
- if (cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
+ if (!clocksource_noverify && cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
int started = !list_empty(&watchdog_list);

list_add(&cs->wd_list, &watchdog_list);
--
1.5.6.4

On Wed, 8 Oct 2008, Chris Snook wrote:

> Jeff Hansen wrote:
>> [HRTIMER]: Add highres=noverify option to bypass clocksource verification
>>
>> This disregards the CLOCK_SOURCE_MUST_VERIFY flag on all clocksources.
>> This is particularly useful on legacy x86_32 systems that have no ACPI,
>> LAPIC, or HPET timers, where only TSC and PIT are available.
>>
>> Thanks to Chris Snook for suggesting this.
>> ---
>> include/linux/clocksource.h | 2 ++
>> kernel/hrtimer.c | 2 ++
>> kernel/time/clocksource.c | 3 ++-
>> 3 files changed, 6 insertions(+), 1 deletions(-)
>
> Please also update Documentation/kernel-parameters.txt, and resubmit with
> your Signed-off-by tag.
>
> -- Chris
>
>> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
>> index 55e434f..90ae835 100644
>> --- a/include/linux/clocksource.h
>> +++ b/include/linux/clocksource.h
>> @@ -104,6 +104,8 @@ extern struct clocksource *clock; /* current
>> clocksource */
>> #define CLOCK_SOURCE_WATCHDOG 0x10
>> #define CLOCK_SOURCE_VALID_FOR_HRES 0x20
>>
>> +extern int clocksource_noverify;
>> +
>> /* simplify initialization of mask field */
>> #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ?
>> ((1ULL<<(bits))-1) : -1)
>>
>> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
>> index ab80515..2fdf59f 100644
>> --- a/kernel/hrtimer.c
>> +++ b/kernel/hrtimer.c
>> @@ -476,6 +476,8 @@ static int __init setup_hrtimer_hres(char *str)
>> hrtimer_hres_enabled = 0;
>> else if (!strcmp(str, "on"))
>> hrtimer_hres_enabled = 1;
>> + else if (!strcmp(str, "noverify"))
>> + clocksource_noverify = 1;
>> else
>> return 0;
>> return 1;
>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>> index dadde53..d3c14c1 100644
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -54,6 +54,7 @@ static LIST_HEAD(clocksource_list);
>> static DEFINE_SPINLOCK(clocksource_lock);
>> static char override_name[32];
>> static int finished_booting;
>> +int clocksource_noverify;
>>
>> /* clocksource_done_booting - Called near the end of core bootup
>> *
>> @@ -165,7 +166,7 @@ static void clocksource_check_watchdog(struct
>> clocksource *cs)
>> unsigned long flags;
>>
>> spin_lock_irqsave(&watchdog_lock, flags);
>> - if (cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
>> + if (!clocksource_noverify && cs->flags & CLOCK_SOURCE_MUST_VERIFY) {
>> int started = !list_empty(&watchdog_list);
>>
>> list_add(&cs->wd_list, &watchdog_list);
>
>
>

2008-10-09 07:15:26

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

On Wed, 8 Oct 2008, Jeff Hansen wrote:

> [HRTIMER]: Add highres=noverify option to bypass clocksource verification
>
> This disregards the CLOCK_SOURCE_MUST_VERIFY flag on all clocksources.
> This is particularly useful on legacy x86_32 systems that have no ACPI,
> LAPIC, or HPET timers, where only TSC and PIT are available.

While I agree in principle, adding this to highres is the wrong thing
to do. This is a property of clocksources and not of high resolution
timer support.

The affected clocksource is TSC and it should go there as a command
line option e.g. tsc=stable, which in turn clears the
CLOCK_SOURCE_MUST_VERIFY flag in the tsc clocksource.

Thanks,

tglx

2008-10-09 17:20:29

by Pavel Machek

[permalink] [raw]
Subject: Re: x86_32 tsc/pit and hrtimers

Hi!

>> Can you please tell me if this is supposed to work, and I just have a
>> poorly configured kernel; or if TSC/PIT drivers were not designed to work
>> this way in the first place. If it wasn't designed to do this, do you
>> have any tips on implementing this, since I'll be needing to do that?
>>
> This is not supposed to work, but it might be worthwhile to add a boot
> option to force the kernel to trust the TSC, as hardware that lacks any
> high-res timers also tends to be primitive enough that the TSC can be
> trusted, if it exists. If you patch out the CLOCK_SOURCE_MUST_VERIFY
> flag on the TSC, do you get correctly-functioning high-res timers on this
> system?

Untrue. Unstable tsc dates back to pentium MMX times.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-10-09 18:40:34

by Chris Snook

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

Thomas Gleixner wrote:
> On Wed, 8 Oct 2008, Jeff Hansen wrote:
>
>> [HRTIMER]: Add highres=noverify option to bypass clocksource verification
>>
>> This disregards the CLOCK_SOURCE_MUST_VERIFY flag on all clocksources.
>> This is particularly useful on legacy x86_32 systems that have no ACPI,
>> LAPIC, or HPET timers, where only TSC and PIT are available.
>
> While I agree in principle, adding this to highres is the wrong thing
> to do. This is a property of clocksources and not of high resolution
> timer support.
>
> The affected clocksource is TSC and it should go there as a command
> line option e.g. tsc=stable, which in turn clears the
> CLOCK_SOURCE_MUST_VERIFY flag in the tsc clocksource.
>
> Thanks,
>
> tglx

Fair enough, but do you think it's worthwhile to have an option to
disable CLOCK_SOURCE_MUST_VERIFY checking for all timers, or should we
implement this on a case-by-case basis when there's legitimate cause,
like with TSC?

-- Chris

2008-10-09 19:20:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

On Thu, 9 Oct 2008, Chris Snook wrote:
>
> Fair enough, but do you think it's worthwhile to have an option to disable
> CLOCK_SOURCE_MUST_VERIFY checking for all timers, or should we implement this
> on a case-by-case basis when there's legitimate cause, like with TSC?

Case-by-case is preferred.

Thanks,

tglx

2008-10-09 19:45:35

by Jeff Hansen

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

OK, so are we all agreed that something like clocksource_trust=tsc
would be the best?

-Jeff

---------------------------------------------------
"If someone's gotta do it, it might as well be me."

On Thu, 9 Oct 2008, Thomas Gleixner wrote:

> On Thu, 9 Oct 2008, Chris Snook wrote:
>>
>> Fair enough, but do you think it's worthwhile to have an option to disable
>> CLOCK_SOURCE_MUST_VERIFY checking for all timers, or should we implement this
>> on a case-by-case basis when there's legitimate cause, like with TSC?
>
> Case-by-case is preferred.
>
> Thanks,
>
> tglx
>
>

2008-10-09 19:54:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

On Thu, 9 Oct 2008, Jeff Hansen wrote:

> OK, so are we all agreed that something like clocksource_trust=tsc would be
> the best?

No, it's per affected device: tsc=trust or tsc=stable or whatever
unintuitive name we want to come up. And it is a modification to TSC
not to the clocksource layer.

Thanks,

tglx

2008-10-09 20:46:04

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

On Thu, Oct 9, 2008 at 12:53 PM, Thomas Gleixner <[email protected]> wrote:
> On Thu, 9 Oct 2008, Jeff Hansen wrote:
>
>> OK, so are we all agreed that something like clocksource_trust=tsc would be
>> the best?
>
> No, it's per affected device: tsc=trust or tsc=stable or whatever
> unintuitive name we want to come up. And it is a modification to TSC
> not to the clocksource layer.

Yep, this is cool. I too have a patch in my local tree which does a
similar thing i have a tsc_reliable flag which is set right now only
when we are running under a VMware hypervisor.
Along with marking the no_verify flag for TSC, this patch of mine also
skips the TSC synchornization checks.

The TSC synchronization loop which is run whenever a new cpu is
brought up is not actually needed on systems which are known to have a
reliable TSC. TSC between 2 cpus can be off by a marginal value on such
systems and thats okay for timekeeping, since we do check for tsc going
back in read_tsc.

Can this reasoning be included and synchronization skipped for all
these systems with reliable aka trustworthy TSC's ?

Thanks,
Alok
>
> Thanks,
>
> tglx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2008-10-09 21:05:26

by Chris Snook

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

Alok kataria wrote:
> On Thu, Oct 9, 2008 at 12:53 PM, Thomas Gleixner <[email protected]> wrote:
>> On Thu, 9 Oct 2008, Jeff Hansen wrote:
>>
>>> OK, so are we all agreed that something like clocksource_trust=tsc would be
>>> the best?
>> No, it's per affected device: tsc=trust or tsc=stable or whatever
>> unintuitive name we want to come up. And it is a modification to TSC
>> not to the clocksource layer.
>
> Yep, this is cool. I too have a patch in my local tree which does a
> similar thing i have a tsc_reliable flag which is set right now only
> when we are running under a VMware hypervisor.
> Along with marking the no_verify flag for TSC, this patch of mine also
> skips the TSC synchornization checks.
>
> The TSC synchronization loop which is run whenever a new cpu is
> brought up is not actually needed on systems which are known to have a
> reliable TSC. TSC between 2 cpus can be off by a marginal value on such
> systems and thats okay for timekeeping, since we do check for tsc going
> back in read_tsc.
>
> Can this reasoning be included and synchronization skipped for all
> these systems with reliable aka trustworthy TSC's ?

In general, no. Not all hardware/hypervisors behave this way, even when the TSC
is otherwise stable once synchronized.

-- Chris

2008-10-09 21:18:33

by Jeff Hansen

[permalink] [raw]
Subject: [PATCH] Re: x86_32 tsc/pit and hrtimers

[X86] Add tsc=stable option for marking TSC as stable

This enables legacy hardware or VMs without HPET, LAPIC, or ACPI
timers to enter high-resolution timer mode.

Signed-off-by: Jeff Hansen <[email protected]>
---
Documentation/kernel-parameters.txt | 4 ++++
arch/x86/kernel/tsc_32.c | 12 +++++++++++-
arch/x86/kernel/tsc_64.c | 9 +++++++++
3 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 9611505..8488074 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2084,6 +2084,10 @@ and is between 256 and 4096 characters. It is defined in the file
Format:
<io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq>

+ tsc= [X86-32,X86-64]
+ tsc=stable: Mark TSC clocksource as stable, enabling
+ high-resolution timer mode on older hardware.
+
turbografx.map[2|3]= [HW,JOY]
TurboGraFX parallel port interface
Format:
diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
index 65b7063..f6e8f71 100644
--- a/arch/x86/kernel/tsc_32.c
+++ b/arch/x86/kernel/tsc_32.c
@@ -49,6 +49,17 @@ static int __init tsc_setup(char *str)

__setup("notsc", tsc_setup);

+static struct clocksource clocksource_tsc;
+
+static int __init tscx_setup(char *str)
+{
+ if (!strcmp(str, "stable"))
+ clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+ return 1;
+}
+
+__setup("tsc=", tscx_setup);
+
/*
* code to mark and check if the TSC is unstable
* due to cpufreq or due to unsynced TSCs
@@ -287,7 +298,6 @@ core_initcall(cpufreq_tsc);
/* clock source code */

static unsigned long current_tsc_khz;
-static struct clocksource clocksource_tsc;

/*
* We compare the TSC to the cycle_last value in the clocksource
diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
index 1784b80..4a3e555 100644
--- a/arch/x86/kernel/tsc_64.c
+++ b/arch/x86/kernel/tsc_64.c
@@ -291,6 +291,15 @@ __setup("notsc", notsc_setup);

static struct clocksource clocksource_tsc;

+static int __init tscx_setup(char *str)
+{
+ if (!strcmp(str, "stable"))
+ clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+ return 1;
+}
+
+__setup("tsc=", tscx_setup);
+
/*
* We compare the TSC to the cycle_last value in the clocksource
* structure to avoid a nasty time-warp. This can be observed in a
--
1.5.6.4

---------------------------------------------------
"If someone's gotta do it, it might as well be me."

On Thu, 9 Oct 2008, Chris Snook wrote:

> Alok kataria wrote:
>> On Thu, Oct 9, 2008 at 12:53 PM, Thomas Gleixner <[email protected]>
>> wrote:
>> > On Thu, 9 Oct 2008, Jeff Hansen wrote:
>> >
>> > > OK, so are we all agreed that something like clocksource_trust=tsc
>> > > would be
>> > > the best?
>> > No, it's per affected device: tsc=trust or tsc=stable or whatever
>> > unintuitive name we want to come up. And it is a modification to TSC
>> > not to the clocksource layer.
>>
>> Yep, this is cool. I too have a patch in my local tree which does a
>> similar thing i have a tsc_reliable flag which is set right now only
>> when we are running under a VMware hypervisor.
>> Along with marking the no_verify flag for TSC, this patch of mine also
>> skips the TSC synchornization checks.
>>
>> The TSC synchronization loop which is run whenever a new cpu is
>> brought up is not actually needed on systems which are known to have a
>> reliable TSC. TSC between 2 cpus can be off by a marginal value on such
>> systems and thats okay for timekeeping, since we do check for tsc going
>> back in read_tsc.
>>
>> Can this reasoning be included and synchronization skipped for all
>> these systems with reliable aka trustworthy TSC's ?
>
> In general, no. Not all hardware/hypervisors behave this way, even when the
> TSC is otherwise stable once synchronized.
>
> -- Chris
>
>
>

2008-10-09 21:53:50

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

On Thu, 2008-10-09 at 14:03 -0700, Chris Snook wrote:
> Alok kataria wrote:
> > On Thu, Oct 9, 2008 at 12:53 PM, Thomas Gleixner <[email protected]> wrote:
> >> On Thu, 9 Oct 2008, Jeff Hansen wrote:
> >>
> >>> OK, so are we all agreed that something like clocksource_trust=tsc would be
> >>> the best?
> >> No, it's per affected device: tsc=trust or tsc=stable or whatever
> >> unintuitive name we want to come up. And it is a modification to TSC
> >> not to the clocksource layer.
> >
> > Yep, this is cool. I too have a patch in my local tree which does a
> > similar thing i have a tsc_reliable flag which is set right now only
> > when we are running under a VMware hypervisor.
> > Along with marking the no_verify flag for TSC, this patch of mine also
> > skips the TSC synchornization checks.
> >
> > The TSC synchronization loop which is run whenever a new cpu is
> > brought up is not actually needed on systems which are known to have a
> > reliable TSC. TSC between 2 cpus can be off by a marginal value on such
> > systems and thats okay for timekeeping, since we do check for tsc going
> > back in read_tsc.
> >
> > Can this reasoning be included and synchronization skipped for all
> > these systems with reliable aka trustworthy TSC's ?
>
> In general, no. Not all hardware/hypervisors behave this way, even when the TSC
> is otherwise stable once synchronized.

I agree that in general this should be no, but since this is a
commandline variable it will be normally set for only those systems
which have only TSC as a option or know that the TSC is reliable.
wouldn't doing this be ok for such systems ?

Thanks,
Alok

>
> -- Chris

2008-10-09 22:03:46

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

On Thu, 2008-10-09 at 14:18 -0700, Jeff Hansen wrote:
> [X86] Add tsc=stable option for marking TSC as stable
>
> This enables legacy hardware or VMs without HPET, LAPIC, or ACPI
> timers to enter high-resolution timer mode.
>
> Signed-off-by: Jeff Hansen <[email protected]>
> ---
> Documentation/kernel-parameters.txt | 4 ++++
> arch/x86/kernel/tsc_32.c | 12 +++++++++++-
> arch/x86/kernel/tsc_64.c | 9 +++++++++
> 3 files changed, 24 insertions(+), 1 deletions(-)

Hi Jeff,

I see that these changes are not for the mainline tree. Do you plan to
do something of this sought for mainline (.28/tip) ?

My comments below are mainly for generalizing this stuff, don't know how
useful these (my comments) will be for 26.5


> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 9611505..8488074 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2084,6 +2084,10 @@ and is between 256 and 4096 characters. It is defined in the file
> Format:
> <io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq>
>
> + tsc= [X86-32,X86-64]
> + tsc=stable: Mark TSC clocksource as stable, enabling
> + high-resolution timer mode on older hardware.
> +
> turbografx.map[2|3]= [HW,JOY]
> TurboGraFX parallel port interface
> Format:
> diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
> index 65b7063..f6e8f71 100644
> --- a/arch/x86/kernel/tsc_32.c
> +++ b/arch/x86/kernel/tsc_32.c
> @@ -49,6 +49,17 @@ static int __init tsc_setup(char *str)
>
> __setup("notsc", tsc_setup);
>
> +static struct clocksource clocksource_tsc;
> +
> +static int __init tscx_setup(char *str)
> +{
> + if (!strcmp(str, "stable"))
> + clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;

Should we add a flag here instead, say tsc_reliable ? or whatever else
you want to call that.

Also, there is code in check_geode_tsc_reliable which too sets this flag
for the TSC clocksource.
Maybe you can just set the tsc_reliable flag from tscx_setup and
check_geode_tsc_reliable, and then check "tsc_reliable" value and set
the verify flag in init_tsc_clocksource.

Thanks,
Alok


2008-10-09 22:51:50

by Chris Snook

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

Alok Kataria wrote:
> On Thu, 2008-10-09 at 14:03 -0700, Chris Snook wrote:
>> Alok kataria wrote:
>>> On Thu, Oct 9, 2008 at 12:53 PM, Thomas Gleixner <[email protected]> wrote:
>>>> On Thu, 9 Oct 2008, Jeff Hansen wrote:
>>>>
>>>>> OK, so are we all agreed that something like clocksource_trust=tsc would be
>>>>> the best?
>>>> No, it's per affected device: tsc=trust or tsc=stable or whatever
>>>> unintuitive name we want to come up. And it is a modification to TSC
>>>> not to the clocksource layer.
>>> Yep, this is cool. I too have a patch in my local tree which does a
>>> similar thing i have a tsc_reliable flag which is set right now only
>>> when we are running under a VMware hypervisor.
>>> Along with marking the no_verify flag for TSC, this patch of mine also
>>> skips the TSC synchornization checks.
>>>
>>> The TSC synchronization loop which is run whenever a new cpu is
>>> brought up is not actually needed on systems which are known to have a
>>> reliable TSC. TSC between 2 cpus can be off by a marginal value on such
>>> systems and thats okay for timekeeping, since we do check for tsc going
>>> back in read_tsc.
>>>
>>> Can this reasoning be included and synchronization skipped for all
>>> these systems with reliable aka trustworthy TSC's ?
>> In general, no. Not all hardware/hypervisors behave this way, even when the TSC
>> is otherwise stable once synchronized.
>
> I agree that in general this should be no, but since this is a
> commandline variable it will be normally set for only those systems
> which have only TSC as a option or know that the TSC is reliable.
> wouldn't doing this be ok for such systems ?

Hardware doesn't deliberately do any TSC synchronization, though you might get
it by accident in some configurations. A VMware guest gets it for free thanks
to the hypervisor doing it in software, but we need to run the check when we're
booting on bare metal. Jeff's patch enables the feature with as little risk as
possible. TSC sync is pretty cheap, so it's not worth it to add a more
dangerous special case just for guests of certain hypervisors.

-- Chris

2008-10-09 23:23:08

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

On Thu, 2008-10-09 at 15:50 -0700, Chris Snook wrote:
> Alok Kataria wrote:
> > On Thu, 2008-10-09 at 14:03 -0700, Chris Snook wrote:
> >
> > I agree that in general this should be no, but since this is a
> > commandline variable it will be normally set for only those systems
> > which have only TSC as a option or know that the TSC is reliable.
> > wouldn't doing this be ok for such systems ?
>
> Hardware doesn't deliberately do any TSC synchronization, though you might get
> it by accident in some configurations. A VMware guest gets it for free thanks
> to the hypervisor doing it in software, but we need to run the check when we're
> booting on bare metal.

The TSC sync algorithm right now expects that TSC are perfectly in sync
between cpus.
But, the hardware doesn't deliberately do any synchronization, so we can
have situations where TSC was (accidently ? )off by a marginal value
during boot and as a result we mark TSC as unstable and don't use it as
a clocksource at all. For systems like the ones Jeff is using wouldn't
that be a problem. IOW, even though the TSC was *marginally* off during
bootup it should still be used as a clocksource, since you have no other
option, no ?

> Jeff's patch enables the feature with as little risk as
> possible. TSC sync is pretty cheap, so it's not worth it to add a more
> dangerous special case just for guests of certain hypervisors.

I am not saying that we should go ahead and disable TSC sync for all
cases, we can very well do that just when running on a particular type
of system. The point here is to understand what should we do in a case
like above, on physical hardware.

Thanks,
Alok

> -- Chris

2008-10-09 23:38:40

by Chris Snook

[permalink] [raw]
Subject: Re: [PATCH] Re: x86_32 tsc/pit and hrtimers

Alok Kataria wrote:
> On Thu, 2008-10-09 at 15:50 -0700, Chris Snook wrote:
>> Alok Kataria wrote:
>>> On Thu, 2008-10-09 at 14:03 -0700, Chris Snook wrote:
>>>
>>> I agree that in general this should be no, but since this is a
>>> commandline variable it will be normally set for only those systems
>>> which have only TSC as a option or know that the TSC is reliable.
>>> wouldn't doing this be ok for such systems ?
>> Hardware doesn't deliberately do any TSC synchronization, though you might get
>> it by accident in some configurations. A VMware guest gets it for free thanks
>> to the hypervisor doing it in software, but we need to run the check when we're
>> booting on bare metal.
>
> The TSC sync algorithm right now expects that TSC are perfectly in sync
> between cpus.
> But, the hardware doesn't deliberately do any synchronization, so we can
> have situations where TSC was (accidently ? )off by a marginal value
> during boot and as a result we mark TSC as unstable and don't use it as
> a clocksource at all. For systems like the ones Jeff is using wouldn't
> that be a problem. IOW, even though the TSC was *marginally* off during
> bootup it should still be used as a clocksource, since you have no other
> option, no ?

You seem to be conflating position and rate. When we mark TSC as stable, we're
saying it will always advance at a known rate on all CPUs, but this says nothing
about the relative positions on the different CPUs. That skew can be huge on
some hardware, not just marginal, so we still need to synchronize them at boot
time, even though we don't need to (and can't, in this case) verify stability
with another continuous clock source.

-- Chris

2008-10-10 14:25:17

by Jeff Hansen

[permalink] [raw]
Subject: [PATCH] Re: x86_32 tsc/pit and hrtimers

This one is against 2.6.27.


[X86] Add tsc=stable option for marking TSC as stable

This enables legacy hardware or VMs without HPET, LAPIC, or ACPI timers
to enter high-resolution timer mode.

Signed-off-by: Jeff Hansen <[email protected]>
---
Documentation/kernel-parameters.txt | 4 ++++
arch/x86/kernel/tsc.c | 11 +++++++++++
2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1150444..0528bcb 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2174,6 +2174,10 @@ and is between 256 and 4096 characters. It is defined in the file
Format:
<io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq>

+ tsc= [X86-32,X86-64]
+ tsc=stable: Mark TSC clocksource as stable, enabling
+ high-resolution timer mode on older hardware.
+
turbografx.map[2|3]= [HW,JOY]
TurboGraFX parallel port interface
Format:
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 8f98e9d..70e485e 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -98,6 +98,17 @@ int __init notsc_setup(char *str)

__setup("notsc", notsc_setup);

+static struct clocksource clocksource_tsc;
+
+static int __init tscx_setup(char *str)
+{
+ if (!strcmp(str, "stable"))
+ clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+ return 1;
+}
+
+__setup("tsc=", tscx_setup);
+
#define MAX_RETRIES 5
#define SMI_TRESHOLD 50000

--
1.5.6.4

---------------------------------------------------
"If someone's gotta do it, it might as well be me."

On Thu, 9 Oct 2008, Chris Snook wrote:

> Alok Kataria wrote:
>> On Thu, 2008-10-09 at 15:50 -0700, Chris Snook wrote:
>> > Alok Kataria wrote:
>> > > On Thu, 2008-10-09 at 14:03 -0700, Chris Snook wrote:
>> > >
>> > > I agree that in general this should be no, but since this is a
>> > > commandline variable it will be normally set for only those systems
>> > > which have only TSC as a option or know that the TSC is reliable.
>> > > wouldn't doing this be ok for such systems ?
>> > Hardware doesn't deliberately do any TSC synchronization, though you
>> > might get
>> > it by accident in some configurations. A VMware guest gets it for free
>> > thanks
>> > to the hypervisor doing it in software, but we need to run the check
>> > when we're
>> > booting on bare metal.
>>
>> The TSC sync algorithm right now expects that TSC are perfectly in sync
>> between cpus.
>> But, the hardware doesn't deliberately do any synchronization, so we can
>> have situations where TSC was (accidently ? )off by a marginal value
>> during boot and as a result we mark TSC as unstable and don't use it as
>> a clocksource at all. For systems like the ones Jeff is using wouldn't
>> that be a problem. IOW, even though the TSC was *marginally* off during
>> bootup it should still be used as a clocksource, since you have no other
>> option, no ?
>
> You seem to be conflating position and rate. When we mark TSC as stable,
> we're saying it will always advance at a known rate on all CPUs, but this
> says nothing about the relative positions on the different CPUs. That skew
> can be huge on some hardware, not just marginal, so we still need to
> synchronize them at boot time, even though we don't need to (and can't, in
> this case) verify stability with another continuous clock source.
>
> -- Chris
>
>