I got it all cleaned up and moved over to the new Kconfig. I've also removed
the irq stack pieces that leaked from Alan's tree.
It includes the boot GDT stuff and adds configuration options for the
kernel/timers directory so that platforms which can't maintain a TSC can turn
it off at compile time.
This is the diffstat over 2.5.46:
arch/i386/Kconfig | 52 +++++++++++++++++++++++-------
arch/i386/Makefile | 4 ++
arch/i386/boot/compressed/head.S | 8 ++--
arch/i386/boot/compressed/misc.c | 2 -
arch/i386/boot/setup.S | 56 +++++++++++++++++++++++++++-----
arch/i386/kernel/Makefile | 3 +
arch/i386/kernel/head.S | 22 +++++++++---
arch/i386/kernel/irq.c | 2 -
arch/i386/kernel/timers/Makefile | 6 +--
arch/i386/kernel/timers/timer.c | 4 +-
arch/i386/kernel/timers/timer_pit.c | 2 +
arch/i386/kernel/trampoline.S | 6 +--
arch/i386/mach-voyager/voyager_basic.c | 28 +++++++++++-----
arch/i386/mach-voyager/voyager_smp.c | 57 +++++++++------------------------
drivers/char/sysrq.c | 18 ----------
include/asm-i386/desc.h | 1
include/asm-i386/hw_irq.h | 2 -
include/asm-i386/segment.h | 8 ++++
include/asm-i386/smp.h | 21 ++++++++----
include/asm-i386/voyager.h | 1
20 files changed, 188 insertions(+), 115 deletions(-)
The changes to smp.h introduce a new macro to loop efficiently over a
sparse CPU bitmap, plus a bit of rearrangement for some functions voyager needs.
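For illustration only, the kind of macro meant here might look like this (a
minimal sketch of the idea, not the actual smp.h change; the macro name and
the scratch argument are invented):

/* Sketch: visit only the set bits of a CPU bitmap held in an unsigned
 * long, instead of testing every one of NR_CPUS positions; each pass
 * takes the lowest set bit (__builtin_ffsl returns 1 + its index) and
 * then clears it. */
#define for_each_cpu_in_map(cpu, map, scratch)				\
	for ((scratch) = (map);						\
	     (scratch) != 0 && (((cpu) = __builtin_ffsl(scratch) - 1), 1); \
	     (scratch) &= (scratch) - 1)

so that, e.g., for_each_cpu_in_map(cpu, cpu_online_map, tmp) touches only
the online CPUs, however sparse the map is.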
This compiles and boots correctly for me.
It's all uploaded to
http://linux-voyager.bkbits.net/voyager-2.5
James
On Tue, 2002-11-05 at 16:35, J.E.J. Bottomley wrote:
> It includes the boot GDT stuff and adds configuration options for the
> kernel/timers directory so that platforms which can't maintain a TSC can turn
> it off at compile time.
Just a few comments on the CONFIG_X86_TSC changes:
> diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig
> --- a/arch/i386/Kconfig Tue Nov 5 15:35:01 2002
> +++ b/arch/i386/Kconfig Tue Nov 5 15:35:01 2002
> @@ -1636,17 +1649,32 @@
>
> source "lib/Kconfig"
>
> +config X86_TSC
> + bool
> + depends on !VOYAGER && (MWINCHIP3D || MWINCHIP2 || MCRUSOE || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMIII || M686 || M586MMX || M586TSC)
> + default y
> +
> +config X86_PIT
> + bool
> + depends on M386 || M486 || M586 || M586TSC || VOYAGER
> + default y
> +
I'm fine w/ the X86_TSC change, but I'd drop the X86_PIT for now.
Then make the arch/i386/timers/Makefile change to be something like:
obj-y := timer.o timer_tsc.o timer_pit.o
obj-$(CONFIG_X86_TSC) -= timer_pit.o #does this(-=) work?
obj-$(CONFIG_X86_CYCLONE) += timer_cyclone.o
Then when you boot, boot w/ notsc and you should be fine.
I do want to add some sort of TSC blacklisting so one doesn't always
have to boot w/ notsc if your machine is
detectable/compiled-exclusively for. But I've got a few other issues in
the queue first.
thanks
-john
On Wed, 2002-11-06 at 02:31, john stultz wrote:
> I'm fine w/ the X86_TSC change, but I'd drop the X86_PIT for now.
>
> Then make the arch/i386/timers/Makefile change to be something like:
>
> obj-y := timer.o timer_tsc.o timer_pit.o
> obj-$(CONFIG_X86_TSC) -= timer_pit.o #does this(-=) work?
> obj-$(CONFIG_X86_CYCLONE) += timer_cyclone.o
Not everything is going to have a PIT. Also I need to know if there is a
PIT for a few other things so I'd prefer to keep it, but I'm not
excessively bothered.
[email protected] said:
> I'm fine w/ the X86_TSC change, but I'd drop the X86_PIT for now.
> Then when you boot, boot w/ notsc and you should be fine.
> I do want to add some sort of TSC blacklisting so one doesn't always
> have to boot w/ notsc if your machine is detectable/
> compiled-exclusively for. But I've got a few other issues in the
> queue first.
There are certain architectures (voyager is the only one currently supported,
but I suspect the Numa machines will have this too) where the TSC cannot be
used for cross CPU timings because the processors are driven by separate
clocks and may even have different clock speeds.
What I need is an option simply not to compile in the TSC code and use the PIT
instead. What I'm trying to do with the TSC and PIT options is give three
choices:
1. Don't use TSC (don't compile TSC code): X86_TSC=n, X86_PIT=y
2. May use TSC but check first (blacklist, notsc kernel option). X86_TSC=y,
X86_PIT=y
3. TSC is always OK so don't need PIT. X86_TSC=y, X86_PIT=n
We probably need to make the notsc and dodgy tsc check contingent on X86_PIT
(or a config option that says we have some other timer mechanism compiled in).
Really, the options should probably be handled in timer.c.
There's also another problem in that the timer_init is called too early in the
boot sequence to get a message out to the user, so the panic in timers.c about
not finding a suitable timer will never be seen (the system will just lock up
on boot).
Do we have an option for a deferred panic that will trip just after we init
the console and clean out the printk buffer?
> Then make the arch/i386/timers/Makefile change to be something like:
>
> obj-y := timer.o timer_tsc.o timer_pit.o
> obj-$(CONFIG_X86_TSC) -= timer_pit.o #does this(-=) work?
> obj-$(CONFIG_X86_CYCLONE) += timer_cyclone.o
Even if it works, the config option style is confusing. It's easier just to
have a positive option (CONFIG_X86_PIT) for this.
James
On Wed, 2002-11-06 at 15:03, J.E.J. Bottomley wrote:
> There are certain architectures (voyager is the only one currently supported,
> but I suspect the Numa machines will have this too) where the TSC cannot be
> used for cross CPU timings because the processors are driven by separate
> clocks and may even have different clock speeds.
IBM Summit is indeed another one.
> What I need is an option simply not to compile in the TSC code and use the PIT
> instead. What I'm trying to do with the TSC and PIT options is give three
> choices:
>
> 1. Don't use TSC (don't compile TSC code): X86_TSC=n, X86_PIT=y
>
> 2. May use TSC but check first (blacklist, notsc kernel option). X86_TSC=y,
> X86_PIT=y
>
> 3. TSC is always OK so don't need PIT. X86_TSC=y, X86_PIT=n
[Plus we need X86_CYCLONE and we may need X86_SOMETHING else for some
pending stuff]
> We probably need to make the notsc and dodgy tsc check contingent on X86_PIT
> (or a config option that says we have some other timer mechanism compiled in).
> Really, the options should probably be handled in timer.c.
The dodgy_tsc check is now obsolete. The known cases are handled with
workarounds and CS5510/20 can now use the TSC.
> Do we have an option for a deferred panic that will trip just after we init
> the console and clean out the printk buffer?
Point to timer_none, and check that later on in the boot.
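Roughly like this, say (a sketch; the timer_opts field names here are
approximated from the 2.5 timers code and may not match exactly):

/* A stub time source that always probes successfully, so timer_init()
 * never panics before the console is up; later boot code checks whether
 * we are still stuck on timer_none and complains visibly then. */
struct timer_opts {
	int (*init)(void);
	void (*mark_offset)(void);
	unsigned long (*get_offset)(void);
};

static int timer_none_init(void) { return 0; }
static void timer_none_mark_offset(void) { }
static unsigned long timer_none_get_offset(void) { return 0; }

struct timer_opts timer_none = {
	.init		= timer_none_init,
	.mark_offset	= timer_none_mark_offset,
	.get_offset	= timer_none_get_offset,
};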
On Wed, 6 Nov 2002, J.E.J. Bottomley wrote:
>
> There are certain architectures (voyager is the only one currently supported,
> but I suspect the Numa machines will have this too) where the TSC cannot be
> used for cross CPU timings because the processors are driven by separate
> clocks and may even have different clock speeds.
I disagree.
We should use the TSC everywhere (if it exists, of course), and the fact
that two CPU's don't run synchronized shouldn't matter.
The solution is to make all the TSC calibration and offsets be per-CPU.
That should be fairly trivial, since we _already_ do the calibration
per-CPU anyway for bogomips (for no good reason except the whole process
is obviously just a funny thing to do, which is the point of bogomips).
The only even half-way "interesting" case I see is a udelay() getting
preempted, and I suspect most of those already run non-preemptable, so in
the short run we could just force that with preempt_off()/on() inside
udelay().
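Something on the order of this sketch (preempt_disable()/preempt_enable()
are the in-tree spellings of what is called preempt_off()/on() above, and
__udelay() stands in for the existing calibrated delay loop):

void udelay(unsigned long usecs)
{
	preempt_disable();	/* stay on one CPU for the whole spin */
	__udelay(usecs);	/* per-CPU calibrated TSC/loop delay */
	preempt_enable();
}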
In the long run we probably do _not_ want to do that nonpreemptable
udelay(), but even that is debatable (anybody who is willing to be
preempted should not have been using udelay() in the first place, but
actually sleeping - and people who use udelay() for things like IO port
accesses etc almost certainly won't mind not being moved across CPU's).
Let's face it, we don't have that many tsc-related data structures. What,
we have:
- loops_per_jiffy, which is already a per-CPU thing, used by udelay()
- fast_gettimeoffset_quotient - which is global right now and shouldn't
be.
- delay_at_last_interrupt. See previous.
- possibly even all of xtime and all the NTP stuff
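A per-CPU layout for these might look roughly like so (a sketch; the
grouping into one structure is invented, only the field names come from the
list above):

struct cpu_time {
	unsigned long loops_per_jiffy;			/* udelay() calibration */
	unsigned long fast_gettimeoffset_quotient;	/* TSC -> usec scale */
	unsigned long delay_at_last_interrupt;		/* TSC count at last tick */
} ____cacheline_aligned;

static struct cpu_time cpu_time[NR_CPUS];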
It's clearly stupid in the long run to depend on the TSC synchronization.
We should consider different CPU's to be different clock-domains, and just
synchronize them using the primitives we already have (hey, people can use
ntp to synchronize over networks quite well, and that's without the kind
of synchronization primitives that we have within the same box).
Anybody willing to look into this? I suspect the numa people should be
more motivated than most of us.. You still want fast gettimeofday() on
NUMA too..
Linus
On Wed, 2002-11-06 at 15:45, Linus Torvalds wrote:
> It's clearly stupid in the long run to depend on the TSC synchronization.
> We should consider different CPU's to be different clock-domains, and just
> synchronize them using the primitives we already have (hey, people can use
> ntp to synchronize over networks quite well, and that's without the kind
> of synchronization primitives that we have within the same box).
NTP synchronization assumes the clock runs at approximately the same
speed and that you can 'bend' ticklength to avoid backward steps. That's
a really cool idea for the x440 but I wonder how practical it is when we
have CPU's that keep changing speeds and not always notifying us about
it either.
On 6 Nov 2002, Alan Cox wrote:
>
> On Wed, 2002-11-06 at 15:45, Linus Torvalds wrote:
> > It's clearly stupid in the long run to depend on the TSC synchronization.
> > We should consider different CPU's to be different clock-domains, and just
> > synchronize them using the primitives we already have (hey, people can use
> > ntp to synchronize over networks quite well, and that's without the kind
> > of synchronization primitives that we have within the same box).
>
> NTP synchronization assumes the clock runs at approximately the same
> speed and that you can 'bend' ticklength to avoid backward steps. That's
> a really cool idea for the x440 but I wonder how practical it is when we
> have CPU's that keep changing speeds and not always notifying us about
> it either.
Note that you have a _lot_ more flexibility than NTP thanks to the strong
synchronization that we actually do have between CPU's in the end.
The synchronization just isn't strong enough to allow us to believe that
the TSC is exactly the _same_. But it is certainly strong enough that we
should be able to do a really good job.
Of course, if the TSC changes speed without telling us, we have problems.
But that has nothing to do with the synchronization protocol itself: we
have problems with that even on a single CPU on laptops right now. Does it
mean that gettimeofday() gets confused? Sure as hell. But it doesn't get
any _worse_ from being done separately on multiple CPU's.
(And it _does_ get slightly better. On multiple CPU's with per-CPU time
structures at least you _can_ handle the case where one CPU runs at a
different speed, so at least you could handle the case where one CPU is
slowed down explicitly much better than we can right now).
As an example of something that is simpler in the MP/NUMA world than in
NTP: we see the processes migrating, and we can fairly trivially do things
like
- every gettimeofday() will always save the value we return, along with a
sequence number (which is mainly read-only, so it's ok to share among
CPU's)
- every "settimeofday()" will increase the sequence number
- when the next gettimeofday happens, we can check the sequence number
and the old gettimeofday, and verify that we get monotonic behaviour in
the absence of explicit date setting. This allows us to handle small
problems gracefully ("return the old time + 1 ns" to make it
monotonic even when we screw up), _and_ it will also act as a big clue
for us that we should try to synchronize - so that we basically never
need to worry about "should I check the clocks" (where "basically
never" may be "we check the clocks every minute or so if nothing else
happens")
Basically, I think NTP itself would be _way_ overkill between CPU's, I
wasn't really suggesting we use NTP as the main mechanism at that level. I
just suspect that a lot of the data structures and info that we already
have to have for NTP might be used as help.
Linus
Alan Cox <[email protected]> writes:
> > What I need is an option simply not to compile in the TSC code and use the PIT
> > instead. What I'm trying to do with the TSC and PIT options is give three
> > choices:
> >
> > 1. Don't use TSC (don't compile TSC code): X86_TSC=n, X86_PIT=y
> >
> > 2. May use TSC but check first (blacklist, notsc kernel option). X86_TSC=y,
> > X86_PIT=y
> >
> > 3. TSC is always OK so don't need PIT. X86_TSC=y, X86_PIT=n
>
> [Plus we need X86_CYCLONE and we may need X86_SOMETHING else for some
> pending stuff]
Yes, for example the NatSemi SC2200 has a 32+1 bit "High Resolution
Timer" that can be clocked either at 1MHz or 27MHz and that can
generate an interrupt whenever it wraps around.
Just using the High Resolution timer would avoid the known problems
with the TSC (stops on HLT, a bug when the low 32 bits of the TSC wrap
around) and the PIT (something somewhere, maybe SMM mode, seems to
mess up the latch values).
/Christer
--
"Just how much can I get away with and still go to heaven?"
Freelance consultant specializing in device driver programming for Linux
Christer Weinigel <[email protected]> http://www.weinigel.se
On Wed, 2002-11-06 at 16:12, Linus Torvalds wrote:
> Basically, I think NTP itself would be _way_ overkill between CPU's, I
> wasn't really suggesting we use NTP as the main mechanism at that level. I
> just suspect that a lot of the data structures and info that we already
> have to have for NTP might be used as help.
I don't think the NTP algorithms are overkill. We have the same problem
space - multiple nodes some of which can be rogue (eg pit misreads, tsc
weirdness), inability to directly sample the clock on another node, need
for an efficient way to bend clocks. The fundamental algorithm is
extremely simple; it's all the networking, security, UI and glue that isn't -
stuff we can skip.
On Wed, 2002-11-06 at 07:45, Linus Torvalds wrote:
> The solution is to make all the TSC calibration and offsets be per-CPU.
> That should be fairly trivial, since we _already_ do the calibration
> per-CPU anyway for bogomips (for no good reason except the whole process
> is obviously just a funny thing to do, which is the point of bogomips).
This was discussed earlier, but dismissed as being a can of worms. It
still is possible to do (and can be added as just another timer_opt
stucture), but uglies like the spread-spectrum feature on the x440,
which actually runs each node at slightly varying speeds, pop up and
make my head hurt. Regardless, the attempt would probably help clean
things up, as you mentioned below. We also would need to round-robin the
timer interrupt, as each cpu would need a last_tsc_low point to generate
an offset. So I'm not opposed to it, but I'm not exactly eager to
implement it.
> Let's face it, we don't have that many tsc-related data structures. What,
> we have:
>
> - loops_per_jiffy, which is already a per-CPU thing, used by udelay()
> - fast_gettimeoffset_quotient - which is global right now and shouldn't
> be.
Good to see it's on your hit-list. :) I mailed out a patch for this
earlier, I'll resend later today.
> - delay_at_last_interrupt. See previous.
I'll get to this one too, as well as a few other spots where the
timer_opts abstraction isn't clean enough (cpu_khz needs to be pulled
out of the timer_tsc code, etc)
thanks for the feedback
-john
On Wed, 2002-11-06 at 07:03, J.E.J. Bottomley wrote:
> There are certain architectures (voyager is the only one currently supported,
> but I suspect the Numa machines will have this too) where the TSC cannot be
> used for cross CPU timings because the processors are driven by separate
> clocks and may even have different clock speeds.
Yes, I'll confirm your suspicions for some NUMA boxes ;) The timer_opts
structure was largely created to make it easier to remedy this
situation, allowing alternate time sources to be easily added.
> What I need is an option simply not to compile in the TSC code and use the PIT
> instead. What I'm trying to do with the TSC and PIT options is give three
> choices:
>
> 1. Don't use TSC (don't compile TSC code): X86_TSC=n, X86_PIT=y
>
> 2. May use TSC but check first (blacklist, notsc kernel option). X86_TSC=y,
> X86_PIT=y
>
> 3. TSC is always OK so don't need PIT. X86_TSC=y, X86_PIT=n
Almost all systems are going to want #3. For those that need an
alternate time source (NUMAQ, Voyager, x440, etc) do we really need the
PIT-only option (#1)? It can easily be dynamically detected in #2, and
the resulting kernel will run correctly on more machines which makes for
one less special kernel distros have to create/manage.
> There's also another problem in that the timer_init is called too early in the
> boot sequence to get a message out to the user, so the panic in timers.c about
> not finding a suitable timer will never be seen (the system will just lock up
> on boot).
>
> Do we have an option for a deferred panic that will trip just after we init
> the console and clean out the printk buffer?
Yea, I'm actually working on exactly what Alan suggested (timer_none),
to solve this. Thanks for bringing it up though, I occasionally need a
kick in the pants for motivation :)
> > Then make the arch/i386/timers/Makefile change to be something like:
> >
> > obj-y := timer.o timer_tsc.o timer_pit.o
> > obj-$(CONFIG_X86_TSC) -= timer_pit.o #does this(-=) work?
> > obj-$(CONFIG_X86_CYCLONE) += timer_cyclone.o
>
> Even if it works, the config option style is confusing. It's easier just to
> have a positive option (CONFIG_X86_PIT) for this.
I realize that the negative-option that _X86_TSC has become is a bit
confusing, but it is an optimization option, not a feature option. I've
been thinking of something similar to _X86_PIT, but I want to avoid the
PIT-only case that you had in your patch, and try to come up with
something that isn't more confusing than what we started with.
thanks
-john
On Wed, 2002-11-06 at 05:43, Alan Cox wrote:
> On Wed, 2002-11-06 at 02:31, john stultz wrote:
> > I'm fine w/ the X86_TSC change, but I'd drop the X86_PIT for now.
> >
> > Then make the arch/i386/timers/Makefile change to be something like:
> >
> > obj-y := timer.o timer_tsc.o timer_pit.o
> > obj-$(CONFIG_X86_TSC) -= timer_pit.o #does this(-=) work?
> > obj-$(CONFIG_X86_CYCLONE) += timer_cyclone.o
>
> Not everything is going to have a PIT. Also I need to know if there is a
> PIT for a few other things so I'd prefer to keep it, but I'm not
> excessively bothered.
Hmmm. OK, how about something like what is below? This is very similar
to what James is suggesting, but tries to fix some of the corner cases
as well. The only problem I see with this is that in some places we're
using _X86_PIT_TIMER as an imagined !_X86_TSC_ONLY, so one couldn't have
a kernel that compiled in the Cyclone timer, not the PIT timer, and
allowed one to disable the TSC.
I'm still not sure this is the way to go, but at least gives James a
chance to poke at my code rather than the other way around.
thanks
-john
diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig
--- a/arch/i386/Kconfig Mon Nov 4 17:15:15 2002
+++ b/arch/i386/Kconfig Mon Nov 4 17:15:15 2002
@@ -430,6 +430,11 @@
depends on NUMA
default y
+config X86_PIT_TIMER
+ bool
+ depends on !X86_TSC || X86_NUMAQ
+ default n
+
config X86_MCE
bool "Machine Check Exception"
---help---
diff -Nru a/arch/i386/kernel/cpu/common.c b/arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c Mon Nov 4 17:15:15 2002
+++ b/arch/i386/kernel/cpu/common.c Mon Nov 4 17:15:15 2002
@@ -42,7 +42,7 @@
}
__setup("cachesize=", cachesize_setup);
-#ifndef CONFIG_X86_TSC
+#ifdef CONFIG_X86_PIT_TIMER
static int tsc_disable __initdata = 0;
static int __init tsc_setup(char *str)
@@ -55,7 +55,7 @@
static int __init tsc_setup(char *str)
{
- printk("notsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC.\n");
+ printk("notsc: Kernel not compiled with CONFIG_X86_PIT_TIMER, cannot disable TSC.\n");
return 1;
}
#endif
diff -Nru a/arch/i386/kernel/timers/Makefile b/arch/i386/kernel/timers/Makefile
--- a/arch/i386/kernel/timers/Makefile Mon Nov 4 17:15:15 2002
+++ b/arch/i386/kernel/timers/Makefile Mon Nov 4 17:15:15 2002
@@ -5,7 +5,7 @@
obj-y := timer.o
obj-y += timer_tsc.o
-obj-y += timer_pit.o
+obj-$(CONFIG_X86_PIT_TIMER) += timer_pit.o
obj-$(CONFIG_X86_CYCLONE) += timer_cyclone.o
include $(TOPDIR)/Rules.make
diff -Nru a/arch/i386/kernel/timers/timer.c b/arch/i386/kernel/timers/timer.c
--- a/arch/i386/kernel/timers/timer.c Mon Nov 4 17:15:15 2002
+++ b/arch/i386/kernel/timers/timer.c Mon Nov 4 17:15:15 2002
@@ -8,7 +8,7 @@
/* list of timers, ordered by preference, NULL terminated */
static struct timer_opts* timers[] = {
&timer_tsc,
-#ifndef CONFIG_X86_TSC
+#ifdef CONFIG_X86_PIT_TIMER
&timer_pit,
#endif
NULL,
Followup to: <[email protected]>
By author: Linus Torvalds <[email protected]>
In newsgroup: linux.dev.kernel
>
> I disagree.
>
> We should use the TSC everywhere (if it exists, of course), and the fact
> that two CPU's don't run synchronized shouldn't matter.
>
If it exists, and works :-/
> It's clearly stupid in the long run to depend on the TSC synchronization.
> We should consider different CPU's to be different clock-domains, and just
> synchronize them using the primitives we already have (hey, people can use
> ntp to synchronize over networks quite well, and that's without the kind
> of synchronization primitives that we have within the same box).
Synchronizing them is nice, since it makes RDTSC usable in user
space (without nodelocking). If it ain't doable, then it ain't.
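(For reference, the user-space read is just this sketch - no system call at
all, which is the whole attraction, and also why the TSCs have to agree
across CPUs unless the process is node-locked:)

#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}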
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>
Hi!
> - when the next gettimeofday happens, we can check the sequence number
> and the old gettimeofday, and verify that we get monotonic behaviour in
> the absence of explicit date setting. This allows us to handle small
> problems gracefully ("return the old time + 1 ns" to make it
> monotonic even when we screw up), _and_ it will also act as a big clue
> for us that we should try to synchronize - so that we basically never
> need to worry about "should I check the clocks" (where "basically
> never" may be "we check the clocks every minute or so if nothing else
> happens")
Unfortunately, this means "bye bye vsyscalls for gettimeofday".
Pavel
--
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?
On Sun, 10 Nov 2002, Pavel Machek wrote:
>
> Unfortunately, this means "bye bye vsyscalls for gettimeofday".
Not necessarily. All of the fastpath and the checking can be done by the
vsyscall, and if the vsyscall notices that there is a backwards jump in
time it just gives up and does a real system call. The vsyscall does need
to figure out the CPU it's running on somehow, but that should be solvable
- indexing through the thread ID or something.
That said, I suspect that the real issue with vsyscalls is that they don't
really make much sense. The only system call we've ever found that matters
at all is gettimeofday(), and the vsyscall implementation there looks like
a "cool idea, but doesn't really matter (and complicates things a lot)".
The system call overhead tends to scale up very well with CPU speed (the
one exception being the P4 which just has some internal problems with "int
0x80" and slowed down compared to a PIII).
So I would just suggest not spending a lot of effort on it, considering
the problems it already has.
Linus
Hi!
> > Unfortunately, this means "bye bye vsyscalls for gettimeofday".
>
> Not necessarily. All of the fastpath and the checking can be done by the
> vsyscall, and if the vsyscall notices that there is a backwards jump
> in
I believe you need to *store* last value given to userland. Checking for
a backwards jump can be dealt with, but to check for time going
backwards you need to *store* each result of the vsyscall. I do not
think that can be done from userland.
> That said, I suspect that the real issue with vsyscalls is that they don't
> really make much sense. The only system call we've ever found that matters
> at all is gettimeofday(), and the vsyscall implementation there looks like
> a "cool idea, but doesn't really matter (and complicates things a lot)".
I don't like vsyscalls at all...
Pavel
--
When do you have heart between your knees?
On Sun, 10 Nov 2002, Pavel Machek wrote:
>
> I believe you need to *store* last value given to userland.
But that's trivially done: it doesn't even have to be thread-specific, so
it can be just a global entry anywhere in the process data structures.
This is just a random sanity check thing, after all. It doesn't have to be
system-global or even per-cpu. The only really important thing is that
"gettimeofday()" should return monotonically increasing data - and if it
doesn't, the vsyscall would have to ask why (sometimes it's fine, if
somebody did a settimeofday, but usually it's a sign of trouble).
But yes, it's certainly a lot more complex than just doing it in a
controlled system call environment. Which is why I think vsyscalls are
eventually not worth it.
Linus
On Sun, Nov 10, 2002 at 10:59:55AM -0800, Linus Torvalds wrote:
> On Sun, 10 Nov 2002, Pavel Machek wrote:
> >
> > Unfortunately, this means "bye bye vsyscalls for gettimeofday".
>
> Not necessarily. All of the fastpath and the checking can be done by the
> vsyscall, and if the vsyscall notices that there is a backwards jump in
> time it just gives up and does a real system call. The vsyscall does need
> to figure out the CPU it's running on somehow, but that should be solvable
> - indexing through the thread ID or something.
I'm planning to store the CPU number in the highest bits of the TSC ...
> That said, I suspect that the real issue with vsyscalls is that they don't
> really make much sense. The only system call we've ever found that matters
> at all is gettimeofday(), and the vsyscall implementation there looks like
> a "cool idea, but doesn't really matter (and complicates things a lot)".
It's not complicating things overly. We'd have to go through most of the
hoops anyway if we wanted a fast gettimeofday syscall instead of a
vsyscall.
> The system call overhead tends to scale up very well with CPU speed (the
> one exception being the P4 which just has some internal problems with "int
> 0x80" and slowed down compared to a PIII).
>
> So I would just suggest not spending a lot of effort on it, considering
> the problems it already has.
Agreed. The only problem left I see is the need to have an interrupt of
every CPU from time to time to update the per-cpu time values, and to
synchronize those to the 'global timer interrupt' somehow.
--
Vojtech Pavlik
SuSE Labs
Hi!
> > I believe you need to *store* last value given to userland.
>
> But that's trivially done: it doesn't even have to be thread-specific, so
> it can be just a global entry anywhere in the process data
> structures.
> This is just a random sanity check thing, after all. It doesn't have to be
> system-global or even per-cpu. The only really important thing is that
> "gettimeofday()" should return monotonically increasing data - and if it
> doesn't, the vsyscall would have to ask why (sometimes it's fine, if
> somebody did a settimeofday, but usually it's a sign of trouble).
I believe you need it system-global. If task A tells task B "it's
10:30:00" and then task B does gettimeofday and gets "10:29:59", it
will be confused for sure.
Pavel
--
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.
On Sun, Nov 10, 2002 at 08:42:04PM +0100, Pavel Machek wrote:
> Hi!
>
> > > I believe you need to *store* last value given to userland.
> >
> > But that's trivially done: it doesn't even have to be thread-specific, so
> > it can be just a global entry anywhere in the process data
> > structures.
>
> > This is just a random sanity check thing, after all. It doesn't have to be
> > system-global or even per-cpu. The only really important thing is that
> > "gettimeofday()" should return monotonically increasing data - and if it
> > doesn't, the vsyscall would have to ask why (sometimes it's fine, if
> > somebody did a settimeofday, but usually it's a sign of trouble).
>
> I believe you need it system-global. If task A tells task B "it's
> 10:30:00" and then task B does gettimeofday and gets "10:29:59", it
> will be confused for sure.
You just need to make sure the time difference is less than the speed of
light in the system times the distance between the two tasks. ;) Really,
relativity, and the limited speed of travel of information kicks in and
saves us here.
--
Vojtech Pavlik
SuSE Labs
commence Pavel Machek quotation:
>> This is just a random sanity check thing, after all. It doesn't have to be
>> system-global or even per-cpu. The only really important thing is that
>> "gettimeofday()" should return monotonically increasing data - and if it
^^^^^^^^^^^^^^^^^^^^^^^^
>> doesn't, the vsyscall would have to ask why (sometimes it's fine, if
>> somebody did a settimeofday, but usually it's a sign of trouble).
>
> I believe you need it system-global. If task A tells task B "it's
> 10:30:00" and then task B does gettimeofday and gets "10:29:59", it
> will be confused for sure.
Hence the requirement that it be monotonically increasing.
--
/ |
[|] Sean Neakums | Questions are a burden to others;
[|] <[email protected]> | answers a prison for oneself.
\ |
On 2002-11-10T20:02:00,
Sean Neakums <[email protected]> said:
> > I believe you need it system-global. If task A tells task B "it's
> > 10:30:00" and then task B does gettimeofday and gets "10:29:59", it
> > will be confused for sure.
> Hence the requirement that it be monotonically increasing.
Processes expecting time to increase strictly monotonically across process
boundaries will enjoy life in cluster settings or when the admin adjusts the
time.
In short, those programs are already broken.
Of course, physically that should be true, Star Trek or not ;), but it is a
really hard promise to keep across multiple nodes (think Mosix, CC/NC-NUMA or
even real clusters with distributed processing).
Serializing all gettimeofday() calls via a single counter at least is a rather
bad idea.
Sincerely,
Lars Marowsky-Brée <[email protected]>
--
Principal Squirrel
SuSE Labs - Research & Development, SuSE Linux AG
"If anything can go wrong, it will." "Chance favors the prepared (mind)."
-- Capt. Edward A. Murphy -- Louis Pasteur
On Sun, 2002-11-10 at 20:16, Lars Marowsky-Bree wrote:
> Processes expecting time to increase strictly monotonically across process
> boundaries will enjoy life in cluster settings or when the admin adjusts the
> time.
I'd fix your cluster code. OpenMosix gets this right and clusters
outside the mosix/numa/ssi world don't generally care as you are
restarting services, but even then tend to use NTP to keep a bendy but
forward-moving time.
Alan
On Sun, 2002-11-10 at 11:46, Vojtech Pavlik wrote:
> On Sun, Nov 10, 2002 at 10:59:55AM -0800, Linus Torvalds wrote:
> > On Sun, 10 Nov 2002, Pavel Machek wrote:
> > > Unfortunately, this means "bye bye vsyscalls for gettimeofday".
> >
> > Not necessarily. All of the fastpath and the checking can be done by the
> > vsyscall, and if the vsyscall notices that there is a backwards jump in
> > time it just gives up and does a real system call. The vsyscall does need
> > to figure out the CPU it's running on somehow, but that should be solvable
> > - indexing through the thread ID or something.
>
> I'm planning to store the CPU number in the highest bits of the TSC ...
I could be wrong, but we had considered this earlier, and found that
there isn't a way to set the high bits of the TSC, you can only clear
them.
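For reference, the attempt would be a plain WRMSR to MSR 0x10 (a sketch; on
the P6-class parts of the day the write takes only the low 32 bits and
zeroes the high half, which matches the observation above that you can
clear the high bits but not set them):

static inline void write_tsc(unsigned int low, unsigned int high)
{
	/* ECX = MSR index (0x10 is the TSC), EDX:EAX = value to write */
	__asm__ __volatile__("wrmsr"
			     : /* no outputs */
			     : "c" (0x10), "a" (low), "d" (high));
}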
> > The system call overhead tends to scale up very well with CPU speed (the
> > one exception being the P4 which just has some internal problems with "int
> > 0x80" and slowed down compared to a PIII).
> >
> > So I would just suggest not spending a lot of effort on it, considering
> > the problems it already has.
>
> Agreed. The only problem left I see is the need to have an interrupt of
> every CPU from time to time to update the per-cpu time values, and to
> synchronize those to the 'global timer interrupt' somehow.
Yes, this would be needed for per-cpu tsc.
thanks
-john
As a beginning, what about the attached patch? It eliminates the compile time
TSC options (and thus hopefully the sources of confusion). I've exported
tsc_disable, so it can be set by the subarchs if desired (voyager does this)
and moved the notsc option into the timer_tsc code (which is where it looks
like it belongs).
James
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet v2.5.47 -> 1.825
# arch/i386/kernel/timers/Makefile 1.2 -> 1.4
# arch/i386/kernel/cpu/common.c 1.13 -> 1.14
# include/asm-i386/processor.h 1.31 -> 1.32
# arch/i386/Kconfig 1.2.1.4 -> 1.6
# arch/i386/kernel/timers/timer_tsc.c 1.5 -> 1.6
# arch/i386/mach-voyager/setup.c 1.1 -> 1.2
# arch/i386/kernel/timers/timer_pit.c 1.3.1.2 -> 1.7
# arch/i386/kernel/timers/timer.c 1.3 -> 1.5
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/11/10 [email protected] 1.823
# Linux v2.5.47
# --------------------------------------------
# 02/11/11 jejb@mulgrave.(none) 1.824
# Merge mulgrave.(none):/home/jejb/BK/timer-2.5
# into mulgrave.(none):/home/jejb/BK/timer-new-2.5
# --------------------------------------------
# 02/11/11 jejb@mulgrave.(none) 1.825
# make TSC purely a run-time determined thing
#
# - remove X86_TSC and X86_PIT compile time options
# - export tsc_disable for architecture setup
# - disable tsc in voyager pre_arch_setup_hook()
# - move "notsc" option into timers_tsc
# --------------------------------------------
#
diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig
--- a/arch/i386/Kconfig Mon Nov 11 15:56:40 2002
+++ b/arch/i386/Kconfig Mon Nov 11 15:56:40 2002
@@ -253,11 +253,6 @@
depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCYRIXIII || MELAN || MK6 || M586MMX || M586TSC || M586 || M486
default y
-config X86_TSC
- bool
- depends on MWINCHIP3D || MWINCHIP2 || MCRUSOE || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMIII || M686 || M586MMX || M586TSC
- default y
-
config X86_GOOD_APIC
bool
depends on MK7 || MPENTIUM4 || MPENTIUMIII || M686 || M586MMX
diff -Nru a/arch/i386/kernel/cpu/common.c b/arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c Mon Nov 11 15:56:40 2002
+++ b/arch/i386/kernel/cpu/common.c Mon Nov 11 15:56:40 2002
@@ -42,25 +42,6 @@
}
__setup("cachesize=", cachesize_setup);
-#ifndef CONFIG_X86_TSC
-static int tsc_disable __initdata = 0;
-
-static int __init tsc_setup(char *str)
-{
- tsc_disable = 1;
- return 1;
-}
-#else
-#define tsc_disable 0
-
-static int __init tsc_setup(char *str)
-{
- printk("notsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC.\n");
- return 1;
-}
-#endif
-__setup("notsc", tsc_setup);
-
int __init get_model_name(struct cpuinfo_x86 *c)
{
unsigned int *v;
diff -Nru a/arch/i386/kernel/timers/Makefile b/arch/i386/kernel/timers/Makefile
--- a/arch/i386/kernel/timers/Makefile Mon Nov 11 15:56:40 2002
+++ b/arch/i386/kernel/timers/Makefile Mon Nov 11 15:56:40 2002
@@ -2,10 +2,8 @@
# Makefile for x86 timers
#
-obj-y:= timer.o
+obj-y:= timer.o timer_tsc.o timer_pit.o
-obj-y += timer_tsc.o
-obj-y += timer_pit.o
-obj-$(CONFIG_X86_CYCLONE) += timer_cyclone.o
+obj-$(CONFIG_X86_CYCLONE) += timer_cyclone.o
include $(TOPDIR)/Rules.make
diff -Nru a/arch/i386/kernel/timers/timer.c b/arch/i386/kernel/timers/timer.c
--- a/arch/i386/kernel/timers/timer.c Mon Nov 11 15:56:40 2002
+++ b/arch/i386/kernel/timers/timer.c Mon Nov 11 15:56:40 2002
@@ -8,9 +8,7 @@
/* list of timers, ordered by preference, NULL terminated */
static struct timer_opts* timers[] = {
&timer_tsc,
-#ifndef CONFIG_X86_TSC
&timer_pit,
-#endif
NULL,
};
diff -Nru a/arch/i386/kernel/timers/timer_pit.c b/arch/i386/kernel/timers/timer_pit.c
--- a/arch/i386/kernel/timers/timer_pit.c Mon Nov 11 15:56:40 2002
+++ b/arch/i386/kernel/timers/timer_pit.c Mon Nov 11 15:56:40 2002
@@ -9,7 +9,9 @@
#include <linux/irq.h>
#include <asm/mpspec.h>
#include <asm/timer.h>
+#include <asm/smp.h>
#include <asm/io.h>
+#include <asm/arch_hooks.h>
extern spinlock_t i8259A_lock;
extern spinlock_t i8253_lock;
diff -Nru a/arch/i386/kernel/timers/timer_tsc.c b/arch/i386/kernel/timers/timer_tsc.c
--- a/arch/i386/kernel/timers/timer_tsc.c Mon Nov 11 15:56:40 2002
+++ b/arch/i386/kernel/timers/timer_tsc.c Mon Nov 11 15:56:40 2002
@@ -11,6 +11,10 @@
#include <asm/timer.h>
#include <asm/io.h>
+/* processor.h for tsc_disable flag */
+#include <asm/processor.h>
+
+int tsc_disable __initdata = 0;
extern int x86_udelay_tsc;
extern spinlock_t i8253_lock;
@@ -286,6 +290,18 @@
}
return -ENODEV;
}
+
+/* disable flag for tsc. Takes effect by clearing the TSC cpu flag
+ * in cpu/common.c */
+static int __init tsc_setup(char *str)
+{
+ tsc_disable = 1;
+ return 1;
+}
+
+__setup("notsc", tsc_setup);
+
+
/************************************************************/
diff -Nru a/arch/i386/mach-voyager/setup.c b/arch/i386/mach-voyager/setup.c
--- a/arch/i386/mach-voyager/setup.c Mon Nov 11 15:56:40 2002
+++ b/arch/i386/mach-voyager/setup.c Mon Nov 11 15:56:40 2002
@@ -7,6 +7,7 @@
#include <linux/irq.h>
#include <linux/interrupt.h>
#include <asm/arch_hooks.h>
+#include <asm/processor.h>
void __init pre_intr_init_hook(void)
{
@@ -29,6 +30,7 @@
void __init pre_setup_arch_hook(void)
{
+ tsc_disable = 1;
}
void __init trap_init_hook(void)
diff -Nru a/include/asm-i386/processor.h b/include/asm-i386/processor.h
--- a/include/asm-i386/processor.h Mon Nov 11 15:56:40 2002
+++ b/include/asm-i386/processor.h Mon Nov 11 15:56:40 2002
@@ -18,6 +18,9 @@
#include <linux/config.h>
#include <linux/threads.h>
+/* flag for disabling the tsc */
+extern int tsc_disable;
+
struct desc_struct {
unsigned long a,b;
};
On Mon, Nov 11, 2002 at 03:57:28PM -0500, J.E.J. Bottomley wrote:
> As a beginning, what about the attached patch? It eliminates the
> compile time TSC options (and thus hopefully the sources of confusion).
> I've exported tsc_disable, so it can be set by the subarchs if desired
> (voyager does this) and moved the notsc option into the timer_tsc code
> (which is where it looks like it belongs).
This will be very useful to me.
Thanks,
Bill
On Mon, 2002-11-11 at 12:57, J.E.J. Bottomley wrote:
> As a beginning, what about the attached patch? It eliminates the compile time
> TSC options (and thus hopefully the sources of confusion). I've exported
> tsc_disable, so it can be set by the subarchs if desired (voyager does this)
> and moved the notsc option into the timer_tsc code (which is where it looks
> like it belongs).
Looks good to me.
We'd still need to go back and yank out the #ifdef CONFIG_X86_TSC'ed
macros in profile.h and pksched.h or replace them w/ inlines that wrap
the rdtsc calls w/ if(cpu_has_tsc && !tsc_disable) or some such line.
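i.e. something like this sketch (the inline's name is invented; cpu_has_tsc
and tsc_disable are the flags mentioned above):

static inline unsigned long long read_cycles_checked(void)
{
	unsigned long long t = 0;

	if (cpu_has_tsc && !tsc_disable)
		__asm__ __volatile__("rdtsc" : "=A" (t));	/* EDX:EAX pair */
	return t;
}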
But yea, it's a start, assuming no one screams about not being able to
optimize out the timer_pit code.
thanks
-john
On Mon, Nov 11, 2002 at 12:40:49PM -0800, john stultz wrote:
> On Sun, 2002-11-10 at 11:46, Vojtech Pavlik wrote:
> > On Sun, Nov 10, 2002 at 10:59:55AM -0800, Linus Torvalds wrote:
> > > On Sun, 10 Nov 2002, Pavel Machek wrote:
> > > > Unfortunately, this means "bye bye vsyscalls for gettimeofday".
> > >
> > > Not necessarily. All of the fastpath and the checking can be done by the
> > > vsyscall, and if the vsyscall notices that there is a backwards jump in
> > > time it just gives up and does a real system call. The vsyscall does need
> > > to figure out the CPU it's running on somehow, but that should be solvable
> > > - indexing through the thread ID or something.
> >
> > I'm planning to store the CPU number in the highest bits of the TSC ...
>
> I could be wrong, but we had considered this earlier, and found that
> there isn't a way to set the high bits of the TSC, you can only clear
> them.
I'll have to test that. Another option is per-cpu page mappings for
vsyscalls. But that's rather ugly.
> > > The system call overhead tends to scale up very well with CPU speed (the
> > > one exception being the P4 which just has some internal problems with "int
> > > 0x80" and slowed down compared to a PIII).
> > >
> > > So I would just suggest not spending a lot of effort on it, considering
> > > the problems it already has.
> >
> > Agreed. The only problem left I see is the need to have an interrupt of
> > every CPU from time to time to update the per-cpu time values, and to
> > synchronize those to the 'global timer interrupt' somehow.
>
> Yes, this would be needed for per-cpu tsc.
--
Vojtech Pavlik
SuSE Labs
[email protected] said:
> We'd still need to go back and yank out the #ifdef CONFIG_X86_TSC'ed
> macros in profile.h and pksched.h or replace them w/ inlines that wrap
> the rdtsc calls w/ if(cpu_has_tsc && !tsc_disable) or some such line.
Actually, the best way to do this might be to vector the rdtsc calls through a
function pointer (i.e. they return zero always if the TSC is disabled, or the
TSC value if it's OK). I think this might be better than checking the
cpu_has_tsc flag in the code (well it's more expandable anyway, it won't be
faster...)
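As a sketch of the shape (names invented), in contrast to the inline check
suggested earlier:

typedef unsigned long long (*cycles_fn)(void);

static unsigned long long cycles_tsc(void)
{
	unsigned long long t;

	__asm__ __volatile__("rdtsc" : "=A" (t));
	return t;
}

static unsigned long long cycles_none(void)
{
	return 0;	/* TSC disabled or absent: callers always see zero */
}

/* boot/setup code would repoint this at cycles_none when the TSC is unusable */
cycles_fn read_cycles = cycles_tsc;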
When the TSC code is sorted out on a per cpu basis, consumers are probably
going to expect rdtsc to return usable values whatever CPU it is called on, so
vectoring the calls now may help this.
James
On Mon, 2002-11-11 at 14:49, J.E.J. Bottomley wrote:
> [email protected] said:
> > We'd still need to go back and yank out the #ifdef CONFIG_X86_TSC'ed
> > macros in profile.h and pksched.h or replace them w/ inlines that wrap
> > the rdtsc calls w/ if(cpu_has_tsc && !tsc_disable) or some such line.
>
> Actually, the best way to do this might be to vector the rdtsc calls through a
> function pointer (i.e. they return zero always if the TSC is disabled, or the
> TSC value if it's OK). I think this might be better than checking the
> cpu_has_tsc flag in the code (well it's more expandable anyway, it won't be
> faster...)
Sounds good, I'm planning on moving get_cycles to timer_opts, so how
about using that?
> When the TSC code is sorted out on a per cpu basis, consumers are probably
> going to expect rdtsc to return usable values whatever CPU it is called on, so
> vectoring the calls now may help this.
Yea, this is an ugly topic. I'm really not very enthusiastic about
per-cpu tsc, because it doesn't necessarily solve the problem on the
few machines that can't use the global tsc implementation (such as the
x440). True, many of the points Linus made about the current timer_tsc
implementation are valid. It does need to be cleaned up further, and I
have some patches to do so (I'll resend tomorrow, as I'm out sick
today). We should be looking towards multi-frequency systems, and seeing
what we can do to clean things up (ie: cpu_khz is global, etc).
If you are dead set on doing the percpu method, I'd strongly suggest
creating a new timer_per_cpu_tsc timer_opt struct and implementing it
there, rather than munging the existing code, which works well for most
systems.
thanks
-john
Hi!
> As a beginning, what about the attached patch? It eliminates the compile time
> TSC options (and thus hopefully the sources of confusion). I've exported
> tsc_disable, so it can be set by the subarchs if desired (voyager does this)
> and moved the notsc option into the timer_tsc code (which is where it looks
> like it belongs).
Looks good to me.
--
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.