2005-12-13 07:26:39

by Adrian Yee

Subject: tsc clock issues with dual core and question about irq balancing

Hi,

I've been having tsc issues where it counts back occasionally causing
things like ping to break with errors: "Warning: time of day goes back
(-1451987us), taking countermeasures." It seems related to
http://bugzilla.kernel.org/show_bug.cgi?id=5105 , but that bug seems to
be closed (and more x86_64 related). I also get other timing issues
like single clicks registering as double clicks, and at times double
clicks that don't register. In addition, if I stress the system with
something like prime95, then after about 2 minutes the system clock
speeds up, advancing by minutes every second. As suggested
in bug 5105, I switched to use the pmtimer (clock=pmtmr, my system
doesn't seem to support hpet) and it has fixed the ping and clock issue,
but my system doesn't 'feel' right. For example, ssh'ing out of the
machine is fine, but when ssh'ing into the system a dmesg is very slow
(spurts out a few pages then pauses for 10-20 seconds, then repeat).
Also, general desktop usage seems a little sluggish and not what an SMP
system should feel like.
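
The "time of day goes back" symptom can be turned into a repeatable check by reading the wall clock in a tight loop and flagging any sample that is earlier than the one before it. The following is only a minimal sketch under assumptions: the function names are hypothetical, the process has to migrate between cores for per-core TSC skew to show up, and a backward step can also come from an NTP or manual clock adjustment rather than from the TSC.

```python
import time


def find_backward_steps(samples):
    """Return (index, delta_us) for every sample earlier than its predecessor.

    `samples` is a sequence of wall-clock timestamps in seconds.
    `delta_us` is negative and given in microseconds, like ping's warning.
    """
    steps = []
    for i in range(1, len(samples)):
        delta_us = round((samples[i] - samples[i - 1]) * 1_000_000)
        if delta_us < 0:
            steps.append((i, delta_us))
    return steps


def sample_wall_clock(count=100_000):
    """Read the wall clock `count` times back to back."""
    return [time.time() for _ in range(count)]
```

Calling `find_backward_steps(sample_wall_clock())` on a healthy system should normally return an empty list; any entries give the sample index and the size of the backward jump.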

I'm currently running an i386 (ie. not x86_64) 2.6.15-rc5 kernel w/ SMP,
APIC and ACPI enabled (AMD Cool & Quiet disabled), an Athlon 64 X2 3800+
and EVGA nForce4 SLI (NF41) motherboard. I previously had the processor
running on an Abit AV8 (K8T800 Pro chipset) board and was having similar
issues, so it seems to be a dual core issue. I'd just like to add that
I'm currently testing the system with "nosmp noapic acpi=off clock=tsc"
(it was losing interrupts and wouldn't boot properly with apic/acpi on)
and so far everything seems to work (this includes ssh and desktop usage
is better).

My other question is about irq balancing - I turned it on, but it
doesn't seem to be working properly:

           CPU0       CPU1
  0:     109208        975    IO-APIC-edge  timer
  1:       1226         10    IO-APIC-edge  i8042
  8:     275272          1    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 12:       4133          4    IO-APIC-edge  i8042
 14:       5135          8    IO-APIC-edge  ide0
 15:         17          8    IO-APIC-edge  ide1
 16:      25084          1   IO-APIC-level  eth0
 17:      43597          1   IO-APIC-level  eth1
 18:        185          5   IO-APIC-level  libata
 19:          0          0   IO-APIC-level  libata
 20:      11525          1   IO-APIC-level  EMU10K1
 21:      24870          1   IO-APIC-level  nvidia
NMI:          0          0
LOC:     110119     110118
ERR:          0
MIS:          0

Are there certain conditions where irq balancing doesn't work properly?
Thanks.
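
To put a number on how lopsided the distribution above is, the /proc/interrupts text can be parsed into per-CPU counts. This is a rough sketch, not a tool mentioned in the thread; `parse_interrupts` and `cpu_share` are hypothetical names, and the parser assumes the usual layout of a CPU header row followed by `label: count count ...` rows.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: [count per CPU]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        per_cpu = []
        for field in rest.split()[:n_cpus]:
            if not field.isdigit():
                break  # reached the controller/device columns
            per_cpu.append(int(field))
        counts[label.strip()] = per_cpu
    return counts


def cpu_share(per_cpu):
    """Fraction of an interrupt's total handled by each CPU."""
    total = sum(per_cpu)
    return [c / total if total else 0.0 for c in per_cpu]
```

With the table above, `cpu_share(parse_interrupts(text)["16"])` shows that almost all eth0 interrupts were handled on CPU0, while the LOC (local timer) row is split evenly.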

Adrian


2005-12-14 01:04:53

by john stultz

Subject: Re: tsc clock issues with dual core and question about irq balancing

On Mon, 2005-12-12 at 23:26 -0800, Adrian Yee wrote:
> I've been having tsc issues where it counts back occasionally causing
> things like ping to break with errors: "Warning: time of day goes back
> (-1451987us), taking countermeasures." It seems related to
> http://bugzilla.kernel.org/show_bug.cgi?id=5105 , but that bug seems to
> be closed (and more x86_64 related). I also get other timing issues
> like single clicks registering as double clicks, and at times double
> clicks that don't register. In addition, if I stress the system with
> something like prime95, then after about 2 minutes the system clock will
> speed up where the clock advances by minutes every second. As suggested
> in bug 5105, I switched to use the pmtimer (clock=pmtmr, my system
> doesn't seem to support hpet) and it has fixed the ping and clock issue,
> but my system doesn't 'feel' right. For example, ssh'ing out of the
> machine is fine, but when ssh'ing into the system a dmesg is very slow
> (spurts out a few pages then pauses for 10-20 seconds, then repeat).
> Also, general desktop usage seems a little sluggish and not what a smp
> system should feel like.

I can't speak about the irq routing issue, but I'm interested in your
issues with the ACPI PM timer.

> I'm currently running an i386 (ie. not x86_64) 2.6.15-rc5 kernel w/ SMP,
> APIC and ACPI enabled (AMD Cool & Quiet disabled), an Athlon 64 X2 3800+
> and EVGA nForce4 SLI (NF41) motherboard. I previously had the processor
> running on an Abit AV8 (K8T800 Pro chipset) board and was having similar
> issues, so it seems to be a dual core issue. I'd just like to add that
> I'm currently testing the system with "nosmp noapic acpi=off clock=tsc"
> (it was losing interrupts and wouldn't boot properly with apic/acpi on)
> and so far everything seems to work (this includes ssh and desktop usage
> is better).

So keeping the above settings, does removing just the "clock=tsc" cause
the sluggishness to appear?

The TSC is *much* faster than the ACPI PM timer, but it is just not usable
for reliable timekeeping on many SMP systems. That said, the ACPI PM
should not cause performance issues unless you are constantly calling
gettimeofday().
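
A rough way to see how expensive time-of-day reads are with a given clocksource is to time a large number of them. The sketch below is an assumption-laden illustration, not a benchmark from this thread: `cost_per_call_ns` is a hypothetical name, the figure includes Python interpreter overhead, and the stopwatch itself (`time.perf_counter`) is read from the same underlying clock hardware, so only comparisons between boots with different clock= settings are meaningful.

```python
import time


def cost_per_call_ns(calls=200_000):
    """Average elapsed time per wall-clock read, in nanoseconds."""
    start = time.perf_counter()
    for _ in range(calls):
        time.time()  # one wall-clock read per iteration
    elapsed = time.perf_counter() - start
    return elapsed / calls * 1e9
```

A workload that reads the clock constantly would be expected to show a noticeably larger value with the PM timer than with the TSC; a workload that rarely reads the clock should not care.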

Also would you open a bugzilla bug on this and attach your .config and
dmesg?

thanks
-john


2005-12-14 09:07:49

by Adrian Yee

Subject: Re: tsc clock issues with dual core and question about irq balancing

Hi John,

>> I'm currently testing the system with "nosmp noapic acpi=off
>> clock=tsc" (it was losing interrupts and wouldn't boot properly
>> with apic/acpi on) and so far everything seems to work (this
>> includes ssh and desktop usage is better).
>
> So keeping the above settings, does removing just the "clock=tsc"
> cause the sluggishness to appear?

I just tried booting with the pmtmr enabled and incoming ssh is bad
(I had an ls pause for over 20 seconds, while another connection was
somewhat fine). I wish I had more concrete tests since the problems
I'm seeing are so subjective. I guess I'll have to ignore this
problem until I get a better test.

> Also would you open a bugzilla bug on this and attach your .config
> and dmesg?

Done: http://bugzilla.kernel.org/show_bug.cgi?id=5740

Thanks.

Adrian

2005-12-14 09:51:43

by Jonas Oreland

Subject: Re: tsc clock issues with dual core and question about irq balancing

Adrian Yee wrote:
> Hi John,
>
>
>>>I'm currently testing the system with "nosmp noapic acpi=off
>>>clock=tsc" (it was losing interrupts and wouldn't boot properly
>>>with apic/acpi on) and so far everything seems to work (this
>>>includes ssh and desktop usage is better).
>>
>>So keeping the above settings, does removing just the "clock=tsc"
>>cause the sluggishness to appear?
>
>
> I just tried booting with the pmtmr enabled and incoming ssh is bad
> (I had an ls pause for over 20 seconds, while another connection was
> somewhat fine). I wish I had more concrete tests since the problems
> I'm seeing are so subjective. I guess I'll have to ignore this
> problem until I get a better test.
>
>
>>Also would you open a bugzilla bug on this and attach your .config
>>and dmesg?
>
>
> Done: http://bugzilla.kernel.org/show_bug.cgi?id=5740
>
> Thanks.
>
> Adrian

Hi

Dunno if this helps, but

I also had problems with tsc, and the ACPI timer wasn't properly detected.

http://bugzilla.kernel.org/show_bug.cgi?id=5283 fixed the ACPI problem.

(idle=poll should fix it as well, I think)

/Jonas

2005-12-14 20:14:45

by john stultz

Subject: Re: tsc clock issues with dual core and question about irq balancing

On Wed, 2005-12-14 at 01:07 -0800, Adrian Yee wrote:
> Hi John,
>
> >> I'm currently testing the system with "nosmp noapic acpi=off
> >> clock=tsc" (it was losing interrupts and wouldn't boot properly
> >> with apic/acpi on) and so far everything seems to work (this
> >> includes ssh and desktop usage is better).
> >
> > So keeping the above settings, does removing just the "clock=tsc"
> > cause the sluggishness to appear?
>
> I just tried booting with the pmtmr enabled and incoming ssh is bad
> (I had an ls pause for over 20 seconds, while another connection was
> somewhat fine). I wish I had more concrete tests since the problems
> I'm seeing are so subjective. I guess I'll have to ignore this
> problem until I get a better test.

From your dmesg, you're still running w/ smp, apic, acpi as well. I was
curious if you could run just as you had before without issue using
"nosmp noapic acpi=off clock=tsc", only drop the clock=tsc bit.

I just want to be sure we're only changing one variable at a time. :)


> > Also would you open a bugzilla bug on this and attach your .config
> > and dmesg?
>
> Done: http://bugzilla.kernel.org/show_bug.cgi?id=5740

Thanks for filling that out! I'll see if I cannot reproduce anything
similar using your config.


thanks again,
-john

2005-12-14 20:27:04

by Adrian Yee

Subject: Re: tsc clock issues with dual core and question about irq balancing

Hi John,

> From your dmesg, you're still running w/ smp, apic, acpi as well.
> I was curious if you could run just as you had before without issue
> using "nosmp noapic acpi=off clock=tsc", only drop the clock=tsc bit.
>
> I just want to be sure we're only changing one variable at a time.
> :)

I also have a dmesg with those options that I can upload, but I'm not
completely sure about the validity of the sluggishness "tests" because
the system felt the same after I booted with the different
configurations this time around. ssh seems fine right now, so I guess
my Internet just happened to go bad at the same time I started playing with
my hardware and kernel configurations.

I think the only solid problem I've got here is the tsc occasionally
counting back. Is switching to clock=pmtmr the permanent/proper
solution for this, or is there a bug in the kernel/hardware that should
be fixable? Thanks.

Adrian

2005-12-14 20:57:16

by john stultz

Subject: Re: tsc clock issues with dual core and question about irq balancing

On Wed, 2005-12-14 at 12:27 -0800, Adrian Yee wrote:
> Hi John,
>
> > From your dmesg, you're still running w/ smp, apic, acpi as well.
> > I was curious if you could run just as you had before without issue
> > using "nosmp noapic acpi=off clock=tsc", only drop the clock=tsc bit.
> >
> > I just want to be sure we're only changing one variable at a time.
> > :)
>
> I also have a dmesg with those options that I can upload, but I'm not
> completely sure about the validity of the sluggishness "tests" because
> the system felt the same after I booted with the different
> configurations this time around. ssh seems fine right now, so I guess
> my Internet just happened to go bad at the same time I started playing with
> my hardware and kernel configurations.

Hmm. Please keep an eye on this. If there is something going funky
either in accessing the PM Timer hardware on your chipset, or some other
quirk (locking issues, timer starvation, etc) it would be good to
discover.

> I think the only solid problem I've got here is the tsc occasionally
> counting back. Is switching to clock=pmtmr the permanent/proper
> solution for this, or is there a bug in the kernel/hardware that should
> be fixable? Thanks.

If the ACPI PM timer is enabled it should be used by default (is that
not the case? if you do not use clock= at all, what clocksource gets
selected?). Unfortunately using the TSC on some SMP systems is just not
feasible.
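
On later kernels than the one discussed here, the selected clocksource can be read from sysfs rather than dug out of dmesg. The sketch below assumes the files `current_clocksource` and `available_clocksource` under /sys/devices/system/clocksource/clocksource0, which are not expected to exist on 2.6.15-rc5; the function names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location; present on later kernels only.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def summarize_clocksource(current_text, available_text):
    """Turn the raw contents of the two sysfs files into a small report."""
    current = current_text.strip()
    return {
        "current": current,
        "available": available_text.split(),
        "tsc_in_use": current == "tsc",
    }


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Read both sysfs files from `base` and summarize them."""
    base = Path(base)
    return summarize_clocksource(
        (base / "current_clocksource").read_text(),
        (base / "available_clocksource").read_text(),
    )
```

Booting without any clock= option and then calling `read_clocksource()` would answer the question of which source is picked by default on such a kernel.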

thanks
-john


2005-12-14 23:41:24

by Jeff Carr

Subject: Re: tsc clock issues with dual core and question about irq balancing

On 12/12/05 23:26, Adrian Yee wrote:

> My other question is about irq balancing - I turned it on, but it
> doesn't seem to be working properly:
>
>            CPU0       CPU1
>   0:     109208        975    IO-APIC-edge  timer
>   1:       1226         10    IO-APIC-edge  i8042
>   8:     275272          1    IO-APIC-edge  rtc
>   9:          0          0   IO-APIC-level  acpi
>  12:       4133          4    IO-APIC-edge  i8042
>  14:       5135          8    IO-APIC-edge  ide0
>  15:         17          8    IO-APIC-edge  ide1
>  16:      25084          1   IO-APIC-level  eth0
>  17:      43597          1   IO-APIC-level  eth1
>  18:        185          5   IO-APIC-level  libata
>  19:          0          0   IO-APIC-level  libata
>  20:      11525          1   IO-APIC-level  EMU10K1
>  21:      24870          1   IO-APIC-level  nvidia
> NMI:          0          0
> LOC:     110119     110118
> ERR:          0
> MIS:          0

I think there is an irqbalance userspace daemon.

2005-12-15 04:35:54

by Adrian Yee

Subject: Re: tsc clock issues with dual core and question about irq balancing

Hi Jeff,

>> My other question is about irq balancing - I turned it on, but it
>> doesn't seem to be working properly:
>>
>>            CPU0       CPU1
>>   0:     109208        975    IO-APIC-edge  timer
>>   1:       1226         10    IO-APIC-edge  i8042
>>   8:     275272          1    IO-APIC-edge  rtc
>>   9:          0          0   IO-APIC-level  acpi
>>  12:       4133          4    IO-APIC-edge  i8042
>>  14:       5135          8    IO-APIC-edge  ide0
>>  15:         17          8    IO-APIC-edge  ide1
>>  16:      25084          1   IO-APIC-level  eth0
>>  17:      43597          1   IO-APIC-level  eth1
>>  18:        185          5   IO-APIC-level  libata
>>  19:          0          0   IO-APIC-level  libata
>>  20:      11525          1   IO-APIC-level  EMU10K1
>>  21:      24870          1   IO-APIC-level  nvidia
>> NMI:          0          0
>> LOC:     110119     110118
>> ERR:          0
>> MIS:          0
>
> I think there is an irqbalance userspace daemon.

According to Debian's package description for the irqbalance package:

"Daemon to balance irq's across multiple CPUs on systems with the 2.4 or
2.6 kernel. This can lead to better performance and IO balance on SMP
systems. Useful mostly just for 2.4 kernels, or 2.6 kernels with
CONFIG_IRQBALANCE turned off."

I have the CONFIG_IRQBALANCE option turned on, so it should be balancing
the irq's itself; is it not? Would there even be any benefit from
balancing the irq's with a single dual core processor? Thanks.
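
Whether an option such as CONFIG_IRQBALANCE is actually set in the running kernel's configuration can be checked by scanning the .config text. This is a small sketch with a hypothetical helper name; it assumes the standard .config conventions of `OPTION=value` lines and `# OPTION is not set` comments.

```python
def config_value(config_text, option):
    """Return the value of `option` ('y', 'm', or a literal), or None if unset."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None  # option does not appear at all
```

For example, `config_value(open(".config").read(), "CONFIG_IRQBALANCE")` returning `"y"` would confirm that the in-kernel balancer is built in; it says nothing about whether the balancer decides that moving interrupts is worthwhile.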

Adrian