Setting the internal clock to 100 Hz stabilizes the laptop - and the
synaptics touchpad stops "crashing" (when "crashed" the pad reads out
all kinds of seemingly random values). I would suspect the driver
needs adjusting for the variable clock. Also - it's definitely nicer
on the laptop power use as far as I can tell - should this be in the
documentation?
I'm very grateful that compact flash-based booting on a SATA system
works well. It hasn't been so reliable in 2.6.19-rc2-mm1 for IDE/CF
adaptors, but I haven't yet worked out why. (Tested with various laptops.)
Resume from "suspend to RAM" (ACPI S3 mode): the keyboard and mouse do
not recover on the 945G chipset. (Otherwise the chipset works well in
2.6.19-rc2-mm1 - this is the first kernel in which it does work well.)
LVM2 - when adding and removing physical volumes (again, on Compact
Flash cards via USB and Firewire adaptors) - it doesn't always remove
the volume properly (pvremove /dev/sda or equiv) from the device-mapper.
This leaves me unable to plug in another. I suspect this to be an
LVM2 problem (no hotplug?) rather than a compact flash or SCSI problem.
I would debug it, but I'm not yet sure where to begin. Feel free to
offer suggestions (to my mailbox directly - I've waited for two weeks to
post as I don't want to add noise to the kernel list).
Oh - my job involves working with these systems.
Thank you for everything!
- Teunis Peters
On Thu, 19 Oct 2006 09:05:49 -0700
teunis <[email protected]> wrote:
> Setting the internal clock to 100 Hz stabilizes the laptop - and the
> synaptics touchpad stops "crashing" (when "crashed" the pad reads out
> all kinds of seemingly random values). I would suspect the driver
> needs adjusting for the variable clock. Also - it's definitely nicer
> on the laptop power use as far as I can tell - should this be in the
> documentation?
So you're saying that CONFIG_NO_HZ breaks the touchpad?
> I'm very grateful that compact flash-based booting on a SATA system
> works well. It hasn't been so reliable in 2.6.19-rc2-mm1 for IDE/CF
> adaptors but I haven't yet solved why. (tested with various laptops)
hm. What goes wrong?
> resume from "suspend to ram" (ACPI S3 mode) - the keyboard and mouse do
> not recover on 945G chipset. Note that otherwise the chipset works
> well in 2.6.19-rc2-mm1 - and this is the first kernel that does work well).
So this might not be a new bug?
> LVM2 - when adding and removing physical volumes (again, on Compact
> Flash cards via USB and Firewire adaptors) - it doesn't always remove
> the volume properly (pvremove /dev/sda or equiv) from the device-mapper.
> This leaves me unable to plug in another. I suspect this to be an
> LVM2 problem (no hotplug?) rather than a compact flash or SCSI problem.
Can you identify an earlier kernel in which this worked OK?
Andrew Morton wrote:
> On Thu, 19 Oct 2006 09:05:49 -0700
> teunis <[email protected]> wrote:
>
>> Setting the internal clock to 100 Hz stabilizes the laptop - and the
>> synaptics touchpad stops "crashing" (when "crashed" the pad reads out
>> all kinds of seemingly random values). I would suspect the driver
>> needs adjusting for the variable clock. Also - it's definitely nicer
>> on the laptop power use as far as I can tell - should this be in the
>> documentation?
>
> So you're saying that CONFIG_NO_HZ breaks the touchpad?
Yes. At least for the Acer Travelmate 8000, HP nx6310 and HP nx7400.
Other than the touchpad there is not a lot of common hardware between
these units. The readout becomes highly unreliable. (In X it starts
jumping around - it SORT OF resembles the output.)
My suspicion is a timing problem in the synaptic USB driver - but I'm
not familiar with timing on kernels these days. It's been a few years
since I last really did any kernel work.
>
>> I'm very grateful that compact flash-based booting on a SATA system
>> works well. It hasn't been so reliable in 2.6.19-rc2-mm1 for IDE/CF
>> adaptors but I haven't yet solved why. (tested with various laptops)
>
> hm. What goes wrong?
It fails to boot some of the time on the SanDisk Ultra II 8.0GB and SanDisk
Extreme IV 8.0GB. The latter is less stable. It actually crashes during the
GRUB read, so I haven't been entirely sure it's a kernel problem...
>
>> resume from "suspend to ram" (ACPI S3 mode) - the keyboard and mouse do
>> not recover on 945G chipset. Note that otherwise the chipset works
>> well in 2.6.19-rc2-mm1 - and this is the first kernel that does work well).
>
> So this might not be a new bug?
It's a regression. 2.6.19-rc1-git6 seemed to work, actually. I couldn't
get any other kernel to work though... mind you, 2.6.18 worked as well,
although the Intel 945G driver did not - so X was running under VESA only.
>> LVM2 - when adding and removing physical volumes (again, on Compact
>> Flash cards via USB and Firewire adaptors) - it doesn't always remove
>> the volume properly (pvremove /dev/sda or equiv) from the device-mapper.
>> This leaves me unable to plug in another. I suspect this to be an
>> LVM2 problem (no hotplug?) rather than a compact flash or SCSI problem.
>
> Can you identify an earlier kernel in which this worked OK?
As far as I can tell, it never has.
I've only tested it with:
2.6.16-debian, 2.6.17-debian, 2.6.18, 2.6.19, 2.6.19-rc1,
2.6.19-rc1-git4, 2.6.19-rc1-git6, 2.6.19-rc2, 2.6.19-rc2-mm1 (which
otherwise works quite nicely)
Thank you!
- Teunis
On Fri, 20 Oct 2006 09:30:37 -0700
teunis <[email protected]> wrote:
>
Please don't play with the Cc:s! Just do reply-to-all, thanks.
> Andrew Morton wrote:
> > On Thu, 19 Oct 2006 09:05:49 -0700
> > teunis <[email protected]> wrote:
> >
> >> Setting the internal clock to 100 Hz stabilizes the laptop - and the
> >> synaptics touchpad stops "crashing" (when "crashed" the pad reads out
> >> all kinds of seemingly random values). I would suspect the driver
> >> needs adjusting for the variable clock. Also - it's definitely nicer
> >> on the laptop power use as far as I can tell - should this be in the
> >> documentation?
> >
> > So you're saying that CONFIG_NO_HZ breaks the touchpad?
>
> yes. At least for Acer Travelmate 8000 and HP nx6310 and HP nx7400.
> Other than the touchpad - there is not a lot of common hardware between
> these units. The readout becomes highly unreliable. (in X it starts
> jumping around - it SORT OF resembles the output)
>
> My suspicion is a timing problem in the synaptic USB driver
OK, that's going to be hard to fix and it'd be awkward (and unpopular) to
make inclusion of the dynamic-ticks feature dependent on fixing this.
(Then again, it'd get Ingo into device drivers ;))
However I would suggest that NO_HZ (at least) be dependent upon
CONFIG_EXPERIMENTAL, no?
Also, NO_HZ breaks my laptop (and presumably quite a few others) quite
horridly, which means nobody can ship the feature. Some runtime
turn-it-off work needs to be done there.
> - but I'm
> not familiar with timing on kernels these days. It's been a few years
> since I last really did any kernel work.
>
> >
> >> I'm very grateful that compact flash-based booting on a SATA system
> >> works well. It hasn't been so reliable in 2.6.19-rc2-mm1 for IDE/CF
> >> adaptors but I haven't yet solved why. (tested with various laptops)
> >
> > hm. What goes wrong?
>
> Fails to boot some of the time on SanDisk ultraII 8.0GB and SanDisk
> Extreme IV 8.0GB. The latter is less stable. It crashes during GRUB
> read actually so I haven't been entirely sure it's a kernel problem...
Don't know, sorry.
> >
> >> resume from "suspend to ram" (ACPI S3 mode) - the keyboard and mouse do
> >> not recover on 945G chipset. Note that otherwise the chipset works
> >> well in 2.6.19-rc2-mm1 - and this is the first kernel that does work well).
> >
> > So this might not be a new bug?
>
> A regression. 2.6.19-rc1-git6 seemed to work actually. I couldn't
> get any other kernel to work though... mind you, 2.6.18 worked as well
> although the intel 945G driver did not - so X was operating under VESA only.
So you're saying that there is some patch in the -mm lineup which causes
the keyboard and mouse to die after resume-from-RAM?
What makes you think this is related to the 945G support?
> >> LVM2 - when adding and removing physical volumes (again, on Compact
> >> Flash cards via USB and Firewire adaptors) - it doesn't always remove
> >> the volume properly (pvremove /dev/sda or equiv) from the device-mapper.
> >> This leaves me unable to plug in another. I suspect this to be an
> >> LVM2 problem (no hotplug?) rather than a compact flash or SCSI problem.
> >
> > Can you identify an earlier kernel in which this worked OK?
>
> As far as I can tell, it never has.
>
> I've only tested it with:
> 2.6.16-debian, 2.6.17-debian, 2.6.18, 2.6.19, 2.6.19-rc1,
> 2.6.19-rc1-git4, 2.6.19-rc1-git6, 2.6.19-rc2, 2.6.19-rc2-mm1 (which
> otherwise works quite nicely)
Please try 2.6.19-rc2-mm2: it has a blockdev refcounting fix (well - a
change, at least) which could conceivably help here.
Also, if you're keen,
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt will
allow us to identify the problematic -mm patches. It would be super-useful
if you could do that.
Thanks.
On Fri, 2006-10-20 at 11:07 -0700, Andrew Morton wrote:
> > > So you're saying that CONFIG_NO_HZ breaks the touchpad?
> >
> > yes. At least for Acer Travelmate 8000 and HP nx6310 and HP nx7400.
> > Other than the touchpad - there is not a lot of common hardware between
> > these units. The readout becomes highly unreliable. (in X it starts
> > jumping around - it SORT OF resembles the output)
> >
> > My suspicion is a timing problem in the synaptic USB driver
>
> OK, that's going to be hard to fix and it'd be awkward (and unpopular) to
> make inclusion of the dynamic-ticks feature dependent on fixing this.
> (Then again, it'd get Ingo into device drivers ;))
Maybe Ingo is a lesser evil than me when it comes to device
drivers :)
> However I would suggest that NO_HZ (at least) be dependent upon
> CONFIG_EXPERIMENTAL, no?
Fair enough.
> Also, NO_HZ breaks my laptop (and presumably quite a few others) quite
> horridly, which means nobody can ship the feature. Some runtime
> turn-it-off work needs to be done there.
We can make a commandline switch as for highres. Is that sufficient?
tglx
On Fri, 20 Oct 2006 20:13:54 +0200
Thomas Gleixner <[email protected]> wrote:
> > Also, NO_HZ breaks my laptop (and presumably quite a few others) quite
> > horridly, which means nobody can ship the feature. Some runtime
> > turn-it-off work needs to be done there.
>
> We can make a commandline switch as for highres. Is that sufficient?
I doubt it.
I don't know how many machines will be affected by this, but I'd expect
it's quite a few - the Vaio has a less-than-one-year-old Intel CPU in it.
I'd expect that if a distro were to enable NO_HZ, they'd have a large
number of unhappy users whose machines run like crap, some of whom would
find out that they need to add some funny dont-run-like-crap option and
some of whom would, after wasting considerable amounts of time, just give
up and use Windows or RH5.2 or something.
IOW, it would be vastly better to make it simply work out-of-the-box.
On Fri, 2006-10-20 at 11:26 -0700, Andrew Morton wrote:
> On Fri, 20 Oct 2006 20:13:54 +0200
> Thomas Gleixner <[email protected]> wrote:
>
> > > Also, NO_HZ breaks my laptop (and presumably quite a few others) quite
> > > horridly, which means nobody can ship the feature. Some runtime
> > > turn-it-off work needs to be done there.
> >
> > We can make a commandline switch as for highres. Is that sufficient?
>
> I doubt it.
>
> I don't know how many machines will be affected by this, but I'd expect
> it's quite a few - the Vaio has a less-than-one-year-old Intel CPU in it.
>
> I'd expect that if a distro were to enable NO_HZ, they'd have a large
> number of unhappy users whose machines run like crap, some of whom would
> find out that they need to add some funny dont-run-like-crap option and
> some of whom would, after wasting considerable amounts of time, just give
> up and use Windows or RH5.2 or something.
>
> IOW, it would be vastly better to make it simply work out-of-the-box.
Sorry, I misinterpreted the "runtime turn-it-off work".
tglx
On Fri, 2006-10-20 at 11:26 -0700, Andrew Morton wrote:
> On Fri, 20 Oct 2006 20:13:54 +0200
> Thomas Gleixner <[email protected]> wrote:
>
> > > Also, NO_HZ breaks my laptop (and presumably quite a few others) quite
> > > horridly, which means nobody can ship the feature. Some runtime
> > > turn-it-off work needs to be done there.
> >
> > We can make a commandline switch as for highres. Is that sufficient?
>
> I doubt it.
>
> I don't know how many machines will be affected by this, but I'd expect
> it's quite a few - the Vaio has a less-than-one-year-old Intel CPU in it.
Is this still the broken lapic issue? I'm thinking about a detection
mechanism for that one.
tglx
On 10/20/06, Andrew Morton <[email protected]> wrote:
> On Fri, 20 Oct 2006 09:30:37 -0700
> teunis <[email protected]> wrote:
>
> >
>
> Please don't play with the Cc:s! Just do reply-to-all, thanks.
>
> > Andrew Morton wrote:
> > > On Thu, 19 Oct 2006 09:05:49 -0700
> > > teunis <[email protected]> wrote:
> > >
> > >> Setting the internal clock to 100 Hz stabilizes the laptop - and the
> > >> synaptics touchpad stops "crashing" (when "crashed" the pad reads out
> > >> all kinds of seemingly random values). I would suspect the driver
> > >> needs adjusting for the variable clock. Also - it's definitely nicer
> > >> on the laptop power use as far as I can tell - should this be in the
> > >> documentation?
> > >
> > > So you're saying that CONFIG_NO_HZ breaks the touchpad?
> >
> > yes. At least for Acer Travelmate 8000 and HP nx6310 and HP nx7400.
> > Other than the touchpad - there is not a lot of common hardware between
> > these units. The readout becomes highly unreliable. (in X it starts
> > jumping around - it SORT OF resembles the output)
> >
> > My suspicion is a timing problem in the synaptic USB driver
>
> OK, that's going to be hard to fix and it'd be awkward (and unpopular) to
> make inclusion of the dynamic-ticks feature dependent on fixing this.
> (Then again, it'd get Ingo into device drivers ;))
>
I wonder if the problem is with the in-kernel synaptics driver or with
X (itself or the synaptics driver in it). Does the touchpad misbehave
when you use GPM on a text console? What about when you use the legacy
mouse driver (as opposed to synaptics) in X?
--
Dmitry
On Fri, 20 Oct 2006 20:46:55 +0200
Thomas Gleixner <[email protected]> wrote:
> On Fri, 2006-10-20 at 11:26 -0700, Andrew Morton wrote:
> > On Fri, 20 Oct 2006 20:13:54 +0200
> > Thomas Gleixner <[email protected]> wrote:
> >
> > > > Also, NO_HZ breaks my laptop (and presumably quite a few others) quite
> > > > horridly, which means nobody can ship the feature. Some runtime
> > > > turn-it-off work needs to be done there.
> > >
> > > We can make a commandline switch as for highres. Is that sufficient?
> >
> > I doubt it.
> >
> > I don't know how many machines will be affected by this, but I'd expect
> > it's quite a few - the Vaio has a less-than-one-year-old Intel CPU in it.
>
> Is this still the broken lapic issue?
yup. iirc the standard FC5 SMP kernel runs dog-slowly on that machine too.
> I'm thinking about a detection
> mechanism for that one.
Thanks.
* Andrew Morton <[email protected]> wrote:
> > > I don't know how many machines will be affected by this, but I'd
> > > expect it's quite a few - the Vaio has a less-than-one-year-old
> > > Intel CPU in it.
> >
> > Is this still the broken lapic issue?
>
> yup. iirc the standard FC5 SMP kernel runs dog-slowly on that machine
> too.
hm. This is how lapic timer calibration works.
the lapic timer is really simple - it counts down from a value and
generates an irq if that counter reaches 0. Then it starts counting down
again.
the 'count down from' value is programmed via __setup_APIC_LVTT().
we first write a 'really large' number into it (1 billion):
__setup_APIC_LVTT(1000000000);
the unit of counting is '16 system bus cycles'.
i.e. if your system has a system bus of 333 MHz, then a value of 1
billion takes 48 seconds to count down. (so the calibration ought to be
pretty robust in this regard.)
then we use the wait_timer_tick() function, which waits until the
PIT counter wraps around to 0 (the PIT's frequency is known and it is
already programmed correctly). Hence by calling wait_timer_tick() we can
generate a delay of one jiffy - and we can read out the current lapic
timer count and determine the calibration factor.
then we calculate the result as:
result = (tt1-tt2)*APIC_DIVISOR/LOOPS;
where tt1 is the lapic timer count before we start calibration, tt2 is
the count after calibration, and LOOPS is the number of timer ticks
waited for in between. (APIC_DIVISOR is 16)
i don't see where the error is - but there must be some calibration
problem, as your system shows a systematic 1:60 difference between
expected and real lapic timer frequency.
Ingo
On Fri, 20 Oct 2006 22:37:31 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > > > I don't know how many machines will be affected by this, but I'd
> > > > expect it's quite a few - the Vaio has a less-than-one-year-old
> > > > Intel CPU in it.
> > >
> > > Is this still the broken lapic issue?
> >
> > yup. iirc the standard FC5 SMP kernel runs dog-slowly on that machine
> > too.
>
> hm. This is how lapic timer calibration works.
>
> i don't see where the error is - but there must be some calibration
> problem as your system shows a systematic 1:60 difference between
> expected and real lapic timer frequency.
>
Oh. I thought the problem was that the timer stops when the CPU is idle.
Maybe I misremembered. I'll try `idle=poll'.
* Andrew Morton <[email protected]> wrote:
> Oh. I thought the problem was that the timer stops when the CPU is
> idle. Maybe I misremembered. I'll try `idle=poll'.
hm, in that case wouldn't the box fail to boot at all? But yeah, trying
idle=poll would be nice.
could you also boot with apic=verbose and send us the full bootlog?
Ingo
Dmitry Torokhov wrote:
> On 10/20/06, Andrew Morton <[email protected]> wrote:
>> On Fri, 20 Oct 2006 09:30:37 -0700
>> teunis <[email protected]> wrote:
>>
>> >
>>
>> Please don't play with the Cc:s! Just do reply-to-all, thanks.
>>
>> > Andrew Morton wrote:
>> > > On Thu, 19 Oct 2006 09:05:49 -0700
>> > > teunis <[email protected]> wrote:
>> > >
>> > >> Setting the internal clock to 100 Hz stabilizes the laptop - and the
>> > >> synaptics touchpad stops "crashing" (when "crashed" the pad reads out
>> > >> all kinds of seemingly random values). I would suspect the driver
>> > >> needs adjusting for the variable clock. Also - it's definitely nicer
>> > >> on the laptop power use as far as I can tell - should this be in the
>> > >> documentation?
>> > >
>> > > So you're saying that CONFIG_NO_HZ breaks the touchpad?
>> >
>> > yes. At least for Acer Travelmate 8000 and HP nx6310 and HP nx7400.
>> > Other than the touchpad - there is not a lot of common hardware between
>> > these units. The readout becomes highly unreliable. (in X it starts
>> > jumping around - it SORT OF resembles the output)
>> >
>> > My suspicion is a timing problem in the synaptic USB driver
>>
>> OK, that's going to be hard to fix and it'd be awkward (and unpopular) to
>> make inclusion of the dynamic-ticks feature dependent on fixing this.
>> (Then again, it'd get Ingo into device drivers ;))
>>
>
> I wonder if the problem is with the in-kernel synaptics driver or with
> X (itself or the synaptics driver in it). Does the touchpad misbehave
> when you use GPM on a text console? What about when you use the legacy
> mouse driver (as opposed to synaptics) in X?
>
Not sure - but testing now.
On the flip side, it seems that ACPI C3 (???) - or at any rate the restore
after suspend to RAM - halts the high resolution timer and NO_HZ on one of
the laptops. At that point the synaptics touchpad freezes solid.
I already had CONFIG_NO_HZ disabled, but am now testing with the high
resolution timer turned off as well. (I'm too used to desktops.)
With that test I can be -almost- certain it's a kernel problem.
Now off to figure out why ndiswrapper no longer loads... (it's a GPL
module and the kernel claims it isn't... something changed, but I'm not
sure what yet, as it works with rc1-git6)
- Teunis
On Fri, 20 Oct 2006 22:56:51 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > Oh. I thought the problem was that the timer stops when the CPU is
> > idle. Maybe I misremembered. I'll try `idle=poll'.
>
> hm, wouldn't in that case the box not boot at all? But yeah, idle=poll
> would be nice.
idle=poll fixes it. The fan gets a bit noisy though ;)
Perhaps a suitable test would be to set up a PIT interrupt, do a hlt, see
if the APIC timer counter has increased appropriately.
I got this:
[ 43.709238] TSC appears to be running slowly. Marking it as unstable
How come? It also happens with HIGH_RES_TIMERS=n and NO_HZ=n. It only
seems to happen when idle=poll is given.
> could you also boot with apic=verbose and send us the full bootlog?
>
http://userweb.kernel.org/~akpm/apic.txt
I gave up on waiting for it to complete initscripts.
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 2.00GHz
stepping : 8
cpu MHz : 800.000
cache size : 2048 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe nx est tm2
bogomips : 3994.15
On Fri, 2006-10-20 at 18:25 -0700, Andrew Morton wrote:
> On Fri, 20 Oct 2006 22:56:51 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Andrew Morton <[email protected]> wrote:
> >
> > > Oh. I thought the problem was that the timer stops when the CPU is
> > > idle. Maybe I misremembered. I'll try `idle=poll'.
> >
> > hm, wouldn't in that case the box not boot at all? But yeah, idle=poll
> > would be nice.
>
> idle=poll fixes it. The fan gets a bit noisy though ;)
So this is one of the boxen where C2 is actually C3 and lapic stops in
C3 mode. Probably BIOS magic.
What's the output of /proc/acpi/processor/CPU0/power ?
> Perhaps a suitable test would be to set up a PIT interrupt, do a hlt, see
> if the APIC timer counter has increased appropriately.
Yeah, but it has to be done later in the boot process. Looking into this
right now.
> I got this:
>
> [ 43.709238] TSC appears to be running slowly. Marking it as unstable
>
> How come? It also happens with HIGH_RES_TIMERS=n and NO_HZ=n. It only
> seems to happen when idle=poll is given.
It should always happen, as the TSC is driven by the CPU clock and you have
CPUFREQ enabled.
> > could you also boot with apic=verbose and send us the full bootlog?
>
> http://userweb.kernel.org/~akpm/apic.txt
[ 11.515305] calibrating APIC timer ...
[ 11.618612] ..... tt1-tt2 831283
[ 11.618614] ..... mult: 35701101
[ 11.618616] ..... calibration result: 532021
[ 11.618619] ..... CPU clock speed is 1995.0325 MHz.
[ 11.618622] ..... host bus clock speed is 133.0021 MHz.
That looks reasonable. It really boils down to the lapic not working
when going idle.
tglx
On Fri, 2006-10-20 at 11:26 -0700, Andrew Morton wrote:
> On Fri, 20 Oct 2006 20:13:54 +0200
> Thomas Gleixner <[email protected]> wrote:
>
> > > Also, NO_HZ breaks my laptop (and presumably quite a few others) quite
> > > horridly, which means nobody can ship the feature. Some runtime
> > > turn-it-off work needs to be done there.
> >
> > We can make a commandline switch as for highres. Is that sufficient?
>
> I doubt it.
>
> I don't know how many machines will be affected by this, but I'd expect
> it's quite a few - the Vaio has a less-than-one-year-old Intel CPU in it.
>
> I'd expect that if a distro were to enable NO_HZ, they'd have a large
> number of unhappy users whose machines run like crap, some of whom would
> find out that they need to add some funny dont-run-like-crap option and
> some of whom would, after wasting considerable amounts of time, just give
> up and use Windows or RH5.2 or something.
Well, NO_HZ as it stands is incompatible with any machine that supports
the C3 state (at least if it has an Intel CPU), since the local APIC
timer unfortunately just plain stops in C3.
We really need to think about using the HPET for this and, potentially,
on a single-socket system, not doing per-core timer queues but just one
global timer queue.
On Sat, 2006-10-21 at 11:49 +0200, Thomas Gleixner wrote:
> [ 11.515305] calibrating APIC timer ...
> [ 11.618612] ..... tt1-tt2 831283
> [ 11.618614] ..... mult: 35701101
> [ 11.618616] ..... calibration result: 532021
> [ 11.618619] ..... CPU clock speed is 1995.0325 MHz.
> [ 11.618622] ..... host bus clock speed is 133.0021 MHz.
>
> That looks reasonable. It really boils down to the lapic not working
> when going idle.
This LAPIC business is weird. I found two boxen where the LAPIC timer
calibration is wrong by a factor of 1.8 and 2.3 on every third/fifth boot.
Unsurprisingly, one is a VAIO with a Core Duo inside, which occasionally
claims to have a 4.6GHz CPU and a 390MHz bus speed. This problem seems to
be independent of the "lapic stops in C2" one.
I have a patch ready which should detect both problems, but having
acpi_processor as a module is painful, as we might enable the C2 states
long after we have decided to use the LAPIC timer and switched over to
highres/dyntick mode. I need to find a way to back out of
highres/dyntick mode gracefully in that case, unless we can agree to make
the acpi_processor bits built-in only, or at least make the Kconfig
tristate depend on EXPERIMENTAL. Len?
tglx
On Sun, 22 Oct 2006 23:22:39 +0200
Thomas Gleixner <[email protected]> wrote:
> This LAPIC business is weird.
So I tested your latest set of patches on the Vaio. Still not very good.
It all _seems_ to work for a while. But after a suspend-to-disk/resume
cycle (which may not be relevant) and five-odd minutes uptime the machine
shat itself.
See http://userweb.kernel.org/~akpm/1.txt for the whole log and
http://userweb.kernel.org/~akpm/config-sony.txt for the config.
It's presently sitting in an xterm echoing keyboard input and permitting
the mouse cursor to move, but it doesn't do anything else.
I have a bad feeling about the hrtimer+dynticks patches, frankly. We had a
lot of discussion and review of the original patchset and it almost all
seemed OK apart from this tsc-goes-silly problem. But then this lot:
highres-timer-core-fix-status-check.patch
highres-timer-core-fix-commandline-setup.patch
clockevents-smp-on-up-features.patch
highres-depend-on-clockevents.patch
i386-apic-cleanup.patch
pm-timer-allow-early-access.patch
i386-lapic-timer-calibration.patch
clockevents-add-broadcast-support.patch
clockevents-add-broadcast-support-fix.patch
acpi-include-apic-h.patch
acpi-include-apic-h-fix.patch
acpi-keep-track-of-timer-broadcast.patch
i386-apic-timer-use-clockevents-broadcast.patch
acpi-verify-lapic-timer.patch
acpi-verify-lapic-timer-exports.patch
acpi-verify-lapic-timer-fix.patch
got merged and I haven't looked at any of that and I don't know that anyone
else has and I don't even know if anyone knows what's in there.
But I do know that it fiddles with APICs, and they are quick to anger. I
have little confidence in merging all of that material.
I'll retain it all for a while so that we can continue to try to fix this
APIC problem but if/when we get that done I think it's time to drop all of
it and start again, because APIC changes really do need a lot of careful
review and thought.
<cycles the power>
No, it's no good at all. This time it just went back to its old ways of
taking a month to get through initscripts.
On Tue, 2006-11-07 at 23:19 -0800, Andrew Morton wrote:
> I have a bad feeling about the hrtimer+dynticks patches, frankly. We had a
> lot of discussion and review of the original patchset and it almost all
> seemed OK apart from this tsc-goes-silly problem. But then this lot:
>
> highres-timer-core-fix-status-check.patch
> highres-timer-core-fix-commandline-setup.patch
> clockevents-smp-on-up-features.patch
> highres-depend-on-clockevents.patch
Trivial fixups
> i386-apic-cleanup.patch
> pm-timer-allow-early-access.patch
> i386-lapic-timer-calibration.patch
This one solves real problems:
- hang in lapic calibration caused by buggy PIT readouts
- wrong lapic calibration seen on my VAIO CoreDuo (also reported by
others)
I did the i386-apic-cleanup.patch first, as I really did not want to add
more mess to the existing one. Sigh, I should have done that before
adding the clock events support.
> clockevents-add-broadcast-support.patch
> clockevents-add-broadcast-support-fix.patch
> acpi-include-apic-h.patch
> acpi-include-apic-h-fix.patch
> acpi-keep-track-of-timer-broadcast.patch
> i386-apic-timer-use-clockevents-broadcast.patch
Needs review
> acpi-verify-lapic-timer.patch
> acpi-verify-lapic-timer-exports.patch
> acpi-verify-lapic-timer-fix.patch
Those can be dropped, as the approach was too naive. At least we know
that it can be detected, but getting it straight needs more effort.
I'm resorting to the following solution for now:
- Disable local APIC timer as the next event source on UP systems by
default.
- Add a command line option to enable it on sane hardware, as it is
faster.
I'll send a patch tomorrow morning, including the fix for the OOPS reported
by Benoit Boissinot. -ETOOTIRED
If this works on your jinxed VAIO, I'll do a complete replacement rollup
for review.
tglx