Hello
I've been running until now 2.6.3 without problem. I had a test 2.6.7
kernel with reiserfs debugging enabled, and indeed it is running slow. Now
I compiled 2.6.9 without reiserfs debugging, but it is still slow... In
the attached tarball dmesg outputs plus configs for all 3 kernels. "Slow"
means just running top alone in a vt it takes 1.6% CPU. Under 2.6.3 it
takes 0.2% (Duron 900MHz). Another peculiarity with 2.6.7 and 2.6.9 is
that the power LED is blinking with about 1Hz frequency. It's an ASUS
A7VI-VM motherboard. In the manual there's nothing about error-codes. I
played with various APIC settings - no change. Related or not - if running
with LAPIC enabled (2.6.7), I get quite a few ERR in /proc/interrupts.
Any idea?
If not - will start compiling and testing different kernels. Does Reiserfs
(3) now always do debugging?...
Thanks
Guennadi
---
Guennadi Liakhovetski
On Thu, 18 Nov 2004, Guennadi Liakhovetski wrote:
> I've been running until now 2.6.3 without problem. I had a test 2.6.7
> kernel with reiserfs debugging enabled, and indeed it is running slow. Now
> I compiled 2.6.9 without reiserfs debugging, but it is still slow... In
Indeed, booting with "acpi=off" brings the system back to normal under
2.6.9. I'll try to see if I can narrow it down (after Saturday - I'm away
for 2 days now), but ideas are welcome.
Thanks
Guennadi
---
Guennadi Liakhovetski
(added linux-acpi)
Guennadi Liakhovetski <[email protected]> wrote:
>
> On Thu, 18 Nov 2004, Guennadi Liakhovetski wrote:
>
> > I've been running until now 2.6.3 without problem. I had a test 2.6.7
> > kernel with reiserfs debugging enabled, and indeed it is running slow. Now
> > I compiled 2.6.9 without reiserfs debugging, but it is still slow... In
>
> Indeed, booting with "acpi=off" brings the system back to normal under
> 2.6.9. I'll try to see if I can narrow it down (after Saturday - I'm away
> for 2 days now), but ideas are welcome.
>
I guess a kernel profile would be useful.
- Add "profile=1" to the kernel boot command line
- Run some workload
- In another xterm, do:

    readprofile -r
    sleep 10
    readprofile -n -v -m /boot/System.map | sort -n +2 | tail -40

  (make sure that you're using the correct System.map for the
  currently-running kernel).
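
The sorting step of the pipeline above can also be done in a few lines of
Python, which avoids the old `sort +2` key syntax (it should correspond to
`sort -k3` in current coreutils, but that is worth checking locally). This
is only a minimal sketch: it assumes four whitespace-separated columns per
line (address, symbol, ticks, load), and `top_symbols` is a hypothetical
helper name, not part of readprofile.

```python
def top_symbols(profile_text, limit=40):
    """Return the `limit` busiest (symbol, ticks) pairs, smallest first.

    Assumption: each line looks like "<address> <symbol> <ticks> <load>".
    """
    rows = []
    for line in profile_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        _addr, symbol, ticks, _load = fields
        try:
            rows.append((symbol, int(ticks)))
        except ValueError:
            continue  # tick column was not an integer
    rows.sort(key=lambda row: row[1])  # numeric sort on the tick column
    return rows[-limit:]               # same idea as `tail -40`


# Illustrative input in the assumed four-column layout.
sample = """\
c0119da0 get_task_mm 26 0.2321
c0103d20 get_wchan 60 0.4167
c010510c system_call 100 2.2727
c0118b80 task_prio 1 0.0625
"""

for symbol, ticks in top_symbols(sample, limit=3):
    print(f"{ticks:6d} {symbol}")
```

In practice the text would come from saving the `readprofile -n -v -m ...`
output to a file and reading it back, rather than from an inline string.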
On Wed, 17 Nov 2004, Andrew Morton wrote:
> (added linux-acpi)
Thanks for the reply and sorry for the delay - was away for 2 days.
But now I've got some new data, that may give some hints.
1) I played with BIOS setup, switched off, switched on the PC, booted into
different kernels, maybe, re-configured and re-installed kernels, and
then, at some point, the problem was gone.
2) Today I added a couple of points to the 2.6.9 config (attached), and
the problem re-appeared.
3) I noticed, the problem (blinking power LED) starts at
/etc/init.d/lm_sensors.start. More precisely - after sensors -s is run. At
the same time the system performance drops (you might know that this is
related to sensors already - do you?)
> readprofile -r
> sleep 10
> readprofile -n -v -m /boot/System.map | sort -n +2 | tail -40
Not sure, if this is still needed, but here goes: before sensors:
c0118b80 task_prio               1  0.0625
c0119630 task_curr               1  0.0125
c01197c0 add_wait_queue          1  0.0104
c010aa80 do_gettimeofday         2  0.0114
c01182d0 nr_running              2  0.1250
c0119610 in_sched_functions      2  0.0625
c0105143 syscall_exit            4  0.3636
c0119d00 mmput                   4  0.0250
c0105138 syscall_call            6  0.5455
c0119da0 get_task_mm            26  0.2321
c0103d20 get_wchan              60  0.4167
c010510c system_call           100  2.2727
00000000 total                 209  0.0019
and after:
c0105138 syscall_call            1  0.0909
c0116710 pgd_ctor                1  0.0063
c0118b80 task_prio               1  0.0625
c0119ec0 copy_mm                 1  0.0010
c010aa80 do_gettimeofday         2  0.0114
c0118b90 task_nice               2  0.1250
c0119610 in_sched_functions      2  0.0625
c0103db0 get_free_idx            3  0.0469
c0105143 syscall_exit            3  0.2727
c0119d00 mmput                   3  0.0187
c0116ab0 do_page_fault           4  0.0026
c01197c0 add_wait_queue          5  0.0521
c0119630 task_curr               8  0.1000
c0119da0 get_task_mm            31  0.2768
c0103d20 get_wchan              36  0.2500
c010510c system_call            95  2.1591
00000000 total                 198  0.0018
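
Comparing the two listings by eye is tedious, so here is a rough sketch of
how the per-symbol change could be computed. It assumes the same
four-column layout as the listings above; `parse_profile` and
`profile_delta` are hypothetical helper names, and the inline strings hold
only a few of the rows quoted above.

```python
def parse_profile(text):
    """Map symbol -> ticks for lines like "<addr> <symbol> <ticks> <load>"."""
    ticks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[2].isdigit():
            ticks[fields[1]] = int(fields[2])
    return ticks


def profile_delta(before, after):
    """Return (symbol, before, after, change) rows, biggest change first."""
    rows = []
    for symbol in set(before) | set(after):
        old = before.get(symbol, 0)   # symbol absent from a listing: 0 ticks
        new = after.get(symbol, 0)
        rows.append((symbol, old, new, new - old))
    # Sort by size of the change, then by name so the order is stable.
    rows.sort(key=lambda row: (-abs(row[3]), row[0]))
    return rows


# A few rows copied from the "before sensors" and "after" listings.
before_text = """\
c0119630 task_curr 1 0.0125
c0103d20 get_wchan 60 0.4167
c010510c system_call 100 2.2727
00000000 total 209 0.0019
"""
after_text = """\
c0119630 task_curr 8 0.1000
c0103d20 get_wchan 36 0.2500
c010510c system_call 95 2.1591
00000000 total 198 0.0018
"""

delta = profile_delta(parse_profile(before_text), parse_profile(after_text))
for symbol, old, new, change in delta:
    print(f"{symbol:20s} {old:5d} -> {new:5d} ({change:+d})")
```

With the numbers from this thread the totals stay close (209 vs 198 ticks),
which hints, without proving it, that the slowdown is not a hot spot this
profile can see.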
In my /etc/sysconfig/lm_sensors:
MODULE_0=i2c_dev
MODULE_1=i2c_isa
MODULE_2=via686a
Thanks
Guennadi
---
Guennadi Liakhovetski
On Wed, 2004-11-17 at 19:25, Guennadi Liakhovetski wrote:
> "Slow" means just running top alone in a vt it takes 1.6% CPU. Under
> 2.6.3 it takes 0.2% (Duron 900MHz). Another peculiarity with 2.6.7 and
> 2.6.9 is that the power LED is blinking with about 1Hz frequency. It's
> an ASUS A7VI-VM motherboard. In the manual there's nothing about
> error-codes. I played with various APIC settings - no change. Related
> or not - if running with LAPIC enabled (2.6.7), I get quite a few ERR
> in /proc/interrupts.
PCI: Disabling Via external APIC routing
Curiously, this line appears in 2.6.3, but not in 2.6.7 or 2.6.9 dmesg
-- even though all the configs build in IOAPIC support.
Can you forward the /proc/interrupts from 2.6.3, and from 2.6.9 with and
without acpi=off? do you see a significant change in /proc/interrupts
before and after the sensor-provoked slowness starts?
if you build 2.6.9 w/o the CONFIG_ACPI_PROCESSOR and boot w/o cmdline
params, do you still see slowness?
if you boot 2.6.9 with these parameters, do you see any additional dmesg
lines?
acpi_dbg_level=0xF acpi_dbg_layer=0xFFFF3FFF
thanks,
-Len
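
The before/after comparison of /proc/interrupts asked for above can be made
mechanical. A minimal sketch, assuming the usual layout (a header row
naming the CPUs, then one row per interrupt with one count per CPU). The
helper names and the sample counts are invented for illustration and are
not taken from the reporter's machine.

```python
def parse_interrupts(text):
    """Map interrupt label -> total count summed over all CPUs."""
    lines = text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an "<irq>: ..." row
        count = 0
        for field in rest.split()[:n_cpus]:
            if not field.isdigit():
                break  # reached the controller/device description
            count += int(field)
        totals[label.strip()] = count
    return totals


def interrupt_delta(before, after):
    """Return label -> increase for every counter that changed."""
    return {
        label: after[label] - before.get(label, 0)
        for label in after
        if after[label] != before.get(label, 0)
    }


# Made-up snapshots, NOT real data from this thread.
snapshot_before = """\
           CPU0
  0:       1000          XT-PIC  timer
  5:         20          XT-PIC  eth0
ERR:          0
"""
snapshot_after = """\
           CPU0
  0:       2000          XT-PIC  timer
  5:         20          XT-PIC  eth0
ERR:          3
"""

changes = interrupt_delta(parse_interrupts(snapshot_before),
                          parse_interrupts(snapshot_after))
for label, increase in changes.items():
    print(f"{label:>4s}: +{increase}")
```

On a real system the two snapshots would be read from /proc/interrupts a
known interval apart, once before and once after starting sensors.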
Thanks for the reply
On Tue, 23 Nov 2004, Len Brown wrote:
> On Wed, 2004-11-17 at 19:25, Guennadi Liakhovetski wrote:
> > "Slow" means just running top alone in a vt it takes 1.6% CPU. Under
> > 2.6.3 it takes 0.2% (Duron 900MHz). Another peculiarity with 2.6.7 and
> > 2.6.9 is that the power LED is blinking with about 1Hz frequency. It's
> > an ASUS A7VI-VM motherboard. In the manual there's nothing about
> > error-codes. I played with various APIC settings - no change. Related
> > or not - if running with LAPIC enabled (2.6.7), I get quite a few ERR
> > in /proc/interrupts.
>
> PCI: Disabling Via external APIC routing
>
> Curiously, this line appears in 2.6.3, but not in 2.6.7 or 2.6.9 dmesg
> -- even though all the configs build in IOAPIC support.
Well, I think, there's just a local APIC on the system, and that is
disabled in BIOS (there's no way to enable it). 2.6.3 disables VIA
external APIC routing, as you noticed, whereas 2.6.7 says
Local APIC disabled by BIOS -- reenabling.
Found and enabled local APIC!
2.6.9 respects BIOS decision (I think, there was a thread on LKML on
this?):
No local APIC present or hardware disabled
> Can you forward the /proc/interrupts from 2.6.3, and from 2.6.9 with and
> without acpi=off? do you see a significant change in /proc/interrupts
> before and after the sensor-provoked slowness starts?
>
> if you build 2.6.9 w/o the CONFIG_ACPI_PROCESSOR and boot w/o cmdline
> params, do you still see slowness?
>
> if you boot 2.6.9 with these parameters, do you see any additional dmesg
> lines?
>
> acpi_dbg_level=0xF acpi_dbg_layer=0xFFFF3FFF
I'll try to do all this tomorrow and report results.
Thanks
Guennadi
---
Guennadi Liakhovetski
(added Andrew to CC as he also answered my original email. Don't know if
[email protected] allows non-subscribers)
On Tue, 23 Nov 2004, Len Brown wrote:
> On Wed, 2004-11-17 at 19:25, Guennadi Liakhovetski wrote:
> > "Slow" means just running top alone in a vt it takes 1.6% CPU. Under
> > 2.6.3 it takes 0.2% (Duron 900MHz). Another peculiarity with 2.6.7 and
> > 2.6.9 is that the power LED is blinking with about 1Hz frequency. It's
> > an ASUS A7VI-VM motherboard. In the manual there's nothing about
>
> PCI: Disabling Via external APIC routing
>
> Curiously, this line appears in 2.6.3, but not in 2.6.7 or 2.6.9 dmesg
> -- even though all the configs build in IOAPIC support.
>
> Can you forward the /proc/interrupts from 2.6.3, and from 2.6.9 with and
> without acpi=off? do you see a significant change in /proc/interrupts
> before and after the sensor-provoked slowness starts?
>
> if you build 2.6.9 w/o the CONFIG_ACPI_PROCESSOR and boot w/o cmdline
> params, do you still see slowness?
>
> if you boot 2.6.9 with these parameters, do you see any additional dmesg
> lines?
>
> acpi_dbg_level=0xF acpi_dbg_layer=0xFFFF3FFF
Ok, I started debugging the problem closely, and after booting into 2.6.9
with acpi=off I still could reproduce the problem by starting sensors...
So, I guess, there's no need to do all the acpi debugging you are
suggesting above, right? As for /proc/interrupts with / without acpi and
before / after sensors I don't see any difference. Notice also, that the
slowness doesn't necessarily start immediately after starting sensors, it
can start later, and it can spontaneously stop later. Just now while
typing this email I saw the power LED stopped blinking and the speed went
back to normal.
This reminds me: about a year ago my CPU fan burnt out. Then, too, the PC
slowed down shortly after booting. By accident I noticed a CPU temperature
of 98 deg C in the BIOS. With a new fan the problem disappeared.
So, could it be that the BIOS automatically slows down (throttles) the CPU
at high temperature, and that since about 2.6.7 sensors programs the sensor
interface with some (wrong) coefficients, so that the CPU gets throttled
wrongly? Some coefficients are definitely wrong. Here are a couple of
snapshots:
via686a-isa-e200
Adapter: ISA adapter
CPU core:   +1.09 V  (min =  +2.00 V, max =  +2.50 V)   ALARM
+2.5V:      +1.16 V  (min =  +3.10 V, max =  +1.57 V)   ALARM
I/O:        +3.40 V  (min =  +4.13 V, max =  +4.13 V)   ALARM
+5V:        +5.55 V  (min =  +6.44 V, max =  +6.44 V)   ALARM
+12V:       +4.81 V  (min = +15.60 V, max = +15.60 V)   ALARM
CPU Fan:   5443 RPM  (min =    0 RPM, div = 2)
P/S Fan:      0 RPM  (min =    0 RPM, div = 2)
SYS Temp:   +45.4 C  (high =   +45 C, hyst =   +40 C)   ALARM
CPU Temp:   +34.5 C  (high =   +60 C, hyst =   +55 C)
SBr Temp:   +28.4 C  (high =   +65 C, hyst =   +60 C)
via686a-isa-e200
Adapter: ISA adapter
CPU core:   +1.09 V  (min =  +2.00 V, max =  +2.50 V)   ALARM
+2.5V:      +1.16 V  (min =  +3.10 V, max =  +1.57 V)   ALARM
I/O:        +3.40 V  (min =  +4.13 V, max =  +4.13 V)   ALARM
+5V:        +5.55 V  (min =  +6.44 V, max =  +6.44 V)   ALARM
+12V:       +4.81 V  (min = +15.60 V, max = +15.60 V)   ALARM
CPU Fan:   5487 RPM  (min =    0 RPM, div = 2)
P/S Fan:      0 RPM  (min =    0 RPM, div = 2)
SYS Temp:   +45.2 C  (high =   +91 C, hyst =   +40 C)   ALARM
CPU Temp:   +34.4 C  (high =   +60 C, hyst =   +55 C)
SBr Temp:   +28.4 C  (high =   +65 C, hyst =   +60 C)
Notice how SYS Temp high changed... Can my guesses be correct and how
can the situation be fixed? Again - no problems with 2.6.3.
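
The oddities in the two snapshots above (min above max, min equal to max,
a high limit that moves between runs) can be picked out mechanically. The
following is only a sketch: it assumes `sensors` lines of the form
"label: value unit (name = value unit, ...)" as pasted above, the function
names are hypothetical, and it says nothing about why the limits are wrong.

```python
import re

# "label: <value> <unit> (<limits>)", e.g. "+5V: +5.55 V (min = ..., max = ...)"
LINE_RE = re.compile(
    r"^(?P<label>[^:]+):\s*(?P<value>[+-]?\d+(?:\.\d+)?)\s*\S*\s*"
    r"\((?P<limits>[^)]*)\)"
)
LIMIT_RE = re.compile(r"(\w+)\s*=\s*([+-]?\d+(?:\.\d+)?)")


def parse_sensors(text):
    """Map label -> (reading, {limit name: limit value})."""
    readings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # chip name, adapter line, or anything unexpected
        limits = {k: float(v) for k, v in LIMIT_RE.findall(match["limits"])}
        readings[match["label"].strip()] = (float(match["value"]), limits)
    return readings


def inconsistent_limits(readings):
    """Labels whose min limit is not strictly below their max limit."""
    return [
        label
        for label, (_value, limits) in readings.items()
        if "min" in limits and "max" in limits
        and limits["min"] >= limits["max"]
    ]


def changed_limits(first, second):
    """(label, limit name, old, new) for limits that differ between runs."""
    changes = []
    for label, (_value, limits) in first.items():
        other = second.get(label, (None, {}))[1]
        for name, old in limits.items():
            new = other.get(name)
            if new is not None and new != old:
                changes.append((label, name, old, new))
    return changes


# A few lines copied from the two snapshots in this message.
snap1 = parse_sensors("""\
via686a-isa-e200
Adapter: ISA adapter
+2.5V: +1.16 V (min = +3.10 V, max = +1.57 V) ALARM
I/O: +3.40 V (min = +4.13 V, max = +4.13 V) ALARM
SYS Temp: +45.4 C (high = +45 C, hyst = +40 C) ALARM
CPU Temp: +34.5 C (high = +60 C, hyst = +55 C)
""")
snap2 = parse_sensors("""\
+2.5V: +1.16 V (min = +3.10 V, max = +1.57 V) ALARM
I/O: +3.40 V (min = +4.13 V, max = +4.13 V) ALARM
SYS Temp: +45.2 C (high = +91 C, hyst = +40 C) ALARM
CPU Temp: +34.4 C (high = +60 C, hyst = +55 C)
""")

print("inconsistent:", inconsistent_limits(snap1))
print("changed:", changed_limits(snap1, snap2))
```

Run against the full snapshots, this would list every voltage channel
whose min is not below its max, plus the SYS Temp high limit that differs
between the two runs.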
Thanks
Guennadi
---
Guennadi Liakhovetski