Booting git snapshot of about 6 hours ago, getting the following:
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
ACPI: PCI interrupt for device 0000:00:10.0 disabled
uhci_hcd 0000:00:10.0: init 0000:00:10.0 fail, -16
uhci_hcd: probe of 0000:00:10.0 failed with error -16
ACPI: PCI Interrupt 0000:00:10.1[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
ACPI: PCI interrupt for device 0000:00:10.1 disabled
uhci_hcd 0000:00:10.1: init 0000:00:10.1 fail, -16
uhci_hcd: probe of 0000:00:10.1 failed with error -16
ACPI: PCI Interrupt 0000:00:10.2[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
ACPI: PCI interrupt for device 0000:00:10.2 disabled
uhci_hcd 0000:00:10.2: init 0000:00:10.2 fail, -16
uhci_hcd: probe of 0000:00:10.2 failed with error -16
ACPI: PCI Interrupt 0000:00:10.3[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
ACPI: PCI interrupt for device 0000:00:10.3 disabled
uhci_hcd 0000:00:10.3: init 0000:00:10.3 fail, -16
uhci_hcd: probe of 0000:00:10.3 failed with error -16
ACPI: PCI Interrupt 0000:00:10.4[C] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
ACPI: PCI interrupt for device 0000:00:10.4 disabled
ehci_hcd 0000:00:10.4: init 0000:00:10.4 fail, -16
ehci_hcd: probe of 0000:00:10.4 failed with error -16
With "pci=routeirq" it is the same, but then it's "IRQ 17" instead of 18,
and the line
ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21
is missing. Works with Debian etch default 2.6.18. /proc/interrupts under
.23-rc9-...:
$ cat /proc/interrupts
CPU0
0: 31756 IO-APIC-edge timer
1: 2 IO-APIC-edge i8042
8: 1 IO-APIC-edge rtc
9: 0 IO-APIC-fasteoi acpi
12: 4 IO-APIC-edge i8042
16: 2627 IO-APIC-fasteoi sata_via
19: 472 IO-APIC-fasteoi eth0
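As a cross-check of the listing above, here is a minimal Python sketch (not part of the original report) that parses it and looks for the IRQs named in the failing probes (17, 18 and 21). It assumes the single-CPU layout shown here, with one count column per line; `parse_interrupts` is a hypothetical helper name.

```python
import re

# Sample copied from the /proc/interrupts listing above (single CPU).
SAMPLE = """\
CPU0
0: 31756 IO-APIC-edge timer
1: 2 IO-APIC-edge i8042
8: 1 IO-APIC-edge rtc
9: 0 IO-APIC-fasteoi acpi
12: 4 IO-APIC-edge i8042
16: 2627 IO-APIC-fasteoi sata_via
19: 472 IO-APIC-fasteoi eth0
"""

# Assumed layout: "<irq>: <count> <controller> <handler names>".
LINE_RE = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\S+)\s+(.+)$")


def parse_interrupts(text):
    """Return {irq: (count, controller, handlers)} for numeric IRQ lines."""
    table = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            irq, count, controller, handlers = match.groups()
            table[int(irq)] = (int(count), controller, handlers.strip())
    return table


irqs = parse_interrupts(SAMPLE)
for wanted in (17, 18, 21):
    state = "registered" if wanted in irqs else "no handler registered"
    print(f"IRQ {wanted}: {state}")
```

On a multi-CPU system the regular expression would need one count column per CPU, so treat this only as a sketch for the output quoted here.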
Under 2.6.18:
ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 177
PCI: VIA IRQ fixup for 0000:00:10.0, from 10 to 1
uhci_hcd 0000:00:10.0: UHCI Host Controller
uhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:10.0: irq 177, io base 0x0000f900
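For reference, the "-16" in the failing probes earlier in this message is a negated errno value, and on Linux errno 16 is EBUSY. The short Python sketch below decodes it; `describe_kernel_error` is a hypothetical helper name, and the log alone does not say which resource was busy, although the error follows directly after the PCI interrupt is disabled.

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel-style return code to (symbolic name, message)."""
    number = -code
    return errno.errorcode.get(number, "UNKNOWN"), os.strerror(number)


name, message = describe_kernel_error(-16)
print(name, "-", message)
```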
Thanks
Guennadi
---
Guennadi Liakhovetski
On Fri, 5 Oct 2007, Guennadi Liakhovetski wrote:
> Booting git snapshot of about 6 hours ago, getting the following:
>
> USB Universal Host Controller Interface driver v3.0
> ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21
> ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
> ACPI: PCI interrupt for device 0000:00:10.0 disabled
> uhci_hcd 0000:00:10.0: init 0000:00:10.0 fail, -16
> uhci_hcd: probe of 0000:00:10.0 failed with error -16
> With "pci=routeirq" it is the same, but then it's "IRQ 17" instead of 18,
> and the line
>
> ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21
>
> is missing. Works with Debian etch default 2.6.18. /proc/interrupts under
> .23-rc9-...:
What do you get with CONFIG_USB_DEBUG enabled?
Alan Stern
On Fri, 5 Oct 2007, Alan Stern wrote:
> On Fri, 5 Oct 2007, Guennadi Liakhovetski wrote:
>
> > Booting git snapshot of about 6 hours ago, getting the following:
> >
> > USB Universal Host Controller Interface driver v3.0
> > ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21
> > ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 18
> > ACPI: PCI interrupt for device 0000:00:10.0 disabled
> > uhci_hcd 0000:00:10.0: init 0000:00:10.0 fail, -16
> > uhci_hcd: probe of 0000:00:10.0 failed with error -16
>
> > With "pci=routeirq" it is the same, but then it's "IRQ 17" instead of 18,
> > and the line
> >
> > ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21
> >
> > is missing. Works with Debian etch default 2.6.18. /proc/interrupts under
> > .23-rc9-...:
>
> What do you get with CONFIG_USB_DEBUG enabled?
Will try as soon as my bisect is done. Interestingly, both problems with
this system, this one and http://lkml.org/lkml/2007/10/4/417, have so far
regressed together: I am already somewhere after 23-rc6, and both USB and
i2c-viapro still work... It might also be that some configuration options
got lost while bisecting .22 - .23-rc9.
Thanks
Guennadi
---
Guennadi Liakhovetski
Hi
Ok, after a day of bisecting, it turns out to be a compiler problem.
gcc-3.3.5 produces at least these two problems (Oops on i2c-viapro probe
and disabled IRQs in USB), whereas 4.1.2 has no problem so far. Up to now
3.3.5 had no problem compiling 2.6.20+ kernels here, for example, for P-II
SMP. Does it look at all realistic that such "random" run-time problems
are caused by a miscompilation?...
Thanks
Guennadi
---
Guennadi Liakhovetski
Guennadi Liakhovetski wrote:
> Hi
>
> Ok, after a day of bisecting, it turns out to be a compiler problem.
> gcc-3.3.5 produces at least these two problems (Oops on i2c-viapro probe
> and disabled IRQs in USB), whereas 4.1.2 has no problem so far. Up to now
> 3.3.5 had no problem compiling 2.6.20+ kernels here, for example, for P-II
> SMP. Does it look at all realistic that such "random" run-time problems
> are caused by a miscompilation?...
I can't say much about compilers (though it still looks somewhat possible
to me), but I can say a bit about the platform/CPU. You can find my thread
titled "VIA C7 anyone" from several months back in the archives - that was
my first experience. Since then, I have received several emails from others
with similar problems.
The thing is that at least the boards I'm using (though I suspect it's the
CPU, not the board) are somewhat... flaky, so to say, and their reliability
(or even ability to work) depends on several factors, starting with
production conditions (the environment at the time of manufacture) and up
to various thermal factors.
It seems there's quite a significant percentage of C7-based boards that are
flaky/unreliable; replacing one with another from the same batch usually
fixes the problem.
Next, some boards are VERY sensitive to temperature, and their thermal
sensors are WRONG - it seems - 100% of the time, showing ~20% less than
the real temperature (say, when the sensor shows 35 degrees celsius, the
temp really is about 45..50 degrees). When the temperature (on SOME
samples) grows above 40 degrees, the system becomes unreliable and may
crash randomly here and there.
What's more, due to the geometry of the CPU chip, with a very small area
that touches the heatsink and a relatively large heatsink, it's sometimes
enough to just touch the heatsink for it to shift out of position, giving
bad thermal contact between the CPU and the heatsink and resulting in high
temperatures and system instability.
And even more interesting -- it seems that some sequences of instructions
are more frequently misinterpreted (under the "abnormal" conditions above)
than other sequences doing the same thing. That is, a program compiled
with gcc-3.4 may work almost 100% correctly while the same program
compiled with gcc-4.1 may almost always fail (usually due to a
segmentation fault), or exactly the opposite.
So umm.... ;)
I'm running a VIA C7 on the machine where I'm typing right now - not a
single glitch since I replaced the failing one (at the time of the
mentioned thread), it's rock-solid. I even removed the fan from the
heatsink - the temp grows up to 70..80 degrees celsius (according to
lm-sensors, so the actual temperature should be higher) when I run
CPU-intensive apps, and nothing breaks. The previous motherboard, which I
replaced (the same model, from the same batch), was breaking randomly, but
worked relatively stably when placed outdoors and with a large "external"
fan of the kind used for room ventilation... ;) even without any load,
when the onboard sensors showed 35 degrees.
That is to say - it may be some miscompilation, but it may also be some
problem with the hardware itself. If you can, try to reproduce the same on
another board (I just tried to boot 2.6.23-rc5 on this machine, compiled
for PIII CPU using gcc-4.1.2 (no other version installed, sorry) - no
issues so far).
And no, I'm not trying to say "don't use VIA C7" etc -- I just love this
box of mine, it's very good, powerful enough and quiet. Just be prepared
for some... issues, which happen - rarely, but still.
/mjt
Hi Guennadi,
On Fri, 5 Oct 2007 22:22:08 +0200 (CEST), Guennadi Liakhovetski wrote:
> Ok, after a day of bisecting, it turns out to be a compiler problem.
> gcc-3.3.5 produces at least these two problems (Oops on i2c-viapro probe
> and disabled IRQs in USB), whereas 4.1.2 has no problem so far. Up to now
> 3.3.5 had no problem compiling 2.6.20+ kernels here, for example, for P-II
> SMP. Does it look at all realistic that such "random" run-time problems
> are caused by a miscompilation?...
Miscompilation can do just about anything. There have been a number of other
reports about compiler issues lately. Ingo Molnar here:
http://kerneltrap.org/Linux/Compiler_Optimization_Bugs_and_World_Domination
Me here:
http://marc.info/?l=linux-kernel&m=119127234804440&w=2
The trend I am seeing is that we are optimizing for, and testing with,
recent compilers (gcc 4.1 and later) and that older compilers tend to
break, even though compilers as old as gcc 3.2 are still supposed to be
supported. Not good.
--
Jean Delvare
On Sat, 6 Oct 2007, Michael Tokarev wrote:
[snip]
> That is to say - it may be some miscompilation, but it may also be some
> problem with the hardware itself. If you can, try to reproduce the same on
> another board (I just tried to boot 2.6.23-rc5 on this machine, compiled
> for PIII CPU using gcc-4.1.2 (no other version installed, sorry) - no
> issues so far).
Hm, well, I could only compile an i686 kernel for an Intel chipset with an
"otherwise the same" config with these two compiler options to test...
Maybe some time. No, I do not have another C7 system.
> And no, I'm not trying to say "don't use VIA C7" etc -- I just love this
> box of mine, it's very good, powerful enough and quiet. Just be prepared
> for some... issues, which happen - rarely, but still.
Well, this very system has been running git and compiling kernels for
itself the whole day today without a single issue. The gcc-3.3.5
miscompiled kernel was compiled on another machine. So, I hope my specific
sample is stable. And I need it to be stable, because I'm going to run my
mail-server on it... BTW, I compiled a tickless kernel on it: so far,
without looking into user-space, after 12 min uptime it shows 40290 timer
interrupts, i.e., 17Hz, not bad.
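A quick sketch of the arithmetic behind an average timer rate, using only the figures quoted above; the helper name `average_rate_hz` is hypothetical. Taken at face value, 40290 interrupts over 12 minutes average out to roughly 56 Hz, while 17 Hz corresponds to the same count over about 39.5 minutes, so the count and the uptime may have been read at different moments.

```python
def average_rate_hz(interrupts, uptime_seconds):
    """Average interrupt rate over the whole uptime, in Hz."""
    return interrupts / uptime_seconds


TIMER_INTERRUPTS = 40290  # timer count quoted in the message

# Rate if the count was accumulated over exactly 12 minutes.
rate_at_12_min = average_rate_hz(TIMER_INTERRUPTS, 12 * 60)

# Uptime (in seconds) that would give an average of 17 Hz.
uptime_for_17_hz = TIMER_INTERRUPTS / 17

print(f"{rate_at_12_min:.1f} Hz over 12 min")
print(f"{uptime_for_17_hz / 60:.1f} min of uptime for 17 Hz")
```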
As for sensors - my system seems to have a w83627ehf chip, "sensors"
output (under 2.6.22) looks pretty funny too:
# sensors
w83627ehf-i2c-9191-290
ERROR: Can't get adapter or algorithm?!?
VCore: +0.98 V (min = +0.00 V, max = +1.74 V)
in1: +12.41 V (min = +3.17 V, max = +9.24 V) ALARM
AVCC: +3.26 V (min = +3.82 V, max = +1.79 V) ALARM
3VCC: +3.26 V (min = +2.86 V, max = +1.49 V) ALARM
in4: +1.54 V (min = +1.38 V, max = +1.46 V) ALARM
in5: +1.59 V (min = +2.04 V, max = +0.95 V) ALARM
in6: +4.71 V (min = +4.48 V, max = +3.05 V) ALARM
VSB: +3.26 V (min = +4.08 V, max = +4.08 V) ALARM
VBAT: +3.20 V (min = +3.57 V, max = +3.02 V) ALARM
in9: +1.59 V (min = +2.04 V, max = +2.01 V) ALARM
Case Fan: 0 RPM (min = 3668 RPM, div = 16) ALARM
CPU Fan: 0 RPM (min = 4440 RPM, div = 16) ALARM
Aux Fan: 0 RPM (min = 3125 RPM, div = 16) ALARM
fan5: 0 RPM (min = 0 RPM, div = 8)
Sys Temp: +39C (high = -6C, hyst = -2C) ALARM
CPU Temp: +43.0C (high = +100.0C, hyst = +95.0C)
AUX Temp: +42.5C (high = +100.0C, hyst = +95.0C)
Maybe CPU Temp at least correlates with the real value :-) Yes, I'll look
in the BIOS next time I boot with a monitor and a keyboard connected.
Thanks for the info!
Guennadi
---
Guennadi Liakhovetski
On Fri, 5 Oct 2007, Guennadi Liakhovetski wrote:
> On Sat, 6 Oct 2007, Michael Tokarev wrote:
>
> [snip]
> > That is to say - it may be some miscompilation, but it may also be some
> > problem with the hardware itself. If you can, try to reproduce the same on
> > another board (I just tried to boot 2.6.23-rc5 on this machine, compiled
> > for PIII CPU using gcc-4.1.2 (no other version installed, sorry) - no
> > issues so far).
>
> Hm, well, I could only compile an i686 kernel for an Intel chipset with an
> "otherwise the same" config with these two compiler options to test...
> Maybe some time. No, I do not have another C7 system.
No, that doesn't work. I forgot that on that PC I can only boot with
"acpi=noirq", so the whole ACPI IRQ-mapping code is not used. Otherwise,
I did build such a kernel for that PC and noticed no problem. So, either
the only two "miscompiled" places were i2c-viapro and ACPI IRQ routing, or
indeed it only triggers problems on C7...
Thanks
Guennadi
---
Guennadi Liakhovetski