Hi,
Trying to boot 2.4.18-rc1, 2.4.18-rc2-ac1 or 2.5.5pre1 on a dual P3
with a VIA chipset hangs at random at one of the points marked "====="
below; sometimes the screen flickers:
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
Intel machine check reporting enabled on CPU#0.
CPU0: Intel Pentium III (Coppermine) stepping 06
per-CPU timeslice cutoff: 731.39 usecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Booting processor 1/0 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 1874.32 BogoMIPS
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
Intel machine check reporting enabled on CPU#1.
CPU1: Intel Pentium III (Coppermine) stepping 06
Total of 2 processors activated (3742.10 BogoMIPS).
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
..TIMER: vector=0x31 pin1=2 pin2=0
=====
testing the IO APIC.......................
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 937.5155 MHz.
..... host bus clock speed is 133.9307 MHz.
cpu: 0, clocks: 1339307, slice: 446435
CPU0<T0:1339296,T1:892848,D:13,S:446435,C:1339307>
cpu: 1, clocks: 1339307, slice: 446435
=====
CPU1<T0:1339504,T1:446480,D:14,S:446505,C:1339515>
checking TSC synchronization across CPUs: passed.
Waiting on wait_init_idle (map = 0x2)
All processors have done init
=====
Or it hangs after initialising the SCSI driver (sym); IDE, network etc.
seem OK:
...
sym0: SCSI BUS has been reset.
scsi0 : sym-2.1.17a
=====
It never went further.
Booting with "noapic" works, as do UP 2.4 kernels and 2.2.20 SMP
kernels.
00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo PRO133x] (rev c4)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP]
00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:04.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:04.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
00:04.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
Adam
--
Adam [email protected]
Lackorzynski http://a.home.dhs.org
> trying to boot a 2.4.18-rc1, 2.4.18-rc2-ac1 or 2.5.5pre1 on a dual P3
> with a VIA chipset hangs (randomly) at the "=====" signs, sometimes the
> screen is flickering:
Does 2.4.18pre8 work? There is a small MP 1.4 change I tested and fed on
to Marcelo, and it would be nice to know that wasn't the cause.
On Wed Feb 20, 2002 at 15:24:38 +0000, Alan Cox wrote:
> > trying to boot a 2.4.18-rc1, 2.4.18-rc2-ac1 or 2.5.5pre1 on a dual P3
> > with a VIA chipset hangs (randomly) at the "=====" signs, sometimes the
> > screen is flickering:
> Does 2.4.18pre8 work ? There is a small MP 1.4 change I tested and fed on
> to Marcelo and it would be nice to know that wasnt the cause
No, same symptoms, hangs on several places and screen flickers
sometimes.
Adam
--
Adam [email protected]
Lackorzynski http://a.home.dhs.org
Hello
I'm having the same problems booting SMP-enabled kernels. I just mailed
the list asking for help, but I could not provide more information about
the mainboard (it's an Asus dual P3 with a VIA chipset; I don't remember the model).
The messages are very similar (if not identical) to yours. I was able to boot to
the end by disabling APM support, but the boot process itself became unstable
(sometimes it booted, sometimes not, just like yours).
-----------------------------------------------
Fernando Korndorfer
Novo Hamburgo, RS, Brasil
-----------------------------------------------
On Thu, 21 Feb 2002 14:58:58 -0300
"Fernando Korndorfer" <[email protected]> wrote:
> Hello
>
> I'm having same problems booting SMP-enabled kernels. I just mailed
> the list asking for help, but I could not provide more information abount
> the mainboard (it's a Asus dual P3, via chipset, don't remember the model).
> The messages are very similar (if not equal) to yours. I was able to boot to
> the end disabling APM support, but the boot process itself became unstable
> (some times it booted, some times not, just like you).
Hm, interestingly there seem to be more people with VIA + SMP + something problems. Interestingly, because I cannot confirm these troubles, and I use such a setup myself. Just have a look:
00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo PRO133x] (rev c4)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP]
00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:04.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:04.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
00:04.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:09.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
00:0a.0 Network controller: Elsa AG QuickStep 1000 (rev 01)
00:0b.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c1010 Ultra3 SCSI Adapter (rev 01)
00:0b.1 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c1010 Ultra3 SCSI Adapter (rev 01)
00:0c.0 Unknown mass storage controller: Promise Technology, Inc. 20268 (rev 01)
00:0d.0 Multimedia audio controller: Creative Labs SB Live! EMU10000 (rev 07)
00:0d.1 Input device controller: Creative Labs SB Live! (rev 07)
01:00.0 VGA compatible controller: nVidia Corporation NV11 (rev b2)
02:04.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
02:05.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
02:06.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
02:07.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
(This is Asus CUV4X-D, dual PIII, and a damn lot of stuff inside :-)
I compile my kernel (2.4.18-rc2) with the attached config. Please try it and tell me your results. I can assure you that this machine has been running rock solid over here for months.
Regards,
Stephan
On Thu Feb 21, 2002 at 21:11:42 +0100, Stephan von Krawczynski wrote:
> Hm, interestingly there seem to be more people with via+SMP+somewhat
> problems. Interestingly, because I cannot confirm these troubles,
> using such a setup myself. Just have a look:
> (This is Asus CUV4X-D, dual PIII, and a damn lot of stuff inside :-)
Same MB here, the lspci output is also the same (for the onboard stuff ;).
> I compile my kernel (2.4.18-rc2) with the attached config. Please try
> it and tell your results. I can assure you that this machine runs rock
> solid over here for months.
No luck here. Hangs during boot (tried with 2.4.18-rc2-ac2).
I even updated the BIOS from 1010 to 1014 as well (just in case). What
BIOS version are you running? And at how many MHz are the CPUs?
Adam
--
Adam [email protected]
Lackorzynski http://a.home.dhs.org
On Fri, 22 Feb 2002 14:02:46 +0100
Adam Lackorzynski <[email protected]> wrote:
> On Thu Feb 21, 2002 at 21:11:42 +0100, Stephan von Krawczynski wrote:
> > Hm, interestingly there seem to be more people with via+SMP+somewhat
> > problems. Interestingly, because I cannot confirm these troubles,
> > using such a setup myself. Just have a look:
>
> > (This is Asus CUV4X-D, dual PIII, and a damn lot of stuff inside :-)
>
> Same MB here, the lspci output is also the same (for the onboard stuff ;).
Ok, this is fine and makes the comparison at least possible to some extent.
> > I compile my kernel (2.4.18-rc2) with the attached config. Please try
> > it and tell your results. I can assure you that this machine runs rock
> > solid over here for months.
>
> No luck here. Hangs during boot (tried with 2.4.18-rc2-ac2).
Please start from a setup as close to mine as possible. That is 2.4.18-rc2.
In setup, switch MPS 1.4 support and Power Management to disabled.
> I even updated the BIOS from 1010 to 1014 as well (just in case). What
> BIOS version are you running? And at how many MHz are the CPUs?
I use BIOS 1010 and 2 x P3 1 GHz, and have tried RAM from 512MB to 2GB. Currently 2GB is installed, as 2 x 1GB registered DIMMs.
Regards,
Stephan
> Adam Lackorzynski <[email protected]> wrote:
>
> > Same MB here, the lspci output is also the same (for the onboard stuff ;).
>
> Ok, this is fine and makes the comparison at least possible to some extent.
> Please start from a setup as close to mine as possible. That is 2.4.18-rc2.
> In setup switch MPS 1.4 support to disable and Power Management to disable.
>
> > I even updated the BIOS from 1010 to 1014 as well (just in case). What
> > BIOS version are you running? And at how many MHz are the CPUs?
>
> I use BIOS 1010, 2 x P3 1 GHz and tried RAM from 512MB to 2GB. Currently
> installed are 2GB being 2 x 1GB registered DIMM.
The MB is the same for me (I guess). The machine is about 60 km away
from me right now, so I have not compiled with your .config yet. Next time I'll
try to gather some info for the list. The lspci output is almost the same.
-----------------------------------------------
Fernando Korndorfer
Novo Hamburgo, RS, Brasil
-----------------------------------------------
On Fri Feb 22, 2002 at 14:11:01 +0100, Stephan von Krawczynski wrote:
> > > I compile my kernel (2.4.18-rc2) with the attached config. Please try
> > > it and tell your results. I can assure you that this machine runs rock
> > > solid over here for months.
> >
> > No luck here. Hangs during boot (tried with 2.4.18-rc2-ac2).
>
> Please start from a setup as close to mine as possible. That is 2.4.18-rc2.
> In setup switch MPS 1.4 support to disable and Power Management to disable.
No luck, even with completely switched off PM. "noapic" works. I
attached the stripped down config. It mostly hangs while setting up the
second CPU.
> > I even updated the BIOS from 1010 to 1014 as well (just in case). What
> > BIOS version are you running? And at how many MHz are the CPUs?
>
> I use BIOS 1010, 2 x P3 1 GHz and tried RAM from 512MB to 2GB.
> Currently installed are 2GB being 2 x 1GB registered DIMM.
2x 933, RAM is 960MB.
Adam
--
Adam [email protected]
Lackorzynski http://a.home.dhs.org
On Fri, 22 Feb 2002 17:45:58 +0100
Adam Lackorzynski <[email protected]> wrote:
> On Fri Feb 22, 2002 at 14:11:01 +0100, Stephan von Krawczynski wrote:
> > > > I compile my kernel (2.4.18-rc2) with the attached config. Please try
> > > > it and tell your results. I can assure you that this machine runs rock
> > > > solid over here for months.
> > >
> > > No luck here. Hangs during boot (tried with 2.4.18-rc2-ac2).
> >
> > Please start from a setup as close to mine as possible. That is 2.4.18-rc2.
> > In setup switch MPS 1.4 support to disable and Power Management to disable.
>
> No luck, even with completely switched off PM. "noapic" works. I
> attached the stripped down config. It mostly hangs while setting up the
> second CPU.
Your config is not identical to the one I sent. If you want to find out what the problem is, you must first produce a setup that is known good. So simply use my config, even if it contains stuff you don't need, and especially if it does not contain stuff you want.
Your primary goal is: let the box boot.
Your secondary goal is: add your original options to my config one by one, that is: add one, test it, add the next.
Somewhere along the way it is expected to break. The last option you added is then the probable cause.
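A minimal sketch of the first step of that procedure: listing which enabled options are in your own config but not in the known-good one, so they can be added back one at a time. The function name `config_candidates` and its interface are hypothetical, and the lines are assumed to be already split out of the two .config files; it compares lines verbatim and is not part of any kernel tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `line` appears verbatim in the first `n` entries of `set`. */
static int has_line(const char *const *set, size_t n, const char *line)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], line) == 0)
            return 1;
    return 0;
}

/*
 * Hypothetical helper: collect the option lines ("CONFIG_...") that are
 * in `mine` but not in `good`. These are the candidates to add to the
 * known-good config one by one, testing a boot after each.
 * Writes at most `max` pointers into `out` and returns how many it wrote.
 */
size_t config_candidates(const char *const *good, size_t n_good,
                         const char *const *mine, size_t n_mine,
                         const char **out, size_t max)
{
    size_t found = 0;

    assert(out != NULL || max == 0);

    for (size_t i = 0; i < n_mine && found < max; i++) {
        /* Skip blank lines and comments like "# CONFIG_FOO is not set". */
        if (strncmp(mine[i], "CONFIG_", 7) != 0)
            continue;
        if (!has_line(good, n_good, mine[i]))
            out[found++] = mine[i];
    }
    return found;
}
```

Each returned line is one step of the "add one, test it, add the next" loop; options that are set in the known-good config but unset in yours would need the same comparison run in the other direction.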
> > > I even updated the BIOS from 1010 to 1014 as well (just in case). What
> > > BIOS version are you running? And at how many MHz are the CPUs?
> >
> > I use BIOS 1010, 2 x P3 1 GHz and tried RAM from 512MB to 2GB.
> > Currently installed are 2GB being 2 x 1GB registered DIMM.
>
> 2x 933, RAM is 960MB.
I have several of those boards in a production environment; one of them is exactly like yours (1GB RAM and 2 x PIII(933)). All of them work flawlessly. There is a chance it is related to my configs.
Regards,
Stephan
On Fri Feb 22, 2002 at 18:04:29 +0100, Stephan von Krawczynski wrote:
> Your config is not identical to the one I sent. If you want to find
> out what the problem is, you must first try to produce a setup that is
> known good. So simply use my config, even if it contains stuff you
> don't need, and especially if it does not contain stuff you want.
> Your primary goal is: let the box boot.
Yours (+serial console) doesn't work either, so I stripped out most
unneeded things. I'm going to rip out all cards except net and graphics
to see if that helps but that has to wait till Monday...
BTW: I just got this:
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 937.5536 MHz.
..... host bus clock speed is 133.9358 MHz.
cpu: 0, clocks: 1339358, slice: 446452
CPU0<T0:1339344,T1:892880,D:12,S:446452,C:1339358>
cpu: 1, clocks: 1339358, slice: 446452
CPU1<T0:1339344,T1:446432,D:8,S:446452,C:1339358>
checking TSC synchronization across CPUs:
BIOS BUG: CPU#0 improperly initialized, has -6 usecs TSC skew! FIXED.
BIOS BUG: CPU#1 improperly initialized, has 6 usecs TSC skew! FIXED.
Waiting on wait_init_idle (map = 0x
Maybe this means something...
Adam
--
Adam [email protected]
Lackorzynski http://a.home.dhs.org
On Fri, 22 Feb 2002 19:20:24 +0100
Adam Lackorzynski <[email protected]> wrote:
> On Fri Feb 22, 2002 at 18:04:29 +0100, Stephan von Krawczynski wrote:
> > Your config is not identical to the one I sent. If you want to find
> > out what the problem is, you must first try to produce a setup that is
> > known good. So simply use my config, even if it contains stuff you
> > don't need, and especially if it does not contain stuff you want.
> > Your primary goal is: let the box boot.
>
> Yours (+serial console) doesn't work either, so I stripped out most
> unneeded things. I'm going to rip out all cards except net and graphics
> to see if that helps but that has to wait till Monday...
>
> BTW: I just got this:
> Using local APIC timer interrupts.
> calibrating APIC timer ...
> ..... CPU clock speed is 937.5536 MHz.
> ..... host bus clock speed is 133.9358 MHz.
> cpu: 0, clocks: 1339358, slice: 446452
> CPU0<T0:1339344,T1:892880,D:12,S:446452,C:1339358>
> cpu: 1, clocks: 1339358, slice: 446452
> CPU1<T0:1339344,T1:446432,D:8,S:446452,C:1339358>
> checking TSC synchronization across CPUs:
> BIOS BUG: CPU#0 improperly initialized, has -6 usecs TSC skew! FIXED.
> BIOS BUG: CPU#1 improperly initialized, has 6 usecs TSC skew! FIXED.
> Waiting on wait_init_idle (map = 0x
>
>
> Maybe this means something...
Aha, here is my output on 2 x 1 GHz:
<4>Using local APIC timer interrupts.
<4>calibrating APIC timer ...
<4>..... CPU clock speed is 1004.5421 MHz.
<4>..... host bus clock speed is 133.9388 MHz.
<4>cpu: 0, clocks: 1339388, slice: 446462
<4>CPU0<T0:1339376,T1:892912,D:2,S:446462,C:1339388>
<4>cpu: 1, clocks: 1339388, slice: 446462
<4>CPU1<T0:1339376,T1:446448,D:4,S:446462,C:1339388>
<4>checking TSC synchronization across CPUs: passed.
<4>Waiting on wait_init_idle (map = 0x2)
<4>All processors have done init_idle
And the same part for 2 x 933 MHz:
<4>Using local APIC timer interrupts.
<4>calibrating APIC timer ...
<4>..... CPU clock speed is 937.5672 MHz.
<4>..... host bus clock speed is 133.9380 MHz.
<4>cpu: 0, clocks: 1339380, slice: 446460
<4>CPU0<T0:1339376,T1:892912,D:4,S:446460,C:1339380>
<4>cpu: 1, clocks: 1339380, slice: 446460
<4>CPU1<T0:1339376,T1:446448,D:8,S:446460,C:1339380>
<4>checking TSC synchronization across CPUs: passed.
<4>Waiting on wait_init_idle (map = 0x2)
<4>All processors have done init_idle
I would say this means the TSC skew fix is broken and shooting down your box. What do you think, Alan?
Regards,
Stephan
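For readers comparing the calibration lines in these logs: the numbers are consistent with "clocks" being the host bus clock divided by 100 ticks per second, and "slice" being clocks divided by the number of CPUs plus one. That reading is inferred from the figures in the logs, not quoted from the kernel source, and the helper names below are hypothetical. A small sketch of the check:

```c
#include <assert.h>

/* Bus clocks per timer tick, assuming `hz` ticks per second. */
unsigned long bus_clocks_per_tick(unsigned long bus_hz, unsigned long hz)
{
    assert(hz > 0);
    return bus_hz / hz;
}

/* Per-CPU slice, assuming `clocks` is split into (ncpus + 1) parts. */
unsigned long apic_slice(unsigned long clocks, unsigned int ncpus)
{
    return clocks / (ncpus + 1);
}
```

With the 2 x 1 GHz log, a bus clock of 133.9388 MHz gives 1339388 clocks and a slice of 446462 for two CPUs, matching the "cpu: 0, clocks: 1339388, slice: 446462" line. The failing box's figures fit the same arithmetic, so the calibration values themselves do not stand out.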
> <4>CPU1<T0:1339376,T1:446448,D:8,S:446460,C:1339380>
> <4>checking TSC synchronization across CPUs: passed.
> <4>Waiting on wait_init_idle (map = 0x2)
> <4>All processors have done init_idle
>
> I would say this means the TSC skew fix is broken and shooting down your box. What do you think, Alan?
Seems a reasonable guess. However, that TSC skew itself may point to other
problems. It means one processor started running successfully a little after
the other. That might be normal behaviour for that board, or it might point to
something else.
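To make the skew figures concrete, here is a rough sketch of how a per-CPU skew in microseconds could be derived from one TSC snapshot per CPU. It assumes the skew is reported as each CPU's deviation from the average of all snapshots; that assumption fits the symmetric -6/+6 pair in the log above, but the function name and interface are hypothetical and this is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given one TSC snapshot per CPU, all taken at
 * nominally the same moment, store each CPU's deviation from the
 * average in whole microseconds (truncated toward zero).
 */
void tsc_skew_usecs(const uint64_t *tsc, size_t ncpus,
                    uint64_t cycles_per_usec, int64_t *skew_usecs)
{
    uint64_t sum = 0;

    assert(ncpus > 0 && cycles_per_usec > 0);

    for (size_t i = 0; i < ncpus; i++)
        sum += tsc[i];
    uint64_t avg = sum / ncpus;

    for (size_t i = 0; i < ncpus; i++) {
        /* Signed distance from the average, in cycles. */
        int64_t delta = (int64_t)tsc[i] - (int64_t)avg;
        skew_usecs[i] = delta / (int64_t)cycles_per_usec;
    }
}
```

At roughly 937 cycles per microsecond (the 937.55 MHz clock reported above), two snapshots 11244 cycles apart come out as -6 and 6 usecs, which is the scale of skew being discussed here: the second CPU started counting about 12 microseconds after the first.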
On Sat, 23 Feb 2002 17:22:01 +0000 (GMT)
Alan Cox <[email protected]> wrote:
> > <4>CPU1<T0:1339376,T1:446448,D:8,S:446460,C:1339380>
> > <4>checking TSC synchronization across CPUs: passed.
> > <4>Waiting on wait_init_idle (map = 0x2)
> > <4>All processors have done init_idle
> >
> > I would say this means the TSC skew fix is broken and shooting down your box. What do you think, Alan?
>
> Seems a reasonable guess. However that TSC skew itself may point to other
> problems. It means one processor started running successfully a little after
> the other. That might be normal behaviour for that board or might point to
> something else
This does not seem to be normal behaviour: I checked several other boards of this type and none had a TSC skew (and all work). Purely guessing, I would suggest trying 2 other processors to verify that the behaviour is really processor-independent. Another guess would of course be that the MB itself is broken to some extent.
Has anybody ever seen a _working_ skew correction? Is this known-to-work code?
Regards,
Stephan
>> > <4>CPU1<T0:1339376,T1:446448,D:8,S:446460,C:1339380>
>> > <4>checking TSC synchronization across CPUs: passed.
>> > <4>Waiting on wait_init_idle (map = 0x2)
>> > <4>All processors have done init_idle
>> >
>> > I would say this means the TSC skew fix is broken and shooting down
>> > your box. What do you think, Alan?
>>
>> Seems a reasonable guess. However that TSC skew itself may point to other
>> problems. It means one processor started running successfully a little
>> after the other. That might be normal behaviour for that board or might
>> point to something else
>
> It seems no normal behaviour, I checked several other boards of this type
> and none had a TSC skew (and all work). Purely guessing I would suggest
> two try some other 2 processors to verify the behaviour is really
> processor-independent. Another guess would of course be the MB itself
> being broken to some extent.
>
> Has anybody ever seen a _working_ skew correction? Is this known-to-work
> code?
Yes. Works every time for me (on NUMA-Q), with huge corrections:
checking TSC synchronization across CPUs:
BIOS BUG: CPU#0 improperly initialized, has 6571 usecs TSC skew! FIXED.
BIOS BUG: CPU#1 improperly initialized, has 6571 usecs TSC skew! FIXED.
BIOS BUG: CPU#2 improperly initialized, has 6571 usecs TSC skew! FIXED.
BIOS BUG: CPU#3 improperly initialized, has 6571 usecs TSC skew! FIXED.
BIOS BUG: CPU#4 improperly initialized, has 20664 usecs TSC skew! FIXED.
BIOS BUG: CPU#5 improperly initialized, has 20664 usecs TSC skew! FIXED.
BIOS BUG: CPU#6 improperly initialized, has 20665 usecs TSC skew! FIXED.
BIOS BUG: CPU#7 improperly initialized, has 20664 usecs TSC skew! FIXED.
BIOS BUG: CPU#8 improperly initialized, has -4424 usecs TSC skew! FIXED.
BIOS BUG: CPU#9 improperly initialized, has -4424 usecs TSC skew! FIXED.
BIOS BUG: CPU#10 improperly initialized, has -4424 usecs TSC skew! FIXED.
BIOS BUG: CPU#11 improperly initialized, has -4424 usecs TSC skew! FIXED.
BIOS BUG: CPU#12 improperly initialized, has -22812 usecs TSC skew! FIXED.
BIOS BUG: CPU#13 improperly initialized, has -22812 usecs TSC skew! FIXED.
BIOS BUG: CPU#14 improperly initialized, has -22812 usecs TSC skew! FIXED.
BIOS BUG: CPU#15 improperly initialized, has -22811 usecs TSC skew! FIXED.
I did try disabling it once, which stopped the system booting.
I never looked at it any further.
If you are crashing near the wait_init_idle fix, you might
try Ingo's scheduler patch - it has a different way of
fixing this race condition.
M.
On Sun Feb 24, 2002 at 08:42:52 -0800, Martin J. Bligh wrote:
> If you are crashing near the wait_init_idle fix, you might
> try Ingo's scheduler patch - it has a different way of
> fixing this race condition.
I tried sched-O1-2.4.18-pre8-K3 and it didn't work with a SMP kernel
either.
Furthermore, it also doesn't work with a UP kernel with local IO-APIC support
(CONFIG_X86_UP_IOAPIC). Without that option it does work. It reliably hangs in
arch/i386/kernel/io_apic.c:check_timer (in 2.4.18-pre8-K3, at about line 1510):
	printk(KERN_INFO "..TIMER: vector=0x%02X pin1=%d pin2=%d\n", vector, pin1, pin2);

	if (pin1 != -1) {
		/*
		 * Ok, does IRQ0 through the IOAPIC work?
		 */
		printk(KERN_ERR "1\n");
		unmask_IO_APIC_irq(0);
		printk(KERN_ERR "2\n");
		if (timer_irq_works()) {
The "1" still gets printed, then it hangs...
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
..TIMER: vector=0x31 pin1=2 pin2=0
1
Adam
--
Adam [email protected]
Lackorzynski http://a.home.dhs.org
> Furthermore it also doesn't work with a UP kernel with Local IO APIC
> support (CONFIG_X86_UP_IOAPIC). Without it does work. It reliably hangs in
> arch/i386/kernel/io_apic.c:check_timer (in 2.4.18-pre8-K3 about line 1510)
>
> printk(KERN_INFO "..TIMER: vector=0x%02X pin1=%d pin2=%d\n",
> vector, pin1, pin2);
>
> if (pin1 != -1) {
> /*
> * Ok, does IRQ0 through the IOAPIC work?
> */
> printk(KERN_ERR "1\n");
> unmask_IO_APIC_irq(0);
> printk(KERN_ERR "2\n");
> if (timer_irq_works()) {
>
> The "1" still occurs, then it hangs...
So if you comment out the unmask and the whole if clause
below it, does your system then boot, or do you just crash
and burn a few lines later?
M.
On Mon Feb 25, 2002 at 07:32:05 -0800, Martin J. Bligh wrote:
> So if you comment out the unmask and the whole if clause
> below it, does your system then boot, or do you just crash
> and burn a few lines later?
It just prints some (not all) characters of the following printk and then
dies.
Adam
--
Adam [email protected]
Lackorzynski http://a.home.dhs.org