2002-11-21 16:34:50

by Peter Kjellström

Subject: Problems with 2.4.20-rc-ac (smp+piix)

Hi Alan,

Sorry for the double send, forgot subject, cc ... Need sleep... ;-)

I have some problems here running late 2.4.20+ac on a dual Xeon system
(supermicro mb). The kernels tried are 2.4.20-rc1-ac4 and rc2-ac1 with
various configurations. All combinations give the same result. One
.config with corresponding boot output and lspci -vv output has been
attached.

The following kernels do not have this problem (smp + dma piix works):
2.4.18-17.7.x (rh)
2.4.20-pre1
2.4.19-pre8-ac5
2.4.18 (custom)
... plus a few more


*Problem 1 (piix driver)
The system hits a strange BUG while running rc.sysinit (right after sysinit
has enabled dma for the ide drive). The kernel doesn't panic or Oops; only
a BUG is printed (approximately like this):

BUG at panic.c:286, inv. op. 0000
...regdump...
approx. calltrace (no oops):
__out_of_line_bug
'ide_dma stuff'
dorwdisk
ext3_get_blk
...
system call

That is what I manually wrote down of the calltrace and mapped back
through System.map. I guess I could spend some time on getting a full
calltrace if you really want.
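
For reference, the manual lookup through System.map can be sketched roughly
as below. This is only an illustration, assuming the usual "address type
name" line layout of System.map; the helper names (load_system_map, resolve)
and the sample addresses are made up and are not taken from the affected
system.

```python
import bisect


def load_system_map(lines):
    """Parse System.map lines ("address type name") into a sorted list."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return "name+0xoffset" for the closest symbol at or below address."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[index]
    return "%s+0x%x" % (name, address - base)


# Hypothetical sample data, for illustration only.
sample_map = [
    "c0100000 T _stext",
    "c0105000 T do_something",
    "c0106000 t helper",
]
symbols = load_system_map(sample_map)
print(resolve(symbols, 0xC0105010))  # do_something+0x10
```

With a real trace one would pass the lines of the System.map that matches
the running kernel (its location varies between installations) and feed in
each address from the calltrace in turn.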


*Problem 2 (smp detection/init stuff)
After disabling dma and getting past problem 1, the system surprised me by
booting up with only one cpu.

part of bootup messages:

kernel: CPU: L1 I cache: 0K, L1 D cache: 8K
kernel: CPU: L2 cache: 512K
kernel: CPU: Physical Processor ID: 0
kernel: Enabling fast FPU save and restore... done.
kernel: Enabling unmasked SIMD FPU exception support... done.
kernel: Checking 'hlt' instruction... OK.
kernel: POSIX conformance testing by UNIFIX
kernel: mtrr: v1.40 (20010327) Richard Gooch ([email protected])
kernel: mtrr: detected mtrr type: Intel
kernel: CPU: L1 I cache: 0K, L1 D cache: 8K
kernel: CPU: L2 cache: 512K
kernel: CPU: Physical Processor ID: 0
kernel: CPU0: Intel(R) XEON(TM) CPU 2.20GHz stepping 04
kernel: per-CPU timeslice cutoff: 1462.49 usecs.
kernel: task migration cache decay timeout: 10 msecs.
kernel: enabled ExtINT on CPU#0
kernel: ESR value before enabling vector: 00000000
kernel: ESR value after enabling vector: 00000000
kernel: Error: only one processor found.

Could any of the following patches be relevant? (from -ac changelog)

Linux 2.4.20-pre1-ac1
* Fix a harmless physical/logical cpu confusion (me)
in the APM code

Linux 2.4.19rc5-ac1
+ Switch 'processor id' to 'physical id' (me)
| Keeps glibc happy until we sort out cpu numbers longer term
o Fix incorrect marking of phys_proc_id init (David Luyer)


/Peter


--
------------------------------------------------------------
Peter Kjellstroem | E-mail: [email protected]
National Supercomputer Centre |
Sweden | http://www.nsc.liu.se


Attachments:
kernel-2.4.20-rc-ac (12.27 kB)
lspci.out (15.64 kB)
config-2.4.20-rc2-ac1 (18.76 kB)