Hi,
after recently upgrading one of my two CPUs, I found kernel-2.4.2 unable to
handle the situation with 2 different CPUs (AMP = Asymmetric
multiprocessing ;-) correctly.
Some details on my system:
Dual BX board (DFI P2XBL/D), iPII 350 (Deschutes) + iPIII 850 (Coppermine)
Note: The difference in features is the XMM (SSE) flag.
The problems are twofold:
(a) Determination of the correct common features (=: COMCAP), i.e.
boot_cpu_data.x86_capability[0], at the correct time
(b) TSC stuff
Ad (a):
There is code in identify_cpu() to make sure the common subset of features
is stored in COMCAP; however, this does not work at all, as the freshly
booted CPUs just overwrite it in head.S.
I fixed this, so the system basically worked if the iPII was the boot CPU.
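What COMCAP is meant to hold is simply the bitwise AND of every CPU's capability word, taken after all CPUs have been identified. A minimal stand-alone sketch, not the kernel code: `comcap_intersect()`, `sketch_has()` and the `SKETCH_*` macros are hypothetical names, and the bit positions assume the standard CPUID function 1 EDX layout (FXSR = bit 24, XMM/SSE = bit 25).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FXSR_BIT 24   /* hypothetical name: FXSAVE/FXRSTOR */
#define SKETCH_XMM_BIT  25   /* hypothetical name: SSE */

/* Nonzero if the given feature bit is set in a capability word. */
static int sketch_has(uint32_t caps, unsigned bit)
{
    return (int)((caps >> bit) & 1u);
}

/* COMCAP: only features reported by every CPU survive the AND. */
static uint32_t comcap_intersect(const uint32_t caps[], int ncpus)
{
    assert(ncpus > 0);
    uint32_t common = caps[0];
    for (int i = 1; i < ncpus; i++)
        common &= caps[i];
    return common;
}
```

Taking the Coppermine word as 0x0383fbff and the Deschutes word as 0x0183fbff (both reconstructed from the flags lines further down, which differ only in sse), the intersection is 0x0183fbff: FXSR survives, XMM does not. The AND is only useful if nothing overwrites the result after the secondary CPUs come up.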
With the iPIII as boot CPU, it rebooted after the following (captured via serial console):
Asserting INIT.
Waiting for send to finish...
+Deasserting INIT.
Waiting for send to finish...
+#startup loops: 2.
Sending STARTUP #1.
After apic_write.
The problem is that the CPU/FPU gets initialized to use XMM instructions
before the other CPUs are even started and checked for capabilities.
Booting with nofxsr (or noxmm, which I added) helps.
However, the kernel should be able to find this out by itself.
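What such boot options amount to can be sketched as clearing bits in the capability word before the FPU setup looks at it. This is a hypothetical stand-alone function, not the option handling in 2.4.2 or in the patch. It assumes that dropping FXSR must also drop XMM, since the OS can only enable SSE after enabling FXSAVE/FXRSTOR support (CR4.OSFXSR).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FXSR (1u << 24)   /* FXSAVE/FXRSTOR */
#define SKETCH_XMM  (1u << 25)   /* SSE */

/* Hypothetical: mask a capability word according to boot options. */
static uint32_t apply_boot_options(uint32_t caps, int nofxsr, int noxmm)
{
    if (nofxsr)
        caps &= ~SKETCH_FXSR;
    if (noxmm)
        caps &= ~SKETCH_XMM;
    /* SSE needs FXSR support in the OS, so XMM goes whenever FXSR is gone. */
    if (!(caps & SKETCH_FXSR))
        caps &= ~SKETCH_XMM;
    assert(!(caps & SKETCH_XMM) || (caps & SKETCH_FXSR));
    return caps;
}
```

With the Coppermine word 0x0383fbff as input, noxmm yields 0x0183fbff and nofxsr yields 0x0083fbff; either way XMM is no longer enabled. The aim of the patch is for the kernel to reach that result without any option.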
To find out about the COMCAP earlier, one has two choices:
* boot the secondary CPUs earlier (before initializing usage of FXSR/XMM)
* try to find out without booting the secondary CPUs
The only way I found to do the latter is to use the results of the MPtable
parsing. However, this information may not be reliable. I did implement
this, but on my system, a lot of features were not reported at all ...
So I chose to do smp_init() earlier. It's a good idea to have this info
available before we get to the RAID5 init anyway.
I moved smp_init()/smp_boot_cpus() [which now calls identify_cpu()
and check_config() for the boot_cpu, later calls identify_cpu() on the secondary
CPUs, and sets COMCAP correctly], followed by check_bugs() [which no longer does
the identify_cpu() on SMP], to before bdev_init().
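The new ordering can be illustrated with stubs. Everything below is hypothetical: the `*_sketch` functions only stand in for smp_init()/smp_boot_cpus(), check_bugs() and bdev_init(), and, as a simplification, the XMM decision is placed in the check_bugs() stand-in. The point is only that the decision reads COMCAP after every CPU has contributed to it.

```c
#include <assert.h>
#include <stdint.h>

enum init_step { STEP_SMP_INIT, STEP_CHECK_BUGS, STEP_BDEV_INIT };

#define MAX_STEPS 8
static enum init_step init_log[MAX_STEPS];
static int init_steps;
static uint32_t comcap;   /* common capability word (COMCAP) */
static int use_xmm;       /* may the kernel use XMM/SSE? */

static void log_step(enum init_step s)
{
    assert(init_steps < MAX_STEPS);
    init_log[init_steps++] = s;
}

/* Stand-in for smp_init()/smp_boot_cpus(): every CPU gets identified. */
static void smp_init_sketch(const uint32_t caps[], int ncpus)
{
    assert(ncpus > 0);
    log_step(STEP_SMP_INIT);
    comcap = caps[0];
    for (int i = 1; i < ncpus; i++)
        comcap &= caps[i];
}

/* Stand-in for check_bugs(): COMCAP is final here, so XMM can be decided. */
static void check_bugs_sketch(void)
{
    log_step(STEP_CHECK_BUGS);
    use_xmm = (int)((comcap >> 25) & 1u);
}

/* Stand-in for bdev_init(): block devices are set up afterwards. */
static void bdev_init_sketch(void)
{
    log_step(STEP_BDEV_INIT);
}

static void start_kernel_sketch(const uint32_t caps[], int ncpus)
{
    init_steps = 0;
    smp_init_sketch(caps, ncpus);   /* moved earlier */
    check_bugs_sketch();
    bdev_init_sketch();
}
```

Called with the iPIII word first, as on the failing configuration, XMM stays disabled because the iPII's word has already been folded in.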
Problem solved. It certainly solves the XMM/SSE and FXSR issues. PSE36
differences, e.g., may still cause problems on 64GB kernels ...
Ad (b):
fast_gettimeoffset() returns nonsense if it runs on the wrong CPU.
There are two reasons: On the one hand, the TSC speed is only calculated for
one CPU, which does not match the other. On the other hand, when xtime gets
updated, the low TSC bits are saved. If gettimeofday() is run on the other CPU,
you compare the TSC of this CPU with the value saved from the other, which
returns nonsense. I've even seen negative numbers.
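The arithmetic can be sketched in C. As in 2.4's fast_gettimeoffset(), the offset in microseconds is taken as the high 32 bits of (TSC low-word delta * quotient), where the quotient is 2^32 divided by the TSC cycles per microsecond; delay_at_last_interrupt is left out. The helper names and sample TSC values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 2^32 / (TSC cycles per usec): usecs per cycle in 0.32 fixed point. */
static uint32_t quotient_for_mhz(uint32_t mhz)
{
    assert(mhz > 1);
    return (uint32_t)((UINT64_C(1) << 32) / mhz);
}

/* High 32 bits of delta * quotient, like the mull in fast_gettimeoffset(). */
static uint32_t tsc_offset_usec(uint32_t tsc_now_low, uint32_t last_tsc_low,
                                uint32_t quotient)
{
    uint32_t delta = tsc_now_low - last_tsc_low;  /* wraps if now < last */
    return (uint32_t)(((uint64_t)delta * quotient) >> 32);
}
```

On the CPU that saved last_tsc_low, 850000 cycles at 850 MHz give 999 usec (1 ms, rounded down). If the 350 MHz CPU's TSC happens to be 1000 cycles behind the saved value, the delta wraps to nearly 2^32 and the result is about 12 seconds. Even without a wrap, 1 ms worth of 350 MHz cycles scaled with the 850 MHz quotient comes out as 411 usec.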
This behaviour screws up timing information: xntp did not work anymore and
the glx module crashed the AGP/PCI bus for my MGA G400.
Booting with notsc could work around this, of course.
I decided to fix this problem as well.
Therefore, cpu_khz, fast_gettimeoffset_quotient and last_tsc_low have been
moved into the cpuinfo_x86 struct, and xtime has been duplicated there.
identify_cpu() now calls time_init_cpu() for the secondary CPUs, which
initializes cpu_khz and the quotient.
The timer irq now stores the last_tsc_low in the CPU specific struct.
It also copies the xtime to the current CPU's struct.
gettimeofday() just uses this CPU specific info.
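The per-CPU bookkeeping can be sketched as follows. The struct and function names are hypothetical, not the patch's code (in the patch the fields live in struct cpuinfo_x86), and as a simplification each CPU's tick and TSC reading are passed in explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 2

/* Hypothetical per-CPU copy of the timekeeping state. */
struct cpu_time_sketch {
    uint32_t cpu_khz;
    uint32_t quotient;       /* 2^32 / TSC cycles per usec, this CPU only */
    uint32_t last_tsc_low;   /* this CPU's TSC at its last tick */
    long xtime_sec, xtime_usec;
};

static struct cpu_time_sketch cpu_time[SKETCH_NR_CPUS];

/* Per-CPU calibration, in the spirit of time_init_cpu(). */
static void time_init_cpu_sketch(int cpu, uint32_t khz)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS && khz > 1000);
    cpu_time[cpu].cpu_khz = khz;
    cpu_time[cpu].quotient = (uint32_t)(((UINT64_C(1) << 32) * 1000) / khz);
}

/* Timer tick on this CPU: save its own TSC and copy xtime. */
static void timer_tick_sketch(int cpu, uint32_t tsc_low, long sec, long usec)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    cpu_time[cpu].last_tsc_low = tsc_low;
    cpu_time[cpu].xtime_sec = sec;
    cpu_time[cpu].xtime_usec = usec;
}

/* gettimeofday() on this CPU uses only this CPU's state. */
static void gettimeofday_sketch(int cpu, uint32_t tsc_low, long *sec, long *usec)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    const struct cpu_time_sketch *t = &cpu_time[cpu];
    uint32_t delta = tsc_low - t->last_tsc_low;
    long off = (long)(((uint64_t)delta * t->quotient) >> 32);
    *sec = t->xtime_sec;
    *usec = t->xtime_usec + off;
    while (*usec >= 1000000) {
        *usec -= 1000000;
        (*sec)++;
    }
}
```

With one CPU calibrated at 850000 kHz and the other at 350000 kHz, each reports 999 usec for one millisecond of its own cycles, because neither ever sees the other's TSC or quotient.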
It works.
Here is some evidence:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 851.946
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1701.88
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 350.800
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips : 701.54
ntpq> pe
remote refid st t when poll reach delay offset jitter
==============================================================================
LOCAL(0) LOCAL(0) 10 l 15 64 377 0.000 0.000 0.000
*casa.casa-etp.n ntp2.ptb.de 2 u 15 64 377 0.365 1.453 5.000
boot_cpu_data (COMCAP) contains:
Intel Pentium III (Coppermine) caps: 0183fbff fpu vme de pse tsc msr pae mce
cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr stepping 02
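The caps word can be cross-checked against the flags list by decoding it bit by bit. A small sketch; the name table is hypothetical and only covers the bits that appear in the flags lines above (standard CPUID function 1 EDX positions), leaving the other bits unnamed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Names for bits 0..25; NULL where this example needs no name. */
static const char *const cap_names[26] = {
    "fpu", "vme", "de",  "pse",   "tsc", "msr", "pae", "mce",
    "cx8", "apic", NULL, "sep",   "mtrr", "pge", "mca", "cmov",
    "pat", "pse36", NULL, NULL,   NULL,  NULL,  NULL,  "mmx",
    "fxsr", "sse",
};

/* Write the names of the set bits, space separated, into out (size n). */
static void decode_caps(uint32_t caps, char *out, size_t n)
{
    assert(n > 0);
    out[0] = '\0';
    for (unsigned bit = 0; bit < sizeof cap_names / sizeof cap_names[0]; bit++) {
        if (!(caps & (1u << bit)) || cap_names[bit] == NULL)
            continue;
        if (out[0] != '\0')
            strncat(out, " ", n - strlen(out) - 1);
        strncat(out, cap_names[bit], n - strlen(out) - 1);
    }
}
```

Decoding 0x0183fbff reproduces the printed list exactly; setting bit 25 as well (0x0383fbff) appends sse, which is the processor 0 flags line.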
Patch against 2.4.2 is attached.
Feedback is welcome. I think the patch is safe, but I could imagine that the
earlier SMP initialization might cause problems for some people or other archs.
I would like this patch to go into the mainstream kernel, if no problem can
be found.
Regards,
--
Kurt Garloff <[email protected]> [Eindhoven, NL]
Physics: Plasma simulations <[email protected]> [TU Eindhoven, NL]
Linux: SCSI, Security <[email protected]> [SuSE Nuernberg, FRG]
(See mail header or public key servers for PGP2 and GPG public keys.)
> recently upgrading one of my two CPUs, I found kernel-2.4.2 to be unable to
> handle the situation with 2 different CPUs (AMP = Asymmetric
> multiprocessing ;-) correctly.
"correctly". Intel doesn't support this (mis)configuration:
especially with different steppings, not to mention models.
Alan has, or is working on, a workaround to handle differing
multipliers by turning off the use of RDTSC. this is the right approach
to take in the kernel: disable features not shared by both processors,
so correctly-configured machines are not penalized.
and the kernel should LOUDLY WARN ABOUT this stuff on boot.
regards, mark hahn.
In article <[email protected]>,
Kurt Garloff <[email protected]> wrote:
>
>recently upgrading one of my two CPUs, I found kernel-2.4.2 to be unable to
>handle the situation with 2 different CPUs (AMP = Asymmetric
>multiprocessing ;-) correctly.
This is not really a configuration Linux supports. You can hack it to
work in many cases, but I'm generally not inclined to make this an
issue for me because:
- Intel explicitly doesn't support it, possibly even at the hardware level.
You're supposed to only mix CPU's of the same stepping within a
family, never mind different families. They sometimes explicitly say
which steppings are compatible and can be mixed.
NOTE! For all I know, this might actually be due to
fundamental issues like cache coherency protocol timing or similar.
Safe answer: just say no.
- The boot CPU under Linux is special, and will be used to determine
things like support for 4M pages etc. It will then re-write the page
tables to be more efficient. If the other CPU's don't support all the
features the boot CPU has, they'll have serious trouble booting up.
NOTE! I'm not all that interested in trying to complicate the bootup
logic to take into account all the differences that can occur.
Especially as it only happens on arguably very broken hardware that
doesn't meet the specs anyway.
So I'm perfectly happy with you fixing it on your machine, but right now
I have no incentives to make this a "real" option for a standard kernel.
I retain the right to change my mind, as always. Le Linus e mobile.
Linus
> recently upgrading one of my two CPUs, I found kernel-2.4.2 to be
> unable to handle the situation with 2 different CPUs (AMP =
> Asymmetric multiprocessing ;-) correctly. Some details on my system:
> Dual BX board (DFI P2XBL/D), iPII 350 (Deschutes) + iPIII 850
> (Coppermine) Note: The difference in features is the XMM (SSE) flag.
> The problems are twofold (a) Determination of the correct common
> features (=: COMCAP), i.e.
> boot_cpu_data.x86_capability[0] at the correct time (b) TSC stuff
I have similar problems. I've got a reconfigurable non-APIC 8-way system with
(currently) 4 p5-66 and 4 p5-166 processors. I found the answer to (b) was
simply to disable the TSC stuff---my processors aren't even guaranteed to be
fed from the same clock, so there's no hope for TSC coherency.
I run into your problem (a) when trying a mixture of 486 and 586 processors.
The simplest workaround I've found is just to make sure that the boot CPU has the
lowest capability set (i.e. boot off a 486). Could you just swap the order of
your processors to achieve the same effect?
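This workaround holds when the boot CPU's feature set is a subset of every other CPU's, so nothing the boot CPU enables is missing elsewhere. The condition is a line of bit arithmetic; a hypothetical sketch, with example words reconstructed from the flags lines in Kurt's mail (standard CPUID bit layout assumed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if every feature of the boot CPU is also present on all others. */
static bool boot_cpu_is_lowest(uint32_t boot_caps, const uint32_t others[], int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (boot_caps & ~others[i])   /* a boot-CPU feature this CPU lacks */
            return false;
    return true;
}
```

For the iPII (0x0183fbff) booting with an iPIII (0x0383fbff) this is true, and the other way round it is false, which matches the configuration that failed. It only works when the capability sets are nested, and it does not help with the TSC problem (b).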
James Bottomley
> > handle the situation with 2 different CPUs (AMP = Asymmetric
> > multiprocessing ;-) correctly.
>
> "correctly". Intel doesn't support this (mis)configuration:
> especially with different steppings, not to mention models.
Actually, in a lot of cases it's quite legal.
> Alan has, or is working on, a workaround to handle differing
> multipliers by turning off the use of RDTSC. this is the right approach
> to take in the kernel: disable features not shared by both processors,
> so correctly-configured machines are not penalized.
> and the kernel should LOUDLY WARN ABOUT this stuff on boot.
I've been working on reading the multipliers directly from the MSR 0x2A data;
Kurt is redoing the timing each run - possibly that's not so clean, but it's
more robust.
I rather like Kurt's patch.
On Wed, Mar 21, 2001 at 11:41:33PM +0000, Alan Cox wrote:
> > > handle the situation with 2 different CPUs (AMP = Asymmetric
> > > multiprocessing ;-) correctly.
> >
> > "correctly". Intel doesn't support this (mis)configuration:
> > especially with different steppings, not to mention models.
I wouldn't call it misconfiguration, just because it's a bit more difficult
to handle.
On the Intel side: You should watch out for matching APICs, voltages and the
cache coherency (MESI) protocol. Actually, Deschutes and Coppermine work
fine in spite of slightly different voltages.
> Actually, in a lot of cases it's quite legal.
It should always be legal to have, e.g., the same CPU at different speeds
(the FSB must be the same, for obvious reasons). The TSC part of the patch
addresses this.
> > Alan has, or is working on, a workaround to handle differing
> > multipliers by turning off the use of RDTSC. this is the right approach
> > to take in the kernel: disable features not shared by both processors,
> > so correctly-configured machines are not penalized.
TSC is supported by both CPUs ... so why not use the nice rdtsc-based time
routine?
The penalty:
It is 5 more ints (20 bytes) in struct cpuinfo_x86, minus the 3 globals that
are no longer needed, so with NR_CPUS = 32 you get a kernel with
(NR_CPUS*20 - 12) = 628 bytes more kernel data ...
Your bootup time may be a fraction of a second longer, as every CPU
calibrates the TSC on its own; OTOH they do it in parallel ...
The timer interrupt does an extra copy of xtime (2 ints) to the per-CPU
struct, plus one extra indirection for accessing the per-CPU struct:
something like 4 CPU cycles per timer interrupt.
gettimeofday() also has this extra indirection; OTOH not accessing global
data saves a cache line flush the next time the structure is written to ...
So this is probably a net cost of zero.
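The data-size figure follows from five 32-bit fields per CPU (cpu_khz, fast_gettimeoffset_quotient, last_tsc_low and the two xtime words) minus the three globals that go away. A trivial check, assuming 4-byte ints as on i386 and NR_CPUS = 32 as in 2.4 SMP builds; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    PER_CPU_FIELDS  = 5,   /* cpu_khz, quotient, last_tsc_low, xtime (2 words) */
    GLOBALS_REMOVED = 3,   /* cpu_khz, quotient, last_tsc_low no longer global */
};

/* Extra kernel data in bytes when every field is a 32-bit word. */
static unsigned extra_data_bytes(unsigned nr_cpus)
{
    assert(nr_cpus >= 1);
    return nr_cpus * PER_CPU_FIELDS * (unsigned)sizeof(uint32_t)
           - GLOBALS_REMOVED * (unsigned)sizeof(uint32_t);
}
```

For 32 CPUs this gives 32*20 - 12 = 628 bytes; on a 2-CPU box only 28 of those entries' worth would actually be used, but the array is sized by NR_CPUS.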
> > and the kernel should LOUDLY WARN ABOUT this stuff on boot.
>
> I've been working on reading the multipliers directly from the MSR 0x2A data,
> Kurt is redoing the timing each run - possibly that's not so clean, but it's
> more robust.
>
> I rather like Kurt's patch
Thx!
If you have some requests to make it suitable for -ac kernels, I'll do my
best.
Regards,
--
Kurt Garloff <[email protected]> [Eindhoven, NL]
> > > > handle the situation with 2 different CPUs (AMP = Asymmetric
> > > > multiprocessing ;-) correctly.
> > >
> > > "correctly". Intel doesn't support this (mis)configuration:
> > > especially with different steppings, not to mention models.
>
> I wouldn't call it misconfiguration, just because it's a bit more difficult
> to handle.
again, I *would* call it misconfiguration. intel says explicitly that
they don't support mixing model/family parts. and they only test
same-clock combinations (but do support mixed steppings.) just so people
don't get the impression that random, different CPUs are a sure thing...
Hi!
> Kurt Garloff <[email protected]> wrote:
> >
> >recently upgrading one of my two CPUs, I found kernel-2.4.2 to be unable to
> >handle the situation with 2 different CPUs (AMP = Asymmetric
> >multiprocessing ;-) correctly.
>
> This is not really a configuration Linux supports. You can hack it to
> work in many cases, but I'm generally not inclined to make this an
> issue for me because:
Notice that one of your CPUs is twice as fast as the second one. You'll
need some heavy updates to the scheduler.
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
Kurt Garloff <[email protected]> writes:
> On Wed, Mar 21, 2001 at 11:41:33PM +0000, Alan Cox wrote:
> > > > handle the situation with 2 different CPUs (AMP = Asymmetric
> > > > multiprocessing ;-) correctly.
> > >
> > > "correctly". Intel doesn't support this (mis)configuration:
> > > especially with different steppings, not to mention models.
>
> I wouldn't call it misconfiguration, just because it's a bit more difficult
> to handle.
> On the Intel side: You should watch out for matching APICs, voltages and
> cache coherency (MESI) protocol. Actually, Deschutes and Coppermine just
> work fine in spite of slightly different voltage.
The spooky thing is that it may work just fine most of the
time, but the differences between the CPUs might cause very strange
behavior every once in a great while. That is a hardware argument for
why you shouldn't trust such a configuration.
However, it is still worth some thought. The hardware argument gets much
weaker when you have something like dual AMDs. The reason is that
with a point-to-point bus you may actually be able to sanely support
multiple CPU revisions and speeds without any hardware consequences, even
in theory. And NUMA machines make this argument even stronger.
However, I would suggest that we build some good kernel-to-kernel APIs
for dealing with kernels connected by a wicked fast interconnect. Then,
for NUMA and for the other cases where it really matters, we can run multiple
kernels, and the mismatch problems just drop away.
Eric
On Thu, Mar 22, 2001 at 01:20:40PM +0000, Pavel Machek wrote:
> > Kurt Garloff <[email protected]> wrote:
> Notice, that one of your CPUs is twice as fast as second one. You'll
> need some heavy updates in scheduler.
I know that ensuring fair scheduling on asymmetric
multiprocessor machines would require some more work.
I can live with imperfect scheduling ...
Or are you referring to something more serious than unfairness?
Note: My machine has been running fine for a couple of days now ...
Regards,
--
Kurt Garloff <[email protected]> Eindhoven, NL
GPG key: See mail header, key servers Linux kernel development
SuSE GmbH, Nuernberg, FRG SCSI, Security