2000-12-19 00:05:08

by Petr Vandrovec

Subject: Re: Startup IPI (was: Re: test13-pre3)

On 18 Dec 00 at 19:44, Maciej W. Rozycki wrote:
> > No, I'll try. It occurred with either an AGP (Matrox G200/G400/G450) or a
> > PCI (S3, CL5434) VGA adapter. I have not tried a real ISA VGA...
>
> Oops, I've forgotten there exist non-ISA display adapters. ;-) Just check
> whether accessing one bus or the other changes the behaviour.

Uh. It took a couple of hours to find it. Just place

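    {   /* 0xC00B8000 = PAGE_OFFSET + 0xB8000, the VGA text buffer */
        int i; volatile unsigned short *p = (volatile unsigned short *)0xC00B8000;
        for (i = 0; i < 6553600; i++) { *p; }   /* reads only, no stores */
    } (**)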

instead of udelay(300), and this loop does not finish. The same happens with
unsigned long *p. inb/outb(0x3C0) are OK, and writes are OK too. Only
plain reads from video RAM kill it.

When I replaced the address with 0xC01B8000 (some cacheable memory), it worked
fine. When I replaced it with 0xC00C8000 (a supposedly unused address, but maybe
the chipset just sets it as cacheable), it worked too.
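The three addresses differ only in where they land physically. A small sketch of that mapping, assuming the default i386 3G/1G split (kernel maps low physical memory linearly at 0xC0000000) and the conventional PC memory map; the helper and enum names are hypothetical and nothing here is specific to the VIA chipset:

```c
#include <stdint.h>

/* Assumption: default i386 split, physical RAM mapped linearly at 0xC0000000. */
#define ASSUMED_PAGE_OFFSET 0xC0000000u

/* Conventional PC physical regions (hypothetical names). */
enum pc_region { PC_LOW_RAM, PC_VIDEO, PC_UPPER, PC_EXT_RAM };

/* Translate a linearly mapped kernel address back to a physical address. */
uint32_t kvirt_to_phys(uint32_t vaddr)
{
	return vaddr - ASSUMED_PAGE_OFFSET;
}

/* Classify a physical address by the classic PC memory map. */
enum pc_region pc_region_of(uint32_t paddr)
{
	if (paddr < 0xA0000u)
		return PC_LOW_RAM;	/* conventional memory below 640 KB */
	if (paddr < 0xC0000u)
		return PC_VIDEO;	/* legacy video window; 0xB8000 is colour text */
	if (paddr < 0x100000u)
		return PC_UPPER;	/* adapter ROM / upper memory area */
	return PC_EXT_RAM;		/* ordinary RAM above 1 MB */
}
```

So only 0xC00B8000 reaches the legacy video window (through the chipset to the AGP/PCI card); 0xC00C8000 falls in the upper memory area and 0xC01B8000 is plain RAM.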

The symptoms of the lockup are the same as those of the hang in printk() without
udelay(300); the only problem is that 'vt_console_print' (*) does not fetch from
video RAM, it only stores to it...

Placing this loop before sending the startup IPI, or just after udelay(300),
is OK (except that the loop takes so long that the secondary CPU complains
about no callin being received).

I even tried to add:

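    mov   $0xB800, %ax   /* real-mode segment of the VGA text buffer */
    mov   %ax, %ds
    movw  %ax, 0         /* store a word at 0xB800:0000, the top-left cell */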

at the beginning of trampoline.S and booted with 'no-scroll', but the
character in the upper left corner did not change, so the secondary CPU probably
did not even start fetching code. That's all I can say until I put a non-AGP
card into the box (but I need AGP, so that is not a real option).
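The trampoline runs in real mode, where the CPU forms a linear address as segment * 16 + offset, so 0xB800:0000 is the same physical text buffer the kernel reaches at 0xC00B8000. Since AX still holds 0xB800 when it is stored, the top-left cell should become character 0x00 with attribute 0xB8, a visible change if the code runs at all. A sketch of that arithmetic (function name hypothetical):

```c
#include <stdint.h>

/*
 * Real-mode address formation: linear = segment * 16 + offset.
 * A20 wraparound above 1 MB is ignored; it does not matter for 0xB800.
 */
uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
	return ((uint32_t)seg << 4) + off;
}
```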

> > and VT82C686 (rev 22) ISA bridge. I tried to request documentation
> > of the 694X from VIA, but I have not heard from them. They probably have
> > some secrets hidden in their hardware...
>
> They want to keep the competition from being bug-compatible, it would
> seem...

Yeah. Just do not read video memory while another CPU is starting. I'll try
disabling the caches on both CPUs; maybe it will make some difference, as the
secondary CPU should start with its caches disabled. But maybe it is just a
broken AGP bus and nothing else. Until I find out what's really broken on my
hardware, I'd like to leave 'udelay(300)' in.

(*) When I called directly
    vt_console_print(NULL, "Message1\n", 9);
    vt_console_print(NULL, "Message2\n", 9);
instead of printk, I got
    Message1
    Messag<0x..><0x..><0x00><0x80><0x..><0x80><0x..><0x80>...
- the wrong text with the wrong length, so it probably started fetching garbage
instead of the string as soon as the second CPU started (and no, it was not a race
due to the missing console_lock: before the first printk() the secondary CPU should
fill the whole screen with the letter '2', and it did not).

(**) When I had '*p = i; *p' in the loop, visual inspection showed it dying
somewhere in the range i=0x1380-0x13FF (blue background, cyan letter with diacritics).
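The colours in (**) follow from the text-mode cell layout: each cell is a little-endian 16-bit word with the character in the low byte and the attribute in the high byte, so i=0x13xx gives attribute 0x13 (background 1 = blue, foreground 3 = cyan) and characters 0x80-0xFF, the upper half of code page 437, which starts with accented letters. A minimal decoding sketch, assuming the standard 16-colour palette; the names are hypothetical:

```c
#include <stdint.h>

/* Standard CGA/VGA 16-colour text palette, indexed by attribute nibble. */
static const char *const vga_colour[16] = {
	"black", "blue", "green", "cyan", "red", "magenta", "brown", "light grey",
	"dark grey", "light blue", "light green", "light cyan",
	"light red", "light magenta", "yellow", "white"
};

struct vga_cell {
	uint8_t ch;		/* code page 437 character (low byte) */
	const char *fg;		/* attribute bits 0-3 */
	const char *bg;		/* attribute bits 4-6; bit 7 is blink or bright bg */
};

/* Split one text-mode word into character, foreground and background. */
struct vga_cell decode_cell(uint16_t word)
{
	struct vga_cell c;
	uint8_t attr = (uint8_t)(word >> 8);

	c.ch = (uint8_t)(word & 0xFF);
	c.fg = vga_colour[attr & 0x0F];
	c.bg = vga_colour[(attr >> 4) & 0x07];
	return c;
}
```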

End of guessing.
Best regards,
Petr Vandrovec
[email protected]


2000-12-19 00:21:20

by Alan

Subject: Re: Startup IPI (was: Re: test13-pre3)

> Yeah. Just do not read video memory while another CPU is starting. I'll try
> disabling the caches on both CPUs; maybe it will make some difference, as the
> secondary CPU should start with its caches disabled. But maybe it is just a
> broken AGP bus and nothing else. Until I find out what's really broken on my
> hardware, I'd like to leave 'udelay(300)' in.

In the case where it boots, does it also report mismatched MTRRs?

2000-12-19 06:31:04

by ferret

Subject: Re: Startup IPI (was: Re: test13-pre3)


Pardon me for not fully grokking the issues here and possibly coming to a
wrong conclusion, but does this have to do with SMP systems crashing at APIC
init time, just before the penguin is displayed (with fbcon at least)? If so, I
have a board that does this with certain cache settings made in the BIOS.
It's a 430HX chipset with two Pentium MMX 200s installed, *ancient* BIOS.

-- Ferret


2000-12-19 19:08:47

by Maciej W. Rozycki

Subject: Re: Startup IPI (was: Re: test13-pre3)

On Tue, 19 Dec 2000, Petr Vandrovec wrote:

> Uh. It took a couple of hours to find it. Just place
>
>     {   /* 0xC00B8000 = PAGE_OFFSET + 0xB8000, the VGA text buffer */
>         int i; volatile unsigned short *p = (volatile unsigned short *)0xC00B8000;
>         for (i = 0; i < 6553600; i++) { *p; }   /* reads only, no stores */
>     } (**)
>
> instead of udelay(300), and this loop does not finish. The same happens with
> unsigned long *p. inb/outb(0x3C0) are OK, and writes are OK too. Only
> plain reads from video RAM kill it.
>
> When I replaced the address with 0xC01B8000 (some cacheable memory), it worked
> fine. When I replaced it with 0xC00C8000 (a supposedly unused address, but maybe
> the chipset just sets it as cacheable), it worked too.

Hmm, a read from an uncached location could cause delayed APIC writes to be
sent to the bus if the MTRR setting for the APIC space is incorrect. Could you
please disable CONFIG_X86_GOOD_APIC? This will result in using locked cycles
for APIC writes, i.e. immediate bus accesses.
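The difference between a buffered write and a locked one can be modelled in user space on an ordinary memory word. This sketch assumes GCC or Clang for the `__atomic_exchange_n` builtin, and the function names are hypothetical; it only illustrates the two kinds of store and is not the kernel's APIC accessor code:

```c
#include <stdint.h>

/*
 * Plain store: compiles to an ordinary MOV on x86, which may sit in the
 * CPU's write buffer for a while before it reaches the bus.
 */
void reg_write_plain(volatile uint32_t *reg, uint32_t v)
{
	*reg = v;
}

/*
 * Locked store: an atomic exchange compiles to XCHG on x86, which is
 * implicitly LOCKed and acts as a full memory barrier, so earlier buffered
 * stores are drained as well. The old value is returned for completeness.
 */
uint32_t reg_write_locked(volatile uint32_t *reg, uint32_t v)
{
	return __atomic_exchange_n(reg, v, __ATOMIC_SEQ_CST);
}
```

The locked variant costs a bus-locked cycle per write, which is why it would only be a fallback for chipsets that mishandle ordinary APIC writes.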

Please also check MTRR settings, especially for the APIC range. They
might need fixing.
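On Linux the current ranges can be read from /proc/mtrr. A sketch of the check itself, assuming the architectural default local APIC base 0xFEE00000 and the memory-type encodings from the Intel manuals; ranges are given as base/size (as /proc/mtrr shows them) instead of the hardware base/mask pairs, fixed-range MTRRs are ignored, and all names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* MTRR memory-type encodings (Intel architecture manuals). */
enum mtrr_type { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* Simplified variable-range MTRR: base and size in bytes. */
struct var_mtrr {
	uint64_t base;
	uint64_t size;
	enum mtrr_type type;
};

#define LAPIC_DEFAULT_BASE 0xFEE00000ULL	/* architectural default */

/*
 * Effective type at addr: UC wins over everything, WT wins over WB, and
 * uncovered addresses get the default type. Other overlaps are undefined
 * in hardware; this sketch simply keeps the first match.
 */
enum mtrr_type effective_type(const struct var_mtrr *r, size_t n,
			      enum mtrr_type def, uint64_t addr)
{
	int found = 0;
	enum mtrr_type t = def;

	for (size_t i = 0; i < n; i++) {
		if (addr < r[i].base || addr - r[i].base >= r[i].size)
			continue;		/* range does not cover addr */
		if (r[i].type == MT_UC)
			return MT_UC;
		if (!found) {
			t = r[i].type;
			found = 1;
		} else if (t == MT_WB && r[i].type == MT_WT) {
			t = MT_WT;
		}
	}
	return t;
}
```

The APIC page should come out as MT_UC; a write-combining or write-back result there is the kind of setting that would need fixing.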

> at the beginning of trampoline.S and booted with 'no-scroll', but the
> character in the upper left corner did not change, so the secondary CPU
> probably did not even start fetching code. That's all I can say until I put
> a non-AGP card into the box (but I need AGP, so that is not a real
> option).

An easier way to check whether an application processor is alive could be to
enable the speaker: once the bootstrap CPU has set it up, it only takes three
instructions to set bits 0 and 1 of port 0x61, and the result persists.
A LED diagnostic display would be better, but typical PCs don't have one,
unfortunately.
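A sketch of the speaker trick, assuming the standard PC i8254 timer (1193182 Hz input clock, command port 0x43, channel 2 data port 0x42) and port 0x61 bits 0 (channel 2 gate) and 1 (speaker enable). The helper names and the choice of tone are hypothetical; the pure functions only compute the bytes that would really be written with outb() on the bootstrap CPU, or with in/or/out in trampoline.S:

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182UL	/* i8254 input clock on PC compatibles */
#define PIT_CMD_PORT 0x43
#define PIT_CH2_PORT 0x42

/* One (port, value) pair the bootstrap CPU would write. */
struct port_write { uint16_t port; uint8_t value; };

/*
 * BSP-side setup: 0xB6 selects channel 2, lobyte/hibyte access, mode 3
 * (square wave), binary counting; then the 16-bit divisor follows.
 * Assumes 19 <= hz <= PIT_INPUT_HZ so the divisor is nonzero and fits.
 */
int pit_ch2_setup(unsigned long hz, struct port_write out[3])
{
	uint16_t div = (uint16_t)(PIT_INPUT_HZ / hz);

	out[0] = (struct port_write){ PIT_CMD_PORT, 0xB6 };
	out[1] = (struct port_write){ PIT_CH2_PORT, (uint8_t)(div & 0xFF) };
	out[2] = (struct port_write){ PIT_CH2_PORT, (uint8_t)(div >> 8) };
	return 3;
}

/*
 * AP side, the three instructions: in $0x61,%al; or $3,%al; out %al,$0x61.
 * Bit 0 gates timer channel 2, bit 1 routes its output to the speaker.
 */
uint8_t speaker_on(uint8_t port61)
{
	return port61 | 0x03;
}
```

For a 1000 Hz tone the divisor is 1193 (0x04A9), written as 0xA9 then 0x04.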

> Yeah. Just do not read video memory while another CPU is starting. I'll try
> disabling the caches on both CPUs; maybe it will make some difference, as the
> secondary CPU should start with its caches disabled. But maybe it is just a
> broken AGP bus and nothing else. Until I find out what's really broken on my
> hardware, I'd like to leave 'udelay(300)' in.

If the problem is with write combining then disabling the cache won't
help, I'm afraid.

> (*) When I called directly
>     vt_console_print(NULL, "Message1\n", 9);
>     vt_console_print(NULL, "Message2\n", 9);
> instead of printk, I got
>     Message1
>     Messag<0x..><0x..><0x00><0x80><0x..><0x80><0x..><0x80>...
> - the wrong text with the wrong length, so it probably started fetching garbage
> instead of the string as soon as the second CPU started (and no, it was not a race
> due to the missing console_lock: before the first printk() the secondary CPU should
> fill the whole screen with the letter '2', and it did not).

I would still verify (e.g. with the speaker) that it's really the second CPU
causing the corruption.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +