2002-03-20 22:13:19

by Tom Epperly

Subject: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box

This is a follow-up to an earlier thread whose subject was "Re: RH7.2
running 2.4.9-21-SMP (dual Xeon's) yields "Illegal instructions"". Now
I am running a self-compiled 2.4.18 kernel with the small changes shown
below to log illegal instruction traps in the kernel log.

The kernel log showed me that various standard programs such as
/bin/sh are generating bogus illegal instruction traps on a legal
opcode (0x55) as part of a standard function preamble. After receiving
an illegal instruction trap on opcode (0x55), the modified kernel does
a wbinvd() to flush the cache and a __flush_tlb() to flush the TLB
and then retries the "illegal" opcode. The retry produces a second
illegal instruction trap on the same legal opcode (0x55). Information
from /var/log/messages is shown below.

The problem disappears if I disable the second CPU (via a BIOS
switch). I've tried physically switching processors on the
motherboard, and both chips behave correctly in single-CPU mode. The
system passes Dell's hardware diagnostics (twice) and memtest-86 2.9,
and I have seen identical problems on two other Dell Precision 530
Workstations purchased at different times with different clock speeds.

I initiated a support call with Dell at around 3:30pm PST on Friday
15-Mar-2002, and all the feedback I've received from this so far shows
that they are clueless. They are trying to portray this as a Linux
problem.

The machine doesn't run X11, so the nVidia drivers are never loaded. I
pulled the sound card out too. It has 512MB of ECC RAM.

Does anyone else have any suggestions about what could be causing this
problem or how one might further diagnose the issue? Is there any way
that this might not be a hardware problem? Please Cc me in
replies.

Tom Epperly

*SAMPLE MESSAGES* from /var/log/messages:

Mar 18 20:56:30 tux06 kernel: Restarting 13766 0x805aa80 sh
Mar 18 20:56:30 tux06 kernel: 55 89 e5 83 ec 08 8b 45 08 85 c0 74 0a 8b 15 00 24 0c 08 85
Mar 18 20:56:30 tux06 kernel: invalid operand: 0000
Mar 18 20:56:30 tux06 kernel: CPU: 1
Mar 18 20:56:30 tux06 kernel: EIP: 0023:[usb_stor_exit+134588960/-1072693344] Not tainted
Mar 18 20:56:30 tux06 kernel: EIP: 0023:[<0805aa80>] Not tainted
Mar 18 20:56:30 tux06 kernel: EFLAGS: 00010292
Mar 18 20:56:30 tux06 kernel: eax: 000035c6 ebx: 000035c6 ecx: bfffe730 edx: 00000001
Mar 18 20:56:30 tux06 kernel: esi: 00000000 edi: 00000000 ebp: bfffe7c8 esp: bfffe69c
Mar 18 20:56:30 tux06 kernel: ds: 002b es: 002b ss: 002b
Mar 18 20:56:30 tux06 kernel: Process sh (pid: 13766, stackpage=caff1000)
Mar 18 20:56:30 tux06 kernel: Stack: 0806f58c 00000000 bfffe730 bfffe6b0 0806f580 00010000 00000000 00000000
Mar 18 20:56:30 tux06 kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Mar 18 20:56:30 tux06 kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Mar 18 20:56:30 tux06 kernel: Call Trace:
Mar 18 20:56:30 tux06 kernel:
Mar 18 20:56:30 tux06 kernel: Code: 55 89 e5 83 ec 08 8b 45 08 85 c0 74 0a 8b 15 00 24 0c 08 85
Mar 19 05:13:01 tux06 kernel: Restarting 11895 0x4011f8a0 sh
Mar 19 05:13:01 tux06 kernel: 55 53 56 57 8b 5c 24 14 8b 4c 24 18 8b 54 24 1c 8b 74 24 20
Mar 19 05:13:01 tux06 kernel: invalid operand: 0000
Mar 19 05:13:01 tux06 kernel: CPU: 0
Mar 19 05:13:01 tux06 kernel: EIP: 0023:[usb_stor_exit+1074919488/-1072693344] Not tainted
Mar 19 05:13:01 tux06 kernel: EIP: 0023:[<4011f8a0>] Not tainted
Mar 19 05:13:01 tux06 kernel: EFLAGS: 00010206
Mar 19 05:13:01 tux06 kernel: eax: 00001000 ebx: 4017c690 ecx: bfffb200 edx: 00001000
Mar 19 05:13:01 tux06 kernel: esi: 080cd00c edi: 00001000 ebp: bfffb278 esp: bfffb1dc
Mar 19 05:13:01 tux06 kernel: ds: 002b es: 002b ss: 002b
Mar 19 05:13:01 tux06 kernel: Process sh (pid: 11895, stackpage=c6f97000)
Mar 19 05:13:01 tux06 kernel: Stack: 4009d145 00000000 00001000 00000003 00000022 ffffffff 00000000 00000001
Mar 19 05:13:01 tux06 kernel: 00000001 00000805 00000000 00000000 000517b5 000081a4 00000001 00000000
Mar 19 05:13:01 tux06 kernel: 00000000 00000000 00000000 00000000 00000a29 00000000 00001000 00000008
Mar 19 05:13:01 tux06 kernel: Call Trace:
Mar 19 05:13:01 tux06 kernel:
Mar 19 05:13:01 tux06 kernel: Code: 55 53 56 57 8b 5c 24 14 8b 4c 24 18 8b 54 24 1c 8b 74 24 20
Mar 19 05:13:01 tux06 kernel: Restarting 11898 0x4011f8a0 sh
Mar 19 05:13:01 tux06 kernel: 55 53 56 57 8b 5c 24 14 8b 4c 24 18 8b 54 24 1c 8b 74 24 20
Mar 19 05:13:01 tux06 kernel: invalid operand: 0000
Mar 19 05:13:01 tux06 kernel: CPU: 0
Mar 19 05:13:01 tux06 kernel: EIP: 0023:[usb_stor_exit+1074919488/-1072693344] Not tainted
Mar 19 05:13:01 tux06 kernel: EIP: 0023:[<4011f8a0>] Not tainted
Mar 19 05:13:01 tux06 kernel: EFLAGS: 00010206
Mar 19 05:13:01 tux06 kernel: eax: 00001000 ebx: 4017c690 ecx: bfffb200 edx: 00001000
Mar 19 05:13:01 tux06 kernel: esi: 080cd00c edi: 00001000 ebp: bfffb278 esp: bfffb1dc
Mar 19 05:13:01 tux06 kernel: ds: 002b es: 002b ss: 002b
Mar 19 05:13:01 tux06 kernel: Process sh (pid: 11898, stackpage=c6f97000)
Mar 19 05:13:01 tux06 kernel: Stack: 4009d145 00000000 00001000 00000003 00000022 ffffffff 00000000 00000001
Mar 19 05:13:01 tux06 kernel: 00000001 00000805 00000000 00000000 000517b5 000081a4 00000001 00000000
Mar 19 05:13:01 tux06 kernel: 00000000 00000000 00000000 00000000 00000a29 00000000 00001000 00000008
Mar 19 05:13:01 tux06 kernel: Call Trace:
Mar 19 05:13:01 tux06 kernel:
Mar 19 05:13:01 tux06 kernel: Code: 55 53 56 57 8b 5c 24 14 8b 4c 24 18 8b 54 24 1c 8b 74 24 20
Mar 19 05:13:01 tux06 kernel: Restarting 11902 0x4011f8a0 runAll
Mar 19 05:13:01 tux06 kernel: 55 53 56 57 8b 5c 24 14 8b 4c 24 18 8b 54 24 1c 8b 74 24 20
Mar 19 05:13:01 tux06 kernel: invalid operand: 0000
Mar 19 05:13:01 tux06 kernel: CPU: 0
Mar 19 05:13:01 tux06 kernel: EIP: 0023:[usb_stor_exit+1074919488/-1072693344] Not tainted
Mar 19 05:13:01 tux06 kernel: EIP: 0023:[<4011f8a0>] Not tainted
Mar 19 05:13:01 tux06 kernel: EFLAGS: 00010206
Mar 19 05:13:01 tux06 kernel: eax: 00001000 ebx: 4017c690 ecx: bfffb260 edx: 00001000
Mar 19 05:13:01 tux06 kernel: esi: 080cd00c edi: 00001000 ebp: bfffb2d8 esp: bfffb23c
Mar 19 05:13:01 tux06 kernel: ds: 002b es: 002b ss: 002b
Mar 19 05:13:01 tux06 kernel: Process runAll (pid: 11902, stackpage=cbe0b000)
Mar 19 05:13:01 tux06 kernel: Stack: 4009d145 00000000 00001000 00000003 00000022 ffffffff 00000000 00000001
Mar 19 05:13:01 tux06 kernel: 00000001 00000805 00000000 00000000 000517b5 000081a4 00000001 00000000
Mar 19 05:13:01 tux06 kernel: 00000000 00000000 00000000 00000000 00000a29 00000000 00001000 00000008
Mar 19 05:13:01 tux06 kernel: Call Trace:
Mar 19 05:13:01 tux06 kernel:
Mar 19 05:13:01 tux06 kernel: Code: 55 53 56 57 8b 5c 24 14 8b 4c 24 18 8b 54 24 1c 8b 74 24 20
Mar 19 05:13:01 tux06 kernel: Restarting 11919 0x4011f8a0 runAll
Mar 19 05:13:01 tux06 kernel: 55 53 56 57 8b 5c 24 14 8b 4c 24 18 8b 54 24 1c 8b 74 24 20
Mar 19 05:13:01 tux06 kernel: invalid operand: 0000
Mar 19 05:13:01 tux06 kernel: CPU: 0
Mar 19 05:13:01 tux06 kernel: EIP: 0023:[usb_stor_exit+1074919488/-1072693344] Not tainted
Mar 19 05:13:01 tux06 kernel: EIP: 0023:[<4011f8a0>] Not tainted
Mar 19 05:13:01 tux06 kernel: EFLAGS: 00010206
Mar 19 05:13:01 tux06 kernel: eax: 00001000 ebx: 4017c690 ecx: bfffb250 edx: 00001000
Mar 19 05:13:01 tux06 kernel: esi: 080cd00c edi: 00001000 ebp: bfffb2c8 esp: bfffb22c
Mar 19 05:13:01 tux06 kernel: ds: 002b es: 002b ss: 002b
Mar 19 05:13:01 tux06 kernel: Process runAll (pid: 11919, stackpage=cbe0b000)
Mar 19 05:13:01 tux06 kernel: Stack: 4009d145 00000000 00001000 00000003 00000022 ffffffff 00000000 00000001
Mar 19 05:13:01 tux06 kernel: 00000001 00000805 00000000 00000000 000517b5 000081a4 00000001 00000000
Mar 19 05:13:01 tux06 kernel: 00000000 00000000 00000000 00000000 00000a29 00000000 00001000 00000008
Mar 19 05:13:01 tux06 kernel: Call Trace:
Mar 19 05:13:01 tux06 kernel:
Mar 19 05:13:01 tux06 kernel: Code: 55 53 56 57 8b 5c 24 14 8b 4c 24 18 8b 54 24 1c 8b 74 24 20

*PATCH* to add the logging (note this patch is not intended for anything other than experimenting & debugging):


$ diff -c ~epperly/linux/arch/i386/kernel/traps.c /usr/src/linux/arch/i386/kernel/traps.c
*** /home/epperly/linux/arch/i386/kernel/traps.c Sun Sep 30 12:26:08 2001
--- /usr/src/linux/arch/i386/kernel/traps.c Fri Mar 15 16:06:06 2002
***************
*** 214,227 ****
* When in-kernel, we also print out the stack and code at the
* time of the fault..
*/
! if (in_kernel) {

printk("\nStack: ");
show_stack((unsigned long*)esp);

printk("\nCode: ");
if(regs->eip < PAGE_OFFSET)
goto bad;

for(i=0;i<20;i++)
{
--- 214,229 ----
* When in-kernel, we also print out the stack and code at the
* time of the fault..
*/
! if (1|in_kernel) {

printk("\nStack: ");
show_stack((unsigned long*)esp);

printk("\nCode: ");
+ /*
if(regs->eip < PAGE_OFFSET)
goto bad;
+ */

for(i=0;i<20;i++)
{
***************
*** 267,304 ****
}

static void inline do_trap(int trapnr, int signr, char *str, int vm86,
! struct pt_regs * regs, long error_code, siginfo_t *info) {
! if (vm86 && regs->eflags & VM_MASK)
! goto vm86_trap;
! if (!(regs->xcs & 3))
! goto kernel_trap;
!
! trap_signal: {
! struct task_struct *tsk = current;
! tsk->thread.error_code = error_code;
! tsk->thread.trap_no = trapnr;
! if (info)
! force_sig_info(signr, info, tsk);
! else
! force_sig(signr, tsk);
! return;
! }
!
! kernel_trap: {
! unsigned long fixup = search_exception_table(regs->eip);
! if (fixup)
! regs->eip = fixup;
! else
! die(str, regs, error_code);
! return;
! }
!
! vm86_trap: {
! int ret = handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, trapnr);
! if (ret) goto trap_signal;
! return;
! }
}

#define DO_ERROR(trapnr, signr, str, name) \
--- 269,340 ----
}

static void inline do_trap(int trapnr, int signr, char *str, int vm86,
! struct pt_regs * regs, long error_code,
! siginfo_t *info)
{
! int i;
! if (vm86 && regs->eflags & VM_MASK)
! goto vm86_trap;
! if (!(regs->xcs & 3))
! goto kernel_trap;
!
! trap_signal: {
! struct task_struct *tsk = current;
! tsk->thread.error_code = error_code;
! tsk->thread.trap_no = trapnr;
!
! /*debug for processes getting illegal operation faults*/
! if(trapnr==6){
! unsigned char c;
!
! __get_user(c, &((unsigned char*)regs->eip)[0]);
!
! if( c==0x55 ){ /*push ebp*/
! if(tsk->per_cpu_utime[31]==regs->eip) {
! /*This guy's been through the mill
! once already*/
! die(str, regs, error_code);
! }else{
! /*first timer, so flag him*/
! tsk->per_cpu_utime[31]=regs->eip;
! printk("Restarting %d 0x%lx %s\n",tsk->pid,regs->eip,tsk->comm);
! for(i=0;i<20;i++) {
! unsigned char c;
! if(__get_user(c,
! &((unsigned char*)regs->eip)[i])) {
! printk(" Bad EIP value.");
! break;
! }
! printk("%02x ", c);
! }
! printk("\n");
! wbinvd();
! __flush_tlb();
! return;
! }
! }
! }
! if (info)
! force_sig_info(signr, info, tsk);
! else
! force_sig(signr, tsk);
! return;
! }
!
! kernel_trap: {
! unsigned long fixup = search_exception_table(regs->eip);
! if (fixup)
! regs->eip = fixup;
! else
! die(str, regs, error_code);
! return;
! }
!
! vm86_trap: {
! int ret = handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, trapnr);
! if (ret) goto trap_signal;
! return;
! }
}

#define DO_ERROR(trapnr, signr, str, name) \


2002-03-20 23:15:51

by Alan

Subject: Re: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box

> I initiated a support call with Dell at around 3:30pm PST on Friday
> 15-Mar-2002, and all the feedback I've received from this so far shows
> that they are clueless. They are trying to portray this as a Linux
> problem.

Well to be honest they aren't the only ones who are totally baffled by it.
Do you have the current microcode updates in your BIOS or via the ucode
driver ?

Do all the problem boxes have the same stepping of CPU ?

2002-03-20 23:26:41

by Kurt Garloff

Subject: Re: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box

On Wed, Mar 20, 2002 at 01:35:30PM -0800, Tom Epperly wrote:
> The kernel log showed me that various standard programs such as
> /bin/sh are generating bogus illegal instruction traps on a legal
> opcode (0x55) as part of a standard function preamble. After receiving
> an illegal instruction trap on opcode (0x55), the modified kernel does
> a wbinvd() to flush the cache and a __flush_tlb() to flush the TLB
> and then retries the "illegal" opcode. The retry produces a second
> illegal instruction trap on the same legal opcode (0x55). Information
> from /var/log/messages is shown below.

The CPU is what triggers the exception.
So this sounds like a defective (or overheated) CPU to me.

OTOH, the kernel logs "invalid operand". Could you run ksymoops to get a
disassembly?
AFAICS, it's a push %ebp instruction, which should not be illegal. So either
your stack is overflowing or my suspicion of a defective CPU applies.

Regards,
--
Kurt Garloff <[email protected]> [Eindhoven, NL]
Physics: Plasma simulations <[email protected]> [TU Eindhoven, NL]
Linux: SCSI, Security <[email protected]> [SuSE Nuernberg, DE]
(See mail header or public key servers for PGP2 and GPG public keys.)



2002-03-20 23:32:21

by Tom Epperly

Subject: Re: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box

Alan Cox wrote:

>>I initiated a support call with Dell at around 3:30pm PST on Friday
>>15-Mar-2002, and all the feedback I've received from this so far shows
>>that they are clueless. They are trying to portray this as a Linux
>>problem.
>>
>
>Well to be honest they aren't the only ones who are totally baffled by it.
>Do you have the current microcode updates in your BIOS or via the ucode
>driver ?
>
One box, tux06, has the latest Dell BIOS, A05. I don't know how to
determine if it has the latest microcode updates. Where can one get the
current microcode updates, and how do I install them?

>
>
>Do all the problem boxes have the same stepping of CPU ?
>
According to cat /proc/cpuinfo, two boxes tux06 & tux34 have stepping
10, and tux47 has stepping 2. I have seen the unexplained "Illegal
instruction" messages on tux34 and tux47, but I haven't run the modified
kernel on them. Root access is restricted here.

Tom

2002-03-20 23:44:41

by James Washer

Subject: Re: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box


Just to clarify things.

Lots of processes die from illegal op traps (gcc, bash, make, etc.), but
the instruction is ALWAYS opcode 0x55 and is part of a subroutine preamble
in every case. You are correct: 0x55 should not generate a trap.

Bad CPU? Hmmm, Tom has 6 different CPUs (all P4 Xeons), on three
systems, that have this EXACT same problem.

And why does this require the system to be running SMP?


- jim


2002-03-20 23:49:01

by Alan

Subject: Re: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box

> One box, tux06, has the latest Dell BIOS, A05. I don't know how to
> determine if it has the latest microcode updates. Where can one get the
> current microcode updates, and how do I install it?

The microcode updates change the stepping value for the CPU afaik.

> According to cat /proc/cpuinfo, two boxes tux06 & tux34 have stepping
> 10, and tux47 has stepping 2. I have seen the unexplained "Illegal
> instruction" messages on tux34 and tux47, but I haven't run the modified
> kernel on them. root access is restricted here.

Humm. I'm still as baffled as Dell, I'm afraid.

2002-03-20 23:49:31

by Alan

Subject: Re: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box

> disassembly?
> AFAICS, its a push %ebp instruction, which should not be illegal. So either
> your stack is overflowing or my suspicion with the defect CPU is applicable.

Or somehow the I/D TLB's got messed up and the ITLB for that entry is now
wrong.

2002-03-21 00:04:11

by Dave Jones

Subject: Re: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box

On Wed, Mar 20, 2002 at 03:31:50PM -0800, Tom Epperly wrote:

> One box, tux06, has the latest Dell BIOS, A05. I don't know how to
> determine if it has the latest microcode updates. Where can one get the
> current microcode updates, and how do I install it?

http://www.urbanmyth.org/microcode/

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-03-21 00:30:26

by James Washer

Subject: Re: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box


The iTLB would be flushed when he did the reload of cr3 (per your
suggestion) UNLESS the G bit was set.
I suppose there's some small chance that, at the time this instruction was
first cached and its corresponding iTLB entry was loaded, the G bit may
have been set. Seems unlikely, but I'll hack up something to
unconditionally flush the iTLB.

- jim


2002-03-21 07:30:10

by Zwane Mwaikambo

Subject: Re: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box

On Wed, 20 Mar 2002, James Washer wrote:

>
> The iTLB would be flushed when he did the reload of cr3 ( per your
> suggestion ) UNLESS the G bit was set.
> I suppose theres some small chance, that at the time this instruction was
> first cached and its corresponding iTLB entry was loaded, the G bit may
> have been set.. Seems unlikely. but I'll hack up something to
> unconditionally flush the iTLB.

I find vol3 somewhat confusing in this regard...

P104 - The only ways to deterministically invalidate global page entries
are as follows:
o Clear the PGE flag and then invalidate the TLBs.
o Execute the INVLPG instruction to invalidate individual page-directory
or page-table entries in the TLBs.
o Write to control register CR3 to invalidate all TLB entries.

Then on page 381.

The following operations invalidate all TLB entries except global entries.
(A global entry is one for which the G (global) flag is set in its
corresponding page-directory or page-table entry. The global flag was
introduced into the IA-32 architecture in the P6 family processors, see
Section 10.5., Cache Control.)

o Writing to control register CR3.
o A task switch that changes control register CR3.

I would reckon reference 1 (p104) is incorrect; can someone shed some
light?

Thanks,
Zwane


2002-03-21 14:52:37

by James Washer

Subject: Re: Bad Illegal instruction traps on dual-Xeon (p4) Linux Dell box


Yes, I agree that page 104 (Section 3.11) is inconsistent with itself
wrt "Write to control register CR3 to invalidate all TLB entries."

For the particular problem Tom is seeing, however, I've recoded do_trap() to
do an invlpg on the particular page that is causing the problem, just in
case the G bit was set and the pte was stale. I suspect he'll be able to
test this code this morning.

- jim
