I have a problem using the local APIC and IO-APIC on my uniprocessor nForce2 board.
With recent kernels (latest -mm and 2.6.5-linus) the timer irq gets set to
XT-PIC, which results in a constant hi-load of 15% (after booting) to
about 25% (after the system has been running for about 12 h). Earlier versions of -mm
set the timer irq to IO-APIC-level (or edge, I don't remember exactly) and I
never had any constant hi-load with those versions. Since mainline kernel
versions never set the timer irq to IO-APIC-{level,edge}, I used to patch
them with Ross' nforce patches, so that the timer irq becomes
IO-APIC-edge, which worked even though the patch applied with an offset. Anyway,
with the latest -mm kernels these patches don't work anymore. I can apply
them with an offset, but it seems the code isn't used or something else is wrong,
since the timer irq stays XT-PIC, which results in the problem above. Could
anyone point out how to resolve this, or tell me what I could do to
get my timer irq right? I'm happy to test patches...
Thanks in advance, christian.
Christian Kr?ner wrote
> I got a problem using LOCAL APIC and IO-APIC on my uniprocessor nforce2 board.
> With recent kernels (latest -mm and 2.6.5-linus) the timer irq gets set to
> XT-PIC, which results in having a constant hi-load of 15% (after booting) to
> about 25% (after having the system run about 12 h). Earlier versions of -mm
> set the timer-irq to IO-APIC-level (or edge, i dont remember it well) and i
> never had any constant hi-load with these versions. Since mainline kernel
> versions never ever set the timer irq to IO-APIC-{level,edge} i used to patch
> them with the ross' nforce-patches, so that the timer-irq gets to be
> IO-APCI-edge, which worked even though the patch applied with offset. Anyways
> with the latest mm-kernels these patches dont work anymore. I could apply
> them with offset but it seems the code isn't used or something else is wrong
> since the timer-irq stays XT-PIC, which results in the problems above. Could
> anyone point out, how to resolve this problem or tell me what I could do, to
> get my timer-irq right? I'm sure willing to test patches...
> Thanks in advance, christian.
Hi Christian
I don't know why the load is high on XT-PIC, except maybe heaps of spurious IRQs
under the hood.
I am working with 2.4.26-rc2 and have noticed a change with the recent ACPI(?)
update. The recent fix to stop unnecessary IO-APIC irq routing entries puts the
following if statement into io_apic_set_pci_routing() in io_apic.c:
	/*
	 * IRQs < 16 are already in the irq_2_pin[] map
	 */
	if (irq >= 16)
		add_pin_to_irq(irq, ioapic, pin);
which prevents my io-apic patch from using that function to reprogram the
io-apic pin on irq0 from pin2 to pin0.
As a quick fix you could drop the "if (irq >= 16)".
I don't know what harm if any that would do other than create unwanted
irq mapping entries as in the past.
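For illustration only, here is a small user-space model of the effect of that guard. It is not kernel code: the table layout and the names model_map, model_add_pin, model_init and model_set_pci_routing are made up for this sketch, and the only thing taken from the snippet above is the "irq >= 16" test. The model assumes the BIOS has listed the timer (irq0) on pin 2.

```c
#include <assert.h>

#define MODEL_NR_IRQS  24
#define MODEL_MAX_PINS 4

/* Hypothetical, simplified stand-in for the irq -> pin map. */
struct model_irq_pins {
	int count;
	int pin[MODEL_MAX_PINS];
};

static struct model_irq_pins model_map[MODEL_NR_IRQS];

/* Append one more pin to the list recorded for this irq. */
static void model_add_pin(int irq, int pin)
{
	assert(irq >= 0 && irq < MODEL_NR_IRQS);
	assert(model_map[irq].count < MODEL_MAX_PINS);
	model_map[irq].pin[model_map[irq].count++] = pin;
}

/* Legacy irqs 0-15 start out mapped; irq0 is (wrongly) listed on pin 2. */
static void model_init(void)
{
	int irq;

	for (irq = 0; irq < MODEL_NR_IRQS; irq++)
		model_map[irq].count = 0;
	for (irq = 0; irq < 16; irq++)
		model_add_pin(irq, irq == 0 ? 2 : irq);
}

/* With guarded != 0 the "irq >= 16" test is applied, as in the new code. */
static void model_set_pci_routing(int irq, int pin, int guarded)
{
	if (!guarded || irq >= 16)
		model_add_pin(irq, pin);
}
```

With the guard in place, a request to route irq0 to pin 0 leaves the map untouched; without it, pin 0 is appended as a second entry for irq0, which is the "unwanted entries" side effect mentioned above.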
As a better solution to work with the new code I have created a function to
change the pin an irq comes into the io-apic on and also re-initialise the
io-apic to deal with the change.
Here is the function for 2.4.26-rc2.
/*
 * reroute irq to different pin clearing old and enabling new
 */
static void __init replace_IO_APIC_pin_at_irq(unsigned int irq,
					      int oldapic, int oldpin,
					      int newapic, int newpin)
{
	struct IO_APIC_route_entry entry;
	unsigned long flags;

	/*
	 * read oldapic entry
	 */
	spin_lock_irqsave(&ioapic_lock, flags);
	*(((int*)&entry) + 0) = io_apic_read(oldapic, 0x10 + 2 * oldpin);
	*(((int*)&entry) + 1) = io_apic_read(oldapic, 0x11 + 2 * oldpin);
	spin_unlock_irqrestore(&ioapic_lock, flags);

	/*
	 * Check delivery_mode to be sure we're not clearing an SMI pin
	 */
	if (entry.delivery_mode == dest_SMI)
		return;

	/*
	 * clear oldpin on oldapic
	 */
	clear_IO_APIC_pin(oldapic, oldpin);

	/*
	 * reroute irq to newpin on newapic
	 */
	replace_pin_at_irq(irq, oldapic, oldpin, newapic, newpin);

	/*
	 * Enable newpin on newapic
	 */
	spin_lock_irqsave(&ioapic_lock, flags);
	io_apic_write(newapic, 0x10 + 2*newpin, *(((int *)&entry) + 0));
	io_apic_write(newapic, 0x11 + 2*newpin, *(((int *)&entry) + 1));
	spin_unlock_irqrestore(&ioapic_lock, flags);
}
I am now using this instead of the io_apic_set_pci_routing().
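As a rough user-space illustration of the register arithmetic in that function: each IO-APIC pin has a 64-bit redirection entry exposed as two 32-bit registers, at 0x10 + 2*pin (low word) and 0x11 + 2*pin (high word). The sketch below moves an entry between pins in a simulated register array. Everything prefixed sim_ or redir_ is invented for this example, there is no locking, and the "cleared" state (mask bit set, everything else zero) is an assumption about what clearing a pin leaves behind.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PINS 24
#define SIM_REGS (0x10 + 2 * SIM_PINS)

/* Simulated register file; a real IO-APIC is accessed indirectly. */
static uint32_t sim_regs[SIM_REGS];

static uint32_t sim_read(unsigned int reg)
{
	assert(reg < SIM_REGS);
	return sim_regs[reg];
}

static void sim_write(unsigned int reg, uint32_t val)
{
	assert(reg < SIM_REGS);
	sim_regs[reg] = val;
}

/* Register indices of the low and high halves of a pin's entry. */
static unsigned int redir_lo(unsigned int pin) { return 0x10 + 2 * pin; }
static unsigned int redir_hi(unsigned int pin) { return 0x11 + 2 * pin; }

/* Fields of the low word: vector in bits 0-7, delivery mode in bits 8-10. */
#define REDIR_VECTOR(lo)   ((lo) & 0xffu)
#define REDIR_DELIVERY(lo) (((lo) >> 8) & 0x7u)
#define REDIR_MASKED       (1u << 16)
#define DELIVERY_SMI       2u

/* Copy the entry from oldpin to newpin; refuse to touch an SMI pin. */
static int sim_move_entry(unsigned int oldpin, unsigned int newpin)
{
	uint32_t lo = sim_read(redir_lo(oldpin));
	uint32_t hi = sim_read(redir_hi(oldpin));

	if (REDIR_DELIVERY(lo) == DELIVERY_SMI)
		return -1;

	/* Leave the old pin masked with everything else zeroed. */
	sim_write(redir_lo(oldpin), REDIR_MASKED);
	sim_write(redir_hi(oldpin), 0);

	sim_write(redir_lo(newpin), lo);
	sim_write(redir_hi(newpin), hi);
	return 0;
}
```

For pin 2 the two registers are 0x14 and 0x15; after sim_move_entry(2, 0) the same two words sit in 0x10 and 0x11.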
My modified check_timer() to work with it is as follows.
/*
 * This code may look a bit paranoid, but it's supposed to cooperate with
 * a wide range of boards and BIOS bugs. Fortunately only the timer IRQ
 * is so screwy. Thanks to Brian Perkins for testing/hacking this beast
 * fanatically on his truly buggy board.
 */
static inline void check_timer(void)
{
	extern int timer_ack;
	int pin1, pin2;
	int vector, i;

	/*
	 * get/set the timer IRQ vector:
	 */
	disable_8259A_irq(0);
	vector = assign_irq_vector(0);
	set_intr_gate(vector, interrupt[0]);

	/*
	 * Subtle, code in do_timer_interrupt() expects an AEOI
	 * mode for the 8259A whenever interrupts are routed
	 * through I/O APICs. Also IRQ0 has to be enabled in
	 * the 8259A which implies the virtual wire has to be
	 * disabled in the local APIC.
	 */
	apic_write_around(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_EXTINT);
	init_8259A(1);
	timer_ack = 1;
	enable_8259A_irq(0);

	pin1 = find_isa_irq_pin(0, mp_INT);
	pin2 = find_isa_irq_pin(0, mp_ExtINT);

	printk(KERN_INFO "..TIMER: vector=0x%02X pin1=%d pin2=%d\n", vector, pin1, pin2);

	if (pin1 != -1) {
		for(i=0;i<2;i++) {
			/*
			 * Ok, does IRQ0 through the IOAPIC work?
			 */
			unmask_IO_APIC_irq(0);
			if (timer_irq_works()) {
				if (nmi_watchdog == NMI_IO_APIC) {
					disable_8259A_irq(0);
					setup_nmi();
					enable_8259A_irq(0);
					check_nmi_watchdog();
				}
				printk(KERN_INFO "..TIMER: works OK on IO-APIC irq0\n" );
				return;
			}
			mask_IO_APIC_irq(0);
			if(!i) { /* try INTIN0 if INTIN2 failed */
				printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC INTIN%d\n",pin1);
				printk(KERN_INFO "..TIMER: Check if 8254 timer connected to IO-APIC INTIN0? ...\n");
				replace_IO_APIC_pin_at_irq(0, 0, pin1, 0, 0);
				timer_ack=0;
				disable_8259A_irq(0);
			} else { /* restore settings */
				clear_IO_APIC_pin(0, 0);
				printk(KERN_ERR "..TIMER: 8254 timer not connected to IO-APIC INTIN0\n");
				timer_ack=1;
				enable_8259A_irq(0);
			}
		}
	}

	printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
	if (pin2 != -1) {
		printk("\n..... (found pin %d) ...", pin2);
		/*
		 * legacy devices should be connected to IO APIC #0
		 */
		setup_ExtINT_IRQ0_pin(pin2, vector);
		if (timer_irq_works()) {
			printk("works.\n");
			if (pin1 != -1)
				replace_pin_at_irq(0, 0, pin1, 0, pin2);
			else
				add_pin_to_irq(0, 0, pin2);
			if (nmi_watchdog == NMI_IO_APIC) {
				setup_nmi();
				check_nmi_watchdog();
			}
			return;
		}
		/*
		 * Cleanup, just in case ...
		 */
		clear_IO_APIC_pin(0, pin2);
	}
	printk(" failed.\n");

	if (nmi_watchdog) {
		printk(KERN_WARNING "timer doesn't work through the IO-APIC - disabling NMI Watchdog!\n");
		nmi_watchdog = 0;
	}

	printk(KERN_INFO "...trying to set up timer as Virtual Wire IRQ...");

	disable_8259A_irq(0);
	irq_desc[0].handler = &lapic_irq_type;
	apic_write_around(APIC_LVT0, APIC_DM_FIXED | vector);	/* Fixed mode */
	enable_8259A_irq(0);

	if (timer_irq_works()) {
		printk(" works.\n");
		return;
	}
	apic_write_around(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | vector);
	printk(" failed.\n");

	printk(KERN_INFO "...trying to set up timer as ExtINT IRQ...");

	init_8259A(0);
	make_8259A_irq(0);
	apic_write_around(APIC_LVT0, APIC_DM_EXTINT);

	unlock_ExtINT_logic();

	if (timer_irq_works()) {
		printk(" works.\n");
		return;
	}
	printk(" failed :(.\n");
	panic("IO-APIC + timer doesn't work! pester [email protected]");
}
This version loops twice on the "pin1" attempt, first trying the BIOS-assigned
pin, then trying pin0 with no timer acks and the 8259 XT-PIC disabled.
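The control flow of that two-pass attempt can be sketched in user space as below. This is only a model: struct sim_board, sim_timer_works and sim_probe_timer_pin are hypothetical names, the "board" is just the pin the 8254 is assumed to be wired to, and none of the 8259A or timer_ack handling is represented.

```c
#include <assert.h>

/* Hypothetical board model: the pin the 8254 timer is really wired to. */
struct sim_board {
	int wired_pin;
};

/* Stand-in for timer_irq_works(): ticks arrive only on the wired pin. */
static int sim_timer_works(const struct sim_board *b, int routed_pin)
{
	return b->wired_pin == routed_pin;
}

/*
 * Returns the IO-APIC pin on which the timer was found to work, or -1
 * if neither the BIOS pin nor pin 0 works (the caller would then fall
 * through to the 8259A / Virtual Wire / ExtINT attempts).
 */
static int sim_probe_timer_pin(const struct sim_board *b, int bios_pin)
{
	int pin = bios_pin;
	int i;

	assert(b != 0);
	if (bios_pin == -1)
		return -1;

	for (i = 0; i < 2; i++) {
		if (sim_timer_works(b, pin))
			return pin;
		if (i == 0)
			pin = 0;	/* second pass: retry on INTIN0 */
	}
	return -1;
}
```

On a board wired to pin 0 with a BIOS that reports pin 2, the first pass fails and the second pass returns 0.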
I have not yet downloaded 2.6.5xxx.
From memory, this 2.4.26-rc2 code should be very similar to 2.6.5-linus
but a bit different from the -mm series. For the -mm series I think you can drop
the "timer_ack=" lines from my changes, as it still has Maciej Rozycki's 8259
ack patch? The timer ack should already have been correctly set to off by its
check of whether the APIC is an integrated one.
Here are the changes as a diff against io_apic.c in 2.4.26-rc2:
--- io_apic.c.orig 2004-04-08 15:56:53.000000000 +1000
+++ io_apic.c 2004-04-10 02:33:02.000000000 +1000
@@ -197,10 +197,48 @@ static void clear_IO_APIC (void)
for (pin = 0; pin < nr_ioapic_registers[apic]; pin++)
clear_IO_APIC_pin(apic, pin);
}
/*
+ * reroute irq to different pin clearing old and enabling new
+ */
+static void __init replace_IO_APIC_pin_at_irq(unsigned int irq,
+ int oldapic, int oldpin,
+ int newapic, int newpin)
+{
+ struct IO_APIC_route_entry entry;
+ unsigned long flags;
+ /*
+ * read oldapic entry
+ */
+ spin_lock_irqsave(&ioapic_lock, flags);
+ *(((int*)&entry) + 0) = io_apic_read(oldapic, 0x10 + 2 * oldpin);
+ *(((int*)&entry) + 1) = io_apic_read(oldapic, 0x11 + 2 * oldpin);
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+ /*
+ * Check delivery_mode to be sure we're not clearing an SMI pin
+ */
+ if (entry.delivery_mode == dest_SMI)
+ return;
+ /*
+ * clear oldpin on oldapic
+ */
+ clear_IO_APIC_pin(oldapic, oldpin);
+ /*
+ * reroute irq to newpin on newapic
+ */
+ replace_pin_at_irq(irq, oldapic, oldpin, newapic, newpin);
+ /*
+ * Enable newpin on newapic
+ */
+ spin_lock_irqsave(&ioapic_lock, flags);
+ io_apic_write(newapic, 0x10 + 2*newpin, *(((int *)&entry) + 0));
+ io_apic_write(newapic, 0x11 + 2*newpin, *(((int *)&entry) + 1));
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
+/*
* support for broken MP BIOSs, enables hand-redirection of PIRQ0-7 to
* specific CPU-side IRQs.
*/
#define MAX_PIRQS 8
@@ -1582,11 +1620,11 @@ static inline void unlock_ExtINT_logic(v
*/
static inline void check_timer(void)
{
extern int timer_ack;
int pin1, pin2;
- int vector;
+ int vector, i;
/*
* get/set the timer IRQ vector:
*/
disable_8259A_irq(0);
@@ -1609,25 +1647,39 @@ static inline void check_timer(void)
pin2 = find_isa_irq_pin(0, mp_ExtINT);
printk(KERN_INFO "..TIMER: vector=0x%02X pin1=%d pin2=%d\n", vector, pin1, pin2);
if (pin1 != -1) {
- /*
- * Ok, does IRQ0 through the IOAPIC work?
- */
- unmask_IO_APIC_irq(0);
- if (timer_irq_works()) {
- if (nmi_watchdog == NMI_IO_APIC) {
+ for(i=0;i<2;i++) {
+ /*
+ * Ok, does IRQ0 through the IOAPIC work?
+ */
+ unmask_IO_APIC_irq(0);
+ if (timer_irq_works()) {
+ if (nmi_watchdog == NMI_IO_APIC) {
+ disable_8259A_irq(0);
+ setup_nmi();
+ enable_8259A_irq(0);
+ check_nmi_watchdog();
+ }
+ printk(KERN_INFO "..TIMER: works OK on IO-APIC irq0\n" );
+ return;
+ }
+ mask_IO_APIC_irq(0);
+ if(!i) { /* try INTIN0 if INTIN2 failed */
+ printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC INTIN%d\n",pin1);
+ printk(KERN_INFO "..TIMER: Check if 8254 timer connected to IO-APIC INTIN0? ...\n");
+ replace_IO_APIC_pin_at_irq(0, 0, pin1, 0, 0);
+ timer_ack=0;
disable_8259A_irq(0);
- setup_nmi();
+ } else { /* restore settings */
+ clear_IO_APIC_pin(0, 0);
+ printk(KERN_ERR "..TIMER: 8254 timer not connected to IO-APIC INTIN0\n");
+ timer_ack=1;
enable_8259A_irq(0);
- check_nmi_watchdog();
}
- return;
}
- clear_IO_APIC_pin(0, pin1);
- printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
}
printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
if (pin2 != -1) {
printk("\n..... (found pin %d) ...", pin2);
Also attached as a tarball in case of whitespace problems.
Hope this helps. Please cc me on responses.
Ross Dickson
Very odd. i'm using plain 2.6.5 with your 2.6.3
APIC patches, and left all this io_apic_set_pci_routing()
stuff in. And, for the first time in who knows
how long, i seem to have a stable computer. It has actually
been up more than eight days.
This is an old overclocked MSI K7N2 with the first
revision of the nForce2 chipset, the one that's only
supposed to have UDMA100 (dunno if that's the chipset
or the MSI mboard: the 2.6.X kernels have always said
during bootup that it's running UDMA133). i use an
old Tulip ethercard instead of the onboard LAN.
This machine is the beater box: an HTPC and a 24/7
file share client, compile and test stuff, play music
thru an Audigy sound card, burn DVD's, play video
files, many of these things at the same time.
Before this kernel i was lucky to have uptimes over
two days.
b
On Tue, Apr 13, 2004 at 11:17:31AM +1000, Ross Dickson wrote:
> I am working with 2.4.26-rc2 and have noticed a change with the the recent acpi?
> update. The recent fix to stop unnecessary ioapic irq routing entries puts the
> following if statement into io_apic.c, io_apic_set_pci_routing()
>
> /*
> * IRQs < 16 are already in the irq_2_pin[] map
> */
> if (irq >= 16)
> add_pin_to_irq(irq, ioapic, pin);
>
> which prevents my io-apic patch from using that function to reprogram the
> io-apic pin on irq0 from pin2 to pin0.
On Tuesday 13 April 2004 14:01, really bensoo_at_soo_dot_com wrote:
> Very odd. i'm using plain 2.6.5 with your 2.6.3
Yes odd, it's the first report of a "hi-load" XT-PIC issue I know of.
> APIC patches, and left all this io_apic_set_pci_routing()
> stuff in. And, for this first time in who knows
> how long i seem to have a stable computer. Actually
> been up more than eight days.
Sounds Very Good.
Are you using my io-apic patch with the apic ack delay or with the
C1idle version?
i.e. patched io_apic.c and apic.c and using kernel arg "apic_tack="
or patched io_apic.c and process.c and using kernel arg "idle=C1halt"?
My cat /proc/cmdline:
root=/dev/hdb2 idle=C1halt nmi_watchdog=1
Could you please cat /proc/interrupts?
I would like to see how irq0 is routed.
Mine looks like this:
           CPU0
  0:     229404  IO-APIC-edge   timer
  1:        376  IO-APIC-edge   keyboard
  2:          0  XT-PIC         cascade
  9:          0  IO-APIC-level  acpi
 12:      13499  IO-APIC-edge   PS/2 Mouse
 14:      10482  IO-APIC-edge   ide0
 15:         73  IO-APIC-edge   ide1
 16:      27055  IO-APIC-level  nvidia
 20:      46913  IO-APIC-level  eth0, usb-ohci
 21:       3660  IO-APIC-level  ehci_hcd, NVIDIA nForce Audio
 22:          0  IO-APIC-level  usb-ohci
NMI:     229547
LOC:     229340
ERR:          0
MIS:          0
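If you would rather check the irq0 routing from a script or test harness than by eye, something like the following sketch could do it. It assumes the "irq: count controller name" line layout shown in the listing above (single CPU column) and works on a text buffer rather than opening /proc/interrupts itself; irq_controller is a name invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the line for 'irq' in /proc/interrupts-style text and copy the
 * interrupt controller name (e.g. "XT-PIC", "IO-APIC-edge") into 'out'.
 * Returns 0 on success, -1 if the irq is not listed.
 */
static int irq_controller(const char *text, int irq, char *out, size_t outlen)
{
	const char *line = text;

	assert(text != 0 && out != 0 && outlen > 0);
	while (*line) {
		int n;
		unsigned long long count;
		char type[32];

		/* Header and NMI/LOC/ERR/MIS lines fail the %d conversion. */
		if (sscanf(line, " %d: %llu %31s", &n, &count, type) == 3 &&
		    n == irq) {
			strncpy(out, type, outlen - 1);
			out[outlen - 1] = '\0';
			return 0;
		}
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return -1;
}
```

Fed the listing above, asking for irq 0 would give "IO-APIC-edge" and asking for irq 2 would give "XT-PIC".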
And from boot log
with my new timer setup
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC INTIN2
..TIMER: Check if 8254 timer connected to IO-APIC INTIN0? ...
activating NMI Watchdog ... done.
testing NMI watchdog ... OK.
..TIMER: works OK on IO-APIC irq0
Using local APIC timer interrupts.
calibrating APIC timer ...
and my ioapic routing
number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 00170011
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 001 01 0 0 0 0 0 1 1 31
01 001 01 0 0 0 0 0 1 1 39
02 000 00 1 0 0 0 0 0 0 00
03 001 01 0 0 0 0 0 1 1 41
04 001 01 0 0 0 0 0 1 1 49
05 001 01 0 0 0 0 0 1 1 51
06 001 01 0 0 0 0 0 1 1 59
07 001 01 0 0 0 0 0 1 1 61
08 001 01 0 0 0 0 0 1 1 69
09 001 01 0 1 0 0 0 1 1 71
0a 001 01 0 0 0 0 0 1 1 79
0b 001 01 0 0 0 0 0 1 1 81
0c 001 01 0 0 0 0 0 1 1 89
0d 001 01 0 0 0 0 0 1 1 91
0e 001 01 0 0 0 0 0 1 1 99
0f 001 01 0 0 0 0 0 1 1 A1
10 001 01 1 1 0 0 0 1 1 D9
11 001 01 1 1 0 0 0 1 1 E1
12 001 01 1 1 0 0 0 1 1 C9
13 001 01 1 1 0 0 0 1 1 D1
14 001 01 1 1 0 0 0 1 1 B1
15 001 01 1 1 0 0 0 1 1 C1
16 001 01 1 1 0 0 0 1 1 B9
17 001 01 1 1 0 0 0 1 1 A9
IRQ to pin mappings:
IRQ0 -> 0:0
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ17 -> 0:17
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ20 -> 0:20
IRQ21 -> 0:21
IRQ22 -> 0:22
IRQ23 -> 0:23
.................................... done.
PCI: Using ACPI for IRQ routing
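As a side note on reading the dump above, the first two IO-APIC registers can be decoded by hand. The sketch below does that for the values in the log (register #00 = 02000000, register #01 = 00170011), assuming the usual layout: APIC id in bits 24-27 of register 0, version in bits 0-7 and maximum redirection entry in bits 16-23 of register 1. The struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

struct ioapic_ver {
	unsigned int version;	/* bits 0-7 of register #01 */
	unsigned int max_redir;	/* bits 16-23 of register #01 */
	unsigned int nr_pins;	/* max_redir + 1 */
};

/* Physical APIC id from register #00. */
static unsigned int decode_ioapic_id(uint32_t reg00)
{
	return (reg00 >> 24) & 0x0fu;
}

/* Version and pin count from register #01. */
static struct ioapic_ver decode_ioapic_reg01(uint32_t reg01)
{
	struct ioapic_ver v;

	v.version = reg01 & 0xffu;
	v.max_redir = (reg01 >> 16) & 0xffu;
	v.nr_pins = v.max_redir + 1;
	assert(v.nr_pins >= 1);
	return v;
}
```

For 00170011 this gives version 0x11 and a maximum redirection entry of 0x17, i.e. 24 pins, which agrees with the "number of IO-APIC #2 registers: 24" line.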
>
> This is an old overclocked MSI K7N2 with the first
> revision of the nForce2 chipset, the one that's only
> supposed to have UDMA100 (dunno if that's the chipset
> or the MSI mboard: the 2.6.X kernels have always said
> during bootup that it's running UDMA133). i use an
> old Tulip ethercard instead of the onboard LAN.
>
I am now using forcedeth for onboard ether. It works well and is
convenient when rebuilding and testing kernels and modules.
> This machine is the beater box: an HTPC and a 24/7
> file share client, compile and test stuff, play music
> thru an Audigy sound card, burn DVD's, play video
> files, many of these things at the same time.
>
> Before this kernel i was lucky to have uptimes over
> two days.
Yes I remember how frustrating it felt to have linux regularly die and
even fail to boot properly.
I did some more reading on kernel versions re Maciej's 8259 ack patch.
Ignore my comments in the previous posting, as the patch was fully pulled from all
kernels at the end of 2.6.3, i.e. it never appeared in 2.6.4 or later.
>>I have not as yet downloaded 2.6.5xxx
>>From memory this 2.4.26-rc2 code should be very similar to the (2.6.5-linus)
>>but a bit different to the -mm series. For the -mm series I think you can drop
>>the "timer_ack=" lines from my changes as it still has Maciej Rozycki's 8259
>>ack patch? The timer ack should already have been correctly set to off by it's
>>checking if the apic is an integrated one.
Shame as it seemed theoretically correct to me to not ack.
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-04/2143.html
Regards
Ross.
On Mon, 2004-04-12 at 21:17, Ross Dickson wrote:
> I am working with 2.4.26-rc2 and have noticed a change with the the recent acpi?
> update. The recent fix to stop unnecessary ioapic irq routing entries puts the
> following if statement into io_apic.c, io_apic_set_pci_routing()
>
> /*
> * IRQs < 16 are already in the irq_2_pin[] map
> */
> if (irq >= 16)
> add_pin_to_irq(irq, ioapic, pin);
>
> which prevents my io-apic patch from using that function to reprogram the
> io-apic pin on irq0 from pin2 to pin0.
>
> As a quick fix you could drop the "if (irq >= 16)".
> I don't know what harm if any that would do other than create unwanted
> irq mapping entries as in the past.
I made that change -- sorry I broke your patch.
No, I doubt it would matter if you hacked out "if (irq >=16)"
for the time being.
I haven't been following this thread closely, but
http://bugme.osdl.org/show_bug.cgi?id=1203 says I should;-)
I understand that these boards have the timer attached to pin0
in APIC mode, but that the BIOS says it is connected to pin2:
ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0]
trigger[0x0])
Wouldn't it be a simpler patch to recognize this board and simply
disable this bogus BIOS INT_SRC_OVR?
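To make the idea concrete, here is a user-space sketch of how an interrupt source override changes the legacy irq-to-pin map, and what skipping the timer override would do. It is not ACPI code: struct int_src_ovr, build_isa_map and the skip_timer_override flag are all hypothetical, and how a real kernel would recognise the board is left out entirely.

```c
#include <assert.h>

#define NR_ISA_IRQS 16

/* Hypothetical, cut-down form of one INT_SRC_OVR entry. */
struct int_src_ovr {
	int bus_irq;	/* legacy irq number, e.g. 0 for the timer */
	int global_irq;	/* IO-APIC input the BIOS claims it is on */
};

/*
 * Start from the identity map (irq N on pin N), then apply overrides.
 * If skip_timer_override is set, any override for irq0 is ignored.
 */
static void build_isa_map(int map[NR_ISA_IRQS],
			  const struct int_src_ovr *ovr, int n_ovr,
			  int skip_timer_override)
{
	int i;

	assert(map != 0);
	for (i = 0; i < NR_ISA_IRQS; i++)
		map[i] = i;

	for (i = 0; i < n_ovr; i++) {
		assert(ovr[i].bus_irq >= 0 && ovr[i].bus_irq < NR_ISA_IRQS);
		if (skip_timer_override && ovr[i].bus_irq == 0)
			continue;
		map[ovr[i].bus_irq] = ovr[i].global_irq;
	}
}
```

With the single override from the log above (irq 0 to global_irq 2), the map puts the timer on pin 2 when the override is honoured and on pin 0 when it is skipped.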
Also, what is the symptom of the XT-PIC timer? Is it the source
of the nForce2 hangs, or something else? The latest message
suggested that it caused a background load on the system, but
I don't recall hearing that one on this thread before.
thanks,
-Len
On Tuesday 13 April 2004 15:08, Len Brown wrote:
> On Mon, 2004-04-12 at 21:17, Ross Dickson wrote:
>
> > I am working with 2.4.26-rc2 and have noticed a change with the the recent acpi?
> > update. The recent fix to stop unnecessary ioapic irq routing entries puts the
> > following if statement into io_apic.c, io_apic_set_pci_routing()
> >
> > /*
> > * IRQs < 16 are already in the irq_2_pin[] map
> > */
> > if (irq >= 16)
> > add_pin_to_irq(irq, ioapic, pin);
> >
> > which prevents my io-apic patch from using that function to reprogram the
> > io-apic pin on irq0 from pin2 to pin0.
> >
> > As a quick fix you could drop the "if (irq >= 16)".
> > I don't know what harm if any that would do other than create unwanted
> > irq mapping entries as in the past.
>
> I made that change -- sorry I broke your patch.
> No, I doubt it would matter if you hacked out "if (irq >=16)"
> for the time being.
Thanks Len, my patch was a bit of a quick hack anyway.
>
> I haven't been following this thread closely, but
> http://bugme.osdl.org/show_bug.cgi?id=1203 says I should;-)
>
> I understand that these boards have the timer attached to pin0
> in APIC mode, but that the BIOS says it is connected to pin2:
>
> ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0]
> trigger[0x0])
>
> Wouldn't it be a simpler patch to recognize this board and simply
> disable this bogus BIOS INT_SRC_OVR?
I will go with you on this one, as I have read the Intel spec docs
but have not yet learnt the ACPI code base.
Maciej forwarded me an override patch he developed for another
architecture where one could specify MP info as kernel args, and that worked, but
we still had no nmi_debug=1 with the timer_ack=1 situation, which he then
fixed in 2.6.3-mm3, but it got pulled for 2.6.4.
Maciej, is that override code good to go on the latest kernels? I am a novice at
ACPI parsing etc.
Also, some users reported clock skew with the timer routed via IO-APIC pin0.
We never got to the bottom of that, so I don't know if doing a PCI quirk
on nForce2 would satisfy all for widespread use.
>
> Also, what is the symptom of the XT-PIC timer? Is it the source
> of the nForce2 hangs, or something else? The latest message
> suggested that it caused a backround load on the system, but
> I don't recall hearing that one on this thread before.
Christian, could we have more detail on the "hi-load" XT-PIC please?
The source of the nForce2 hang is officially not commented on by NVIDIA or
AMD.
From what I know, it now appears to be an Athlon-to-chipset problem, as it has
also occurred on an SiS-740 board. It seems to have less to do with the
interrupt routing and everything to do with the timing of back-to-back C1
disconnect cycles when those cycles are occurring at a high rate.
Unfortunately spurious interrupts contribute to the disconnect rate, and there
are lots of those in XT-PIC mode. I hacked the /proc/interrupts code to view
them on irq7, and it was really bad if I used the local APIC without the IO-APIC.
What I think the mechanism is...
After a C1 cycle has occurred, if the HLT instruction (to disconnect again) is
executed sooner than about 1us after the interrupt that pulled the CPU out
of the C1 cycle occurred, then likely we die. The probability of this happening
greatly increases with the rate of C1 cycles, as is evident from the 1000Hz timer ticks
of 2.6 showing the problem up more than the 100Hz ticks of 2.4.
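A back-of-the-envelope way to see the 100Hz versus 1000Hz difference: if an otherwise idle CPU halts between every timer tick, the number of C1 wakeups scales directly with the tick rate. The sketch below only does that arithmetic. The per-wakeup probability p is a purely hypothetical parameter (nobody in this thread has measured it), and the function names are invented.

```c
#include <assert.h>

/*
 * Timer-driven wakeups from C1 over 'seconds', assuming the CPU halts
 * again after every tick (a fully idle machine, worst case).
 */
static unsigned long long idle_wakeups(unsigned int hz,
				       unsigned long long seconds)
{
	assert(hz > 0);
	return (unsigned long long)hz * seconds;
}

/*
 * Expected number of "HLT too soon" events if each wakeup independently
 * hits the bad window with probability p. p is an assumed value.
 */
static double expected_hits(unsigned int hz, unsigned long long seconds,
			    double p)
{
	assert(p >= 0.0 && p <= 1.0);
	return (double)idle_wakeups(hz, seconds) * p;
}
```

Over one day (86400 s) that is 8,640,000 wakeups at 100Hz against 86,400,000 at 1000Hz, so for any fixed p the expected number of bad events is ten times higher on a 1000Hz kernel. Spurious interrupts would add to both figures.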
Also, ACPI support for nForce2 in APIC with IO-APIC mode is not widespread
amongst major 2.4 distros to my knowledge; they stick with
XT-PIC on install. Also, in XT-PIC mode the southbridge accesses provide
the delay time needed for stability in most cases, but of course NVIDIA to my
knowledge have not published the PCI irq routing registers needed to manually
route irqs, so devices get stuck unnecessarily sharing a single irq in XT-PIC
mode. I tried a kernel hacked with the AMD 76x registers but they were
obviously different.
I think this is going to be a major headache when 2.6 is the mainstream distro kernel,
as there is a lot of cheap nForce2 hardware out there.
Judging from the silence from AMD hardware vendors, they seem prepared to
wait it out, maybe hoping everyone will go 64-bit before it hits the fan?
-Ross.
Thanks very much for the replies... Ross, I will test your patch as soon
as I get home.
Some more info: my machine worked pretty well and was stable even with
irq0 as XT-PIC, though I worried about the constant hi-load I got.
At the moment, using 2.6.4-ck2 patched only with the io_apic.c diff from
Ross, I get this output in the system log:
ENABLING IO-APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC INTIN2
..TIMER: Is timer irq0 connected to IO-APIC INTIN0?...
IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
..TIMER: works OK on IO-APIC INTIN0 irq0
Using local APIC timer interrupts.
calibrating APIC timer ...
cat /proc/interrupts gives me:
           CPU0
  0:   13681442  IO-APIC-edge   timer
  1:         35  IO-APIC-edge   i8042
  2:          0  XT-PIC         cascade
  8:          4  IO-APIC-edge   rtc
  9:          0  IO-APIC-level  acpi
 12:       2054  IO-APIC-edge   i8042
 14:      81623  IO-APIC-edge   ide0
 15:         87  IO-APIC-edge   ide1
 16:       2397  IO-APIC-level  ide2, saa7134[0]
 17:        156  IO-APIC-level  CMI8738
 19:    1161008  IO-APIC-level  nvidia
 20:    3050303  IO-APIC-level  ohci_hcd, eth0
 21: 2316037857  IO-APIC-level  ehci_hcd
 22:         76  IO-APIC-level  ohci_hcd
NMI:          0
LOC:   13633382
ERR:          0
MIS:          0
irq-routing:
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 00170011
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 001 01 0 0 0 0 0 1 1 31
01 001 01 0 0 0 0 0 1 1 39
02 000 00 0 0 0 0 0 0 0 00
03 001 01 0 0 0 0 0 1 1 41
04 001 01 0 0 0 0 0 1 1 49
05 001 01 0 0 0 0 0 1 1 51
06 001 01 0 0 0 0 0 1 1 59
07 001 01 0 0 0 0 0 1 1 61
08 001 01 0 0 0 0 0 1 1 69
09 001 01 0 1 0 0 0 1 1 71
0a 001 01 0 0 0 0 0 1 1 79
0b 001 01 0 0 0 0 0 1 1 81
0c 001 01 0 0 0 0 0 1 1 89
0d 001 01 0 0 0 0 0 1 1 91
0e 001 01 0 0 0 0 0 1 1 99
0f 001 01 0 0 0 0 0 1 1 A1
10 001 01 1 1 0 0 0 1 1 D1
11 001 01 1 1 0 0 0 1 1 D9
12 001 01 1 1 0 0 0 1 1 E1
13 001 01 1 1 0 0 0 1 1 C9
14 001 01 1 1 0 0 0 1 1 B1
15 001 01 1 1 0 0 0 1 1 C1
16 001 01 1 1 0 0 0 1 1 B9
17 001 01 1 1 0 0 0 1 1 A9
IRQ to pin mappings:
IRQ0 -> 0:2-> 0:0
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9-> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ17 -> 0:17
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ20 -> 0:20
IRQ21 -> 0:21
IRQ22 -> 0:22
IRQ23 -> 0:23
Maybe some of the copy+paste went wrong (sucky Mozilla mail here).
Earlier versions of -mm reported that setting irq0 up as a Virtual Wire IRQ worked,
and I didn't experience any unusual hi-load with them. The latest -mm sets
the timer irq up as ExtINT, resulting in these strange constant hi-loads.
I will report the results of testing your patch later, Ross.
thanks, christian.
On Tue, 13 Apr 2004, Ross Dickson wrote:
> Maciej forwarded me some an override patch he developed for another
> architecture where one could spec MP info as kernel args and that worked but
> we still had no nmi_debug=1 with the timer_ack=1 situation, which he then
> fixed in 2.6.3-mm3 but it got pulled for 2.6.4
>
> Maciej, is that override code good to go on latest kernels? I am a novice to
> acpi parsing etc.
I suppose it should be fine.
> Unfortunately spurious interrupts contribute to disconnect rate - and there
> are lots of those in XT-PIC mode. I hacked the proc/interrupts code to view
> them on irq7 and it was really bad if I used local apic without io-apic.
Spurious interrupts are normally recorded in the "ERR" entry in
/proc/interrupts, so you shouldn't have to record them separately. And
there should be none counted, except perhaps a few arriving upon
initialization of the local APIC.
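For anyone wanting to watch that counter without eyeballing the whole listing, a small sketch of pulling a named summary line (NMI, LOC, ERR, MIS) out of /proc/interrupts-style text follows. It works on a buffer passed in by the caller, assumes a single CPU column as in the listings in this thread, and named_counter is a name invented here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the first counter on the line starting with "<label>:" in
 * /proc/interrupts-style text, or -1 if there is no such line.
 */
static long long named_counter(const char *text, const char *label)
{
	const char *line = text;
	size_t len;

	assert(text != 0 && label != 0);
	len = strlen(label);

	while (*line) {
		const char *p = line;
		unsigned long long v;

		/* Skip leading indentation before comparing the label. */
		while (*p == ' ' || *p == '\t')
			p++;
		if (strncmp(p, label, len) == 0 && p[len] == ':' &&
		    sscanf(p + len + 1, "%llu", &v) == 1)
			return (long long)v;

		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return -1;
}
```

Sampling named_counter(text, "ERR") a few minutes apart and comparing the two values would show whether spurious interrupts are still being counted after boot.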
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +
Here is some other (maybe useful) info I can give:
This is part of my system log from kernel 2.6.5-mm4 (no other patches than -mm).
The irq0 gets set to XT-PIC with this kernel version...
timer setup:
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... works.
Using local APIC timer interrupts.
interrupt routing:
number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 00170011
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 001 01 0 0 0 0 0 1 1 39
02 000 00 1 0 0 0 0 0 0 00
03 001 01 0 0 0 0 0 1 1 41
04 001 01 0 0 0 0 0 1 1 49
05 001 01 0 0 0 0 0 1 1 51
06 001 01 0 0 0 0 0 1 1 59
07 001 01 1 0 0 0 0 1 1 61
08 001 01 0 0 0 0 0 1 1 69
09 001 01 0 1 0 0 0 1 1 71
0a 001 01 0 0 0 0 0 1 1 79
0b 001 01 0 0 0 0 0 1 1 81
0c 001 01 0 0 0 0 0 1 1 89
0d 001 01 0 0 0 0 0 1 1 91
0e 001 01 0 0 0 0 0 1 1 99
0f 001 01 0 0 0 0 0 1 1 A1
10 001 01 1 1 0 0 0 1 1 D1
11 001 01 1 1 0 0 0 1 1 D9
12 001 01 1 1 0 0 0 1 1 E1
13 001 01 1 1 0 0 0 1 1 C9
14 001 01 1 1 0 0 0 1 1 B1
15 001 01 1 1 0 0 0 1 1 C1
16 001 01 1 1 0 0 0 1 1 B9
17 001 01 1 1 0 0 0 1 1 A9
Now, with 2.6.5-mm5-1, patched by hand with the io_apic.c-patch I got from Ross
(removing the declaration of extern int timer_ack in check_timer(), changing nothing else),
I get the following:
timer setup:
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC INTIN2
..TIMER: Check if 8254 timer connected to IO-APIC INTIN0? ...
..TIMER: works OK on IO-APIC irq0
Using local APIC timer interrupts.
irq routing:
number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 00170011
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 001 01 0 0 0 0 0 1 1 31
01 001 01 0 0 0 0 0 1 1 39
02 000 00 1 0 0 0 0 0 0 00
03 001 01 0 0 0 0 0 1 1 41
04 001 01 0 0 0 0 0 1 1 49
05 001 01 0 0 0 0 0 1 1 51
06 001 01 0 0 0 0 0 1 1 59
07 001 01 0 0 0 0 0 1 1 61
08 001 01 0 0 0 0 0 1 1 69
09 001 01 0 1 0 0 0 1 1 71
0a 001 01 0 0 0 0 0 1 1 79
0b 001 01 0 0 0 0 0 1 1 81
0c 001 01 0 0 0 0 0 1 1 89
0d 001 01 0 0 0 0 0 1 1 91
0e 001 01 0 0 0 0 0 1 1 99
0f 001 01 0 0 0 0 0 1 1 A1
10 001 01 1 1 0 0 0 1 1 D1
11 001 01 1 1 0 0 0 1 1 D9
12 001 01 1 1 0 0 0 1 1 E1
13 001 01 1 1 0 0 0 1 1 C9
14 001 01 1 1 0 0 0 1 1 B1
15 001 01 1 1 0 0 0 1 1 C1
16 001 01 1 1 0 0 0 1 1 B9
17 001 01 1 1 0 0 0 1 1 A9
IRQ to pin mappings:
IRQ0 -> 0:0
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ17 -> 0:17
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ20 -> 0:20
IRQ21 -> 0:21
IRQ22 -> 0:22
IRQ23 -> 0:23
.................................... done.
cat /proc/interrupts gets me:
           CPU0
  0:     568313  IO-APIC-edge   timer
  1:       1359  IO-APIC-edge   i8042
  2:          0  XT-PIC         cascade
  7:          0  IO-APIC-edge   parport0
  8:          4  IO-APIC-edge   rtc
  9:          0  IO-APIC-level  acpi
 12:      14859  IO-APIC-edge   i8042
 14:      17983  IO-APIC-edge   ide0
 15:         92  IO-APIC-edge   ide1
 16:       2335  IO-APIC-level  ide2, saa7134[0]
 17:        142  IO-APIC-level  CMI8738
 19:      31779  IO-APIC-level  nvidia
 20:      72619  IO-APIC-level  ohci_hcd, eth0
 21:   86626041  IO-APIC-level  ehci_hcd
 22:         78  IO-APIC-level  ohci_hcd
NMI:          0
LOC:     566374
ERR:          0
MIS:          0
There is NO constant hi-load anymore, cool!
thanks, christian.
P.S: Ross, could you send a patch that could be applied using the patch-utility?
On Tue, Apr 13, 2004 at 02:55:52PM +1000, Ross Dickson wrote:
> Are you using my io-apic patch with the apic ack delay or with the
> C1idle version?
> i.e. patched io_apic.c and apic.c and using kernel arg "apic_tack="
> or patched io_apic.c and process.c and using kernel arg "idle=C1halt"?
i'm using your C1idle patches:
nforce2-idleC1halt-rd-2.6.3.patch
nforce2-ioapic-rd-2.6.3.patch
cat /proc/cmdline
BOOT_IMAGE=linux-test ro root=305 idebus=33 acpi=on idle=C1halt
> Could you please cat /proc/interrupts.
> I would like to see how irq0 is routed.
My irq0 says XT-PIC. i'm not complaining, box's still
very stable and since the last post i've burned a few
DVDs on it while running the file share client and
playing music.
cat /proc/interrupts
CPU0
0: 759809583 XT-PIC timer
1: 382279 IO-APIC-edge i8042
2: 0 XT-PIC cascade
8: 1 IO-APIC-edge rtc
9: 0 IO-APIC-level acpi
12: 6386931 IO-APIC-edge i8042
14: 2117474 IO-APIC-edge ide0
15: 5575006 IO-APIC-edge ide1
201: 6425958 IO-APIC-level EMU10K1
209: 167929203 IO-APIC-level eth0
NMI: 0
LOC: 759718637
ERR: 0
MIS: 0
>
> And from boot log
> with my new timer setup
my boot dmesg timer setup:
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC INTIN2
..TIMER: Is timer irq0 connected to IO-APIC INTIN0? ...
IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
IOAPIC[0]: Set PCI routing entry (2-2 -> 0x31 -> IRQ 0 Mode:0 Active:0)
..MP-BIOS: 8254 timer not connected to IO-APIC INTIN0
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... works.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 2135.0772 MHz.
..... host bus clock speed is 388.0322 MHz.
-------------------------------------------------------------
ioapic routing:
number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 00170011
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 1 0 0 0 0 0 0 00
01 001 01 0 0 0 0 0 1 1 39
02 001 01 1 0 0 0 0 1 1 31
03 001 01 0 0 0 0 0 1 1 41
04 001 01 0 0 0 0 0 1 1 49
05 001 01 0 0 0 0 0 1 1 51
06 001 01 0 0 0 0 0 1 1 59
07 001 01 1 0 0 0 0 1 1 61
08 001 01 0 0 0 0 0 1 1 69
09 001 01 0 1 0 0 0 1 1 71
0a 001 01 0 0 0 0 0 1 1 79
0b 001 01 0 0 0 0 0 1 1 81
0c 001 01 0 0 0 0 0 1 1 89
0d 001 01 0 0 0 0 0 1 1 91
0e 001 01 0 0 0 0 0 1 1 99
0f 001 01 0 0 0 0 0 1 1 A1
10 001 01 1 1 0 0 0 1 1 D1
11 001 01 1 1 0 0 0 1 1 D9
12 001 01 1 1 0 0 0 1 1 E1
13 001 01 1 1 0 0 0 1 1 C9
14 001 01 1 1 0 0 0 1 1 B1
15 001 01 1 1 0 0 0 1 1 C1
16 001 01 1 1 0 0 0 1 1 B9
17 001 01 1 1 0 0 0 1 1 A9
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ17 -> 0:17
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ20 -> 0:20
IRQ21 -> 0:21
IRQ22 -> 0:22
IRQ23 -> 0:23
.................................... done.
PCI: Using ACPI for IRQ routing
> I am now using forcedeth for onboard ether. It works well and is
> convenient when rebuilding and testing kernels and modules.
i would too but am still on a coax network here...
What? Upgrade? What?
b
Re: IRQ0 XT-PIC timer issue
Since the hardware is connected to APIC pin0, it is a BIOS bug
that an ACPI interrupt source override from pin2 to IRQ0 exists.
With this simple 2.6.5 patch you can specify "acpi_skip_timer_override"
to ignore that bogus BIOS directive. The result is with your
ACPI-enabled APIC-enabled kernel, you'll get IRQ0 IO-APIC-edge timer.
Probably there is a more clever way to trigger this workaround
automatically instead of via boot parameter.
cheers,
-Len
===== Documentation/kernel-parameters.txt 1.44 vs edited =====
--- 1.44/Documentation/kernel-parameters.txt Mon Mar 22 16:03:22 2004
+++ edited/Documentation/kernel-parameters.txt Tue Apr 13 17:47:11 2004
@@ -122,6 +122,10 @@
acpi_serialize [HW,ACPI] force serialization of AML methods
+ acpi_skip_timer_override [HW,ACPI]
+ Recognize IRQ0/pin2 Interrupt Source Override
+ and ignore it -- for broken nForce2 BIOS.
+
ad1816= [HW,OSS]
Format: <io>,<irq>,<dma>,<dma2>
See also Documentation/sound/oss/AD1816.
===== arch/i386/kernel/setup.c 1.115 vs edited =====
--- 1.115/arch/i386/kernel/setup.c Fri Apr 2 07:21:43 2004
+++ edited/arch/i386/kernel/setup.c Tue Apr 13 17:41:31 2004
@@ -614,6 +614,12 @@
else if (!memcmp(from, "acpi_sci=low", 12))
acpi_sci_flags.polarity = 3;
+ else if (!memcmp(from, "acpi_skip_timer_override", 24)) {
+ extern int acpi_skip_timer_override;
+
+ acpi_skip_timer_override = 1;
+ }
+
#ifdef CONFIG_X86_LOCAL_APIC
/* disable IO-APIC */
else if (!memcmp(from, "noapic", 6))
===== arch/i386/kernel/acpi/boot.c 1.57 vs edited =====
--- 1.57/arch/i386/kernel/acpi/boot.c Tue Mar 30 17:05:19 2004
+++ edited/arch/i386/kernel/acpi/boot.c Tue Apr 13 17:50:14 2004
@@ -62,6 +62,7 @@
acpi_interrupt_flags acpi_sci_flags __initdata;
int acpi_sci_override_gsi __initdata;
+int acpi_skip_timer_override __initdata;
#ifdef CONFIG_X86_LOCAL_APIC
static u64 acpi_lapic_addr __initdata = APIC_DEFAULT_PHYS_BASE;
@@ -327,6 +328,12 @@
acpi_sci_ioapic_setup(intsrc->global_irq,
intsrc->flags.polarity, intsrc->flags.trigger);
return 0;
+ }
+
+ if (acpi_skip_timer_override &&
+ intsrc->bus_irq == 0 && intsrc->global_irq == 2) {
+ printk(PREFIX "BIOS IRQ0 pin2 override ignored.\n");
+ return 0;
}
mp_override_legacy_irq (
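The effect of the check in the patch above can be sketched outside the kernel. This is only an illustration under assumptions: the struct is a hypothetical stand-in for the two fields of the ACPI interrupt source override that the patch tests, and resolve_pin() is an invented helper that assumes legacy IRQs map to the same-numbered IO-APIC pin unless an override says otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an ACPI interrupt source override entry. */
struct int_src_override {
    unsigned bus_irq;    /* legacy ISA IRQ the override applies to */
    unsigned global_irq; /* IO-APIC pin the BIOS claims it is wired to */
};

/* Same condition as the patch: drop only the IRQ0 -> pin2 override. */
bool should_skip_override(bool skip_timer_override,
                          const struct int_src_override *o)
{
    return skip_timer_override && o->bus_irq == 0 && o->global_irq == 2;
}

/* Pin an IRQ ends up on, assuming identity mapping without an override. */
unsigned resolve_pin(unsigned irq, bool skip_timer_override,
                     const struct int_src_override *o)
{
    if (o->bus_irq == irq && !should_skip_override(skip_timer_override, o))
        return o->global_irq;
    return irq;
}

/* Reproduce the two "IRQ to pin mappings" outcomes seen in this thread. */
void check_timer_routing(void)
{
    struct int_src_override bios = { 0, 2 }; /* the nForce2 BIOS entry */

    assert(resolve_pin(0, false, &bios) == 2); /* IRQ0 -> 0:2 */
    assert(resolve_pin(0, true, &bios) == 0);  /* IRQ0 -> 0:0 */
}
```

Without the boot parameter the sketch yields the "IRQ0 -> 0:2" mapping from the XT-PIC boot log; with it, IRQ0 stays on pin 0.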
i must add that i've been using your patches for
the nForce chipset since they first appeared on
this mailing list, and while they've all helped
this box to last a bit longer between lockups
none of them cured it. Once the IO-APIC code was
compiled in and the Athlon idle powersaving
turned on it would inevitably lock up in a day
or two.
This incorrect result from the mismatch between
your 2.6.3 patches and the current IO-APIC
code is the first time this box seems to be
free from lockup.
b
On Tue, Apr 13, 2004 at 05:18:24PM -0400, really bensoo_at_soo_dot_com wrote:
> My irq0 says XT-PIC. i'm not complaining, box's still
> very stable and since the last post i've burned a few
> DVDs on it while running the file share client and
> playing music.
>
> cat /proc/interrupts
>
> CPU0
> 0: 759809583 XT-PIC timer
> 1: 382279 IO-APIC-edge i8042
> 2: 0 XT-PIC cascade
> 8: 1 IO-APIC-edge rtc
> 9: 0 IO-APIC-level acpi
> 12: 6386931 IO-APIC-edge i8042
> 14: 2117474 IO-APIC-edge ide0
> 15: 5575006 IO-APIC-edge ide1
> 201: 6425958 IO-APIC-level EMU10K1
> 209: 167929203 IO-APIC-level eth0
> NMI: 0
> LOC: 759718637
> ERR: 0
> MIS: 0
On Wednesday 14 April 2004 11:02, Len Brown wrote:
> Re: IRQ0 XT-PIC timer issue
>
> Since the hardware is connected to APIC pin0, it is a BIOS bug
> that an ACPI interrupt source override from pin2 to IRQ0 exists.
>
> With this simple 2.6.5 patch you can specify "acpi_skip_timer_override"
> to ignore that bogus BIOS directive. The result is with your
> ACPI-enabled APIC-enabled kernel, you'll get IRQ0 IO-APIC-edge timer.
>
> Probably there is a more clever way to trigger this workaround
> automatically instead of via boot parameter.
>
> cheers,
> -Len
Many thanks Len,
I cannot try it just yet (rebuilding car engine,-greasy mess
- hopefully get to it tonight).
Just would like to add that if we cannot get Maciej's 8259 ack patch
back into the distro then we need an if statement in check_timer()
to turn off timer_ack for nforce2, or Christian might get his hi-load back
and nmi_watchdog=1 certainly won't work.
e.g. for 2.4.26-rc2 io_apic.c line 1613 or 2.6.5 line 2180
if (pin1 != -1) {
/*
* Ok, does IRQ0 through the IOAPIC work?
*/
+ if(acpi_skip_timer_override)
+ timer_ack=0;
unmask_IO_APIC_irq(0);
I might also grab the pci quirk source from the old nforce2 disconnect bit
patch and try it as a means of detection for automatic trigger. i.e. instead
of writing the pci config bit, set acpi_skip_timer_override instead - but then
if someone gets clock skew we would need the kern arg to turn it off -
unless the potential for clock skew is fixed.
The clock skew is an interesting one, I think the clock uses tsc if available
to interpolate between timer ints and if so should it not also be used to
validate the timer ints in case of noise? Apparently the clock speeds up not
slows down in those cases?
Regards
Ross.
Ross Dickson wrote:
> The clock skew is an interesting one, I think the clock uses tsc if available
> to interpolate between timer ints and if so should it not also be used to
> validate the timer ints in case of noise? Apparently the clock speeds up not
> slows down in those cases?
If the clock is speeding up due to spurious extra timer interrupts,
how about reading the timer chip to validate the interrupts? Doesn't
sound unreasonable to me :)
The problem with using the tsc is that the tsc frequency isn't
constant on some systems. If it slows down, it would make valid timer
interrupts appear to be spurious.
-- Jamie
On Wed, 14 Apr 2004, Ross Dickson wrote:
> e.g. for 2.4.26-rc2 io_apic.c line 1613 or 2.6.5 line 2180
> if (pin1 != -1) {
> /*
> * Ok, does IRQ0 through the IOAPIC work?
> */
> + if(acpi_skip_timer_override)
> + timer_ack=0;
> unmask_IO_APIC_irq(0);
>
> I might also grab the pci quirk source from the old nforce2 disconnect bit
> patch and try it as a means of detection for automatic trigger. i.e. instead
> of writing the pci config bit, set acpi_skip_timer_override instead - but then
> if someone gets clock skew we would need the kern arg to turn it off -
> unless the potential for clock skew is fixed.
Well, the question is whether the timer->INTIN0 routing is hardwired
inside the nforce2 chipset or is it external and thus board-dependent.
Any way to get this clarified by the chipset's manufacturer?
> The clock skew is an interesting one, I think the clock uses tsc if available
> to interpolate between timer ints and if so should it not also be used to
> validate the timer ints in case of noise? Apparently the clock speeds up not
> slows down in those cases?
With real hardware perhaps it can be debugged. The interaction between
the 8254, the 8259As and the APICs seems interesting in the chipset.
Perhaps the override to INTIN2 is to tell the timer is really unavailable
directly? I can't see a way to have an ACPI override that specifies an
ISA interrupt is not connected to the I/O APIC (unlike with the MPS).
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +
> Just would like to add that if we cannot get Maciej's 8259 ack patch
> back into the distro then we need an if statement in check_timer()
> to turn off timer_ack for nforce2, or Christian might get his hi-load back
> and nmi_watchdog=1 certainly won't work.
>
> e.g. for 2.4.26-rc2 io_apic.c line 1613 or 2.6.5 line 2180
> if (pin1 != -1) {
> /*
> * Ok, does IRQ0 through the IOAPIC work?
> */
> + if(acpi_skip_timer_override)
> + timer_ack=0;
> unmask_IO_APIC_irq(0);
>
Well, it seems that at least on -mm this isn't necessary.
Len, I simply applied your patch against 2.6.5-mm5-1 and it just works, great
work! Having finally read http://bugme.osdl.org/show_bug.cgi?id=1203 I must
say that my nforce2 board (MSI K7N2-Delta) never ever hung; neither the wrong
timer setup nor switching the APIC on or off did any harm.
What I get now:
cat /proc/interrupts
CPU0
0: 25978569 IO-APIC-edge timer
1: 2102 IO-APIC-edge i8042
2: 0 XT-PIC cascade
7: 0 IO-APIC-edge parport0
8: 4 IO-APIC-edge rtc
9: 0 IO-APIC-level acpi
12: 147962 IO-APIC-edge i8042
14: 405977 IO-APIC-edge ide0
15: 93 IO-APIC-edge ide1
16: 60192 IO-APIC-level ide2, saa7134[0]
17: 155 IO-APIC-level CMI8738
19: 2209002 IO-APIC-level nvidia
20: 7538158 IO-APIC-level ohci_hcd, eth0
21: 0 IO-APIC-level ehci_hcd
22: 78 IO-APIC-level ohci_hcd
NMI: 0
LOC: 25946237
ERR: 0
MIS: 0
timer setup:
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-2, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=0 pin2=-1
Using local APIC timer interrupts.
calibrating APIC timer ...
irq routing:
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 00170011
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 001 01 0 0 0 0 0 1 1 31
01 001 01 0 0 0 0 0 1 1 39
02 000 00 1 0 0 0 0 0 0 00
03 001 01 0 0 0 0 0 1 1 41
04 001 01 0 0 0 0 0 1 1 49
05 001 01 0 0 0 0 0 1 1 51
06 001 01 0 0 0 0 0 1 1 59
07 001 01 0 0 0 0 0 1 1 61
08 001 01 0 0 0 0 0 1 1 69
09 001 01 0 1 0 0 0 1 1 71
0a 001 01 0 0 0 0 0 1 1 79
0b 001 01 0 0 0 0 0 1 1 81
0c 001 01 0 0 0 0 0 1 1 89
0d 001 01 0 0 0 0 0 1 1 91
0e 001 01 0 0 0 0 0 1 1 99
0f 001 01 0 0 0 0 0 1 1 A1
10 001 01 1 1 0 0 0 1 1 D1
11 001 01 1 1 0 0 0 1 1 D9
12 001 01 1 1 0 0 0 1 1 E1
13 001 01 1 1 0 0 0 1 1 C9
14 001 01 1 1 0 0 0 1 1 B1
15 001 01 1 1 0 0 0 1 1 C1
16 001 01 1 1 0 0 0 1 1 B9
17 001 01 1 1 0 0 0 1 1 A9
IRQ to pin mappings:
IRQ0 -> 0:0
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ17 -> 0:17
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ20 -> 0:20
IRQ21 -> 0:21
IRQ22 -> 0:22
IRQ23 -> 0:23
.................................... done.
This is simply great, any uncommon hi-load disappeared.
Will something like this get into mainline soon, maybe with automatic chipset
detection?
Once again, thanks, christian.
P.S.: I will test the patch against mainline 2.6.5 kernel now and post the
results later.
On Wed, 2004-04-14 at 15:57, Christian Kröner wrote:
> This is simply great, any uncommon hi-load disappeared.
> Will something like this get into mainline soon, maybe with automatic chipset
> detection?
I'm okay putting the bootparam and the workaround into the kernel,
for it is generic and we may find other platforms need it.
But I don't have a clean way to make it automatic.
This is a BIOS bug, so chipset ID will not always work.
We could list the BIOS in dmi_scan(), but I hate doing
that b/c then the vendor releases a new version of their
broken BIOS and the automatic workaround no longer works...
-Len
On Thursday 15 April 2004 10:17, Len Brown wrote:
> On Wed, 2004-04-14 at 15:57, Christian Kröner wrote:
>
> > This is simply great, any uncommon hi-load disappeared.
> > Will something like this get into mainline soon, maybe with automatic chipset
> > detection?
>
> I'm okay putting the bootparam and the workaround into the kernel,
> for it is generic and we may find other platforms need it.
Great, it sure is simpler and cleaner than my workaround. Thanks.
>
> But I don't have a clean way to make it automatic.
> This is a BIOS bug, so chipset ID will not always work.
True it is a bios thing but I have yet to see an nforce2 MOBO that is not
routed in this way. I am thinking it is internal to the chipset. I have seen
none route it into io-apic pin2.
> Maciej wrote
> Well, the question is whether the timer->INTIN0 routing is hardwired
> inside the nforce2 chipset or is it external and thus board-dependent.
> Any way to get this clarified by the chipset's manufacturer?
Nvidia is the first Company in my 20+ years of working life to totally not
respond to my attempts to communicate and I have had dealings with
numerous semiconductor firms and agents. I doubt that my email source
would be blocked and I have also tried their form mail. Do real people
work there? Maybe I have to phone or fax them from here in Australia?
-or place an order for 10,000 chips? Maybe we need a worldwide union of
Linux support staff to exhibit collective sales pressure. Enough ranting....
I am also cautioned by Maciej's comments indicating that maybe the
override appears in the nforce2 bios because there is no other way of
saying this is a feature that nvidia could not get to work properly?...
On the flip side, in favour of this routing, the clock skew may be restricted
to 2.6.1 kernels only. I do not have it on my patched 2.4 kernels, and it may
be fine on 2.6.5.
Here is a link to the old thread with the skew issues.
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-01/3129.html
Christian - would you please check if you get clock skew as described in
that thread?
>
> We could list the BIOS in dmi_scan(), but I hate doing
> that b/c then the vendor releases a new version of their
> broken BIOS and the automatic workaround no longer works...
>
> -Len
>
Unfortunately the hard lockups in the BUG report won't be fixed by this io-apic
work. I think Shuttle is the only manufacturer to ship a bios update which has
taken a board with existing lockup problems and fixed it. So far nobody has
posted how this magic was done?
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-01/5003.html
In the mean time I and others with lockups have had success with my C1 idle
patch but I have left it manual with kern arg for the same reason - no clean
way to automate it. Some nforce2 need it, others don't. Want me to finish
cleaning it up for possible inclusion?
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-04/1707.html
-Ross.
On Wednesday 14 April 2004 11:02, Len Brown wrote:
> Re: IRQ0 XT-PIC timer issue
>
> Since the hardware is connected to APIC pin0, it is a BIOS bug
> that an ACPI interrupt source override from pin2 to IRQ0 exists.
>
> With this simple 2.6.5 patch you can specify "acpi_skip_timer_override"
> to ignore that bogus BIOS directive. The result is with your
> ACPI-enabled APIC-enabled kernel, you'll get IRQ0 IO-APIC-edge timer.
>
> Probably there is a more clever way to trigger this workaround
> automatically instead of via boot parameter.
>
> cheers,
> -Len
Hi Len, I have updated my nforce2 patches for 2.6.5 to work with your patch.
I have tested them only on one nforce2 board Epox 8Rga+ but as little has
changed in core functionality from past releases I think all will be OK....
Hopefully no clock skew. I saw none on my system but that's no guarantee.
I tried your above patch with the timer_ack on as is default in 2.6.5 and
nmi_watchdog=1 failed as expected. I still think Maciej's 8259 ack patch
is a more complete solution to the ack issue but this one gets watchdog going for
nforce2. I cannot see anyone using your above patch without an integrated
apic and tsc so I cannot see a problem triggering it off your kern arg.
The second patch is the C1halt update I suggested in another posting.
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-04/1707.html
Both patches in attached tarball.
Regards
Ross.
Here is my revised patch for use with "acpi_skip_timer_override" to get
nmi_watchdog=1 working with the above patch from Len Brown.
--- linux-2.6.5/arch/i386/kernel/io_apic.c.orig 2004-04-16 00:20:54.000000000 +1000
+++ linux-2.6.5/arch/i386/kernel/io_apic.c 2004-04-15 20:24:18.000000000 +1000
@@ -2179,10 +2179,13 @@ static inline void check_timer(void)
if (pin1 != -1) {
/*
* Ok, does IRQ0 through the IOAPIC work?
*/
+ extern int acpi_skip_timer_override;
+ if(acpi_skip_timer_override)
+ timer_ack=0;
unmask_IO_APIC_irq(0);
if (timer_irq_works()) {
if (nmi_watchdog == NMI_IO_APIC) {
disable_8259A_irq(0);
setup_nmi();
Here is my revised patch for "idle=C1halt" to prevent nforce2 hard lockups.
Now more robust, better tested with apm config, and without x86 apic config,
and nolapic, noapic, acpi=off. All gave my usual 38C CPU temp when idle and
no hard lockups. Temp measured by leaving machine idle on run level 3 for
several minutes and then reading bios temp on reboot.
--- linux-2.6.5/arch/i386/kernel/process.c.orig 2004-04-04 13:36:10.000000000 +1000
+++ linux-2.6.5/arch/i386/kernel/process.c 2004-04-15 20:41:13.000000000 +1000
@@ -47,10 +47,13 @@
#include <asm/irq.h>
#include <asm/desc.h>
#ifdef CONFIG_MATH_EMULATION
#include <asm/math_emu.h>
#endif
+#if defined(CONFIG_X86_UP_APIC)
+#include <asm/apic.h>
+#endif
#include <linux/irq.h>
#include <linux/err.h>
asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
@@ -98,10 +101,34 @@ void default_idle(void)
local_irq_enable();
}
}
/*
+ * We use this to avoid nforce2 lockups
+ * Reduces frequency of C1 disconnects
+ */
+static void c1halt_idle(void)
+{
+ if (!hlt_counter && current_cpu_data.hlt_works_ok) {
+ local_irq_disable();
+#if defined(CONFIG_X86_UP_APIC)
+ /* only hlt disconnect if more than 1.6% of apic interval remains */
+ extern int enable_local_apic;
+ if(!need_resched() && (enable_local_apic < 0 ||
+ (apic_read(APIC_TMCCT) > (apic_read(APIC_TMICT)>>6)))) {
+#else
+ /* just adds a little delay to assist in back to back disconnects */
+ if(!need_resched()) {
+#endif
+ ndelay(600); /* helps nforce2 but adds 0.6us hard int latency */
+ safe_halt(); /* nothing better to do until we wake up */
+ } else {
+ local_irq_enable();
+ }
+ }
+}
+/*
* On SMP it's slightly faster (but much more power-consuming!)
* to poll the ->work.need_resched flag instead of waiting for the
* cross-CPU IPI to arrive. Use this option with caution.
*/
static void poll_idle (void)
@@ -135,20 +162,18 @@ static void poll_idle (void)
* The idle thread. There's no useful work to be
* done, so just try to conserve power and have a
* low exit latency (ie sit in a loop waiting for
* somebody to say that they'd like to reschedule)
*/
+static void (*idle)(void);
void cpu_idle (void)
{
/* endless idle loop with no priority at all */
while (1) {
while (!need_resched()) {
- void (*idle)(void) = pm_idle;
-
if (!idle)
- idle = default_idle;
-
+ idle = pm_idle ? pm_idle : default_idle;
irq_stat[smp_processor_id()].idle_timestamp = jiffies;
idle();
}
schedule();
}
@@ -199,16 +224,18 @@ void __init select_idle_routine(const st
static int __init idle_setup (char *str)
{
if (!strncmp(str, "poll", 4)) {
printk("using polling idle threads.\n");
- pm_idle = poll_idle;
+ idle = poll_idle;
} else if (!strncmp(str, "halt", 4)) {
printk("using halt in idle threads.\n");
- pm_idle = default_idle;
+ idle = default_idle;
+ } else if (!strncmp(str, "C1halt", 6)) {
+ printk("using C1 halt disconnect friendly idle threads.\n");
+ idle = c1halt_idle;
}
-
return 1;
}
__setup("idle=", idle_setup);
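The halt condition in c1halt_idle() above can be looked at in isolation. The sketch below is a user-space illustration only: the function names and parameters are invented, and the APIC_TMCCT and APIC_TMICT reads are replaced by plain arguments. The shift by 6 is a division by 64, so the threshold is 1/64 of the interval, about 1.56%, which the patch comment rounds to 1.6%.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when more than 1/64 of the APIC timer interval is still left. */
bool enough_interval_left(uint32_t current_count, uint32_t initial_count)
{
    return current_count > (initial_count >> 6);
}

/*
 * Mirrors the patch's condition: never halt when a reschedule is pending;
 * otherwise halt if the local APIC is disabled or enough of the interval
 * remains. All parameters are illustrative stand-ins for kernel state.
 */
bool should_halt(bool resched_pending, bool lapic_enabled,
                 uint32_t current_count, uint32_t initial_count)
{
    if (resched_pending)
        return false;
    return !lapic_enabled ||
           enough_interval_left(current_count, initial_count);
}

/* Worked example with an initial count of 64000 (threshold 1000). */
void check_c1halt_threshold(void)
{
    assert(should_halt(false, true, 1001, 64000));  /* just above 1/64 */
    assert(!should_halt(false, true, 1000, 64000)); /* at the threshold */
    assert(should_halt(false, false, 0, 64000));    /* APIC disabled */
    assert(!should_halt(true, true, 64000, 64000)); /* reschedule pending */
}
```

As far as the patch shows, the intent is to avoid halting when the next timer interrupt is about to arrive anyway, which reduces how often the C1 disconnect is entered.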
On Tue, 13 Apr 2004, Len Brown wrote:
> Re: IRQ0 XT-PIC timer issue
>
> Since the hardware is connected to APIC pin0, it is a BIOS bug
> that an ACPI interrupt source override from pin2 to IRQ0 exists.
>
> With this simple 2.6.5 patch you can specify "acpi_skip_timer_override"
> to ignore that bogus BIOS directive. The result is with your
> ACPI-enabled APIC-enabled kernel, you'll get IRQ0 IO-APIC-edge timer.
>
> Probably there is a more clever way to trigger this workaround
> automatically instead of via boot parameter.
Nice, this is the problem which broke Andrew's system and the systems on which
I tested my adaptation of Natalie's mp_override_legacy_irq() change. Whacking out
previous mp_irq entries would have worked if the BIOS had not forced the
pin2 override.
On Thursday 15 April 2004 10:17, Len Brown wrote:
> I'm okay putting the bootparam and the workaround into the kernel,
> for it is generic and we may find other platforms need it.
That's more than what I could have expected, thanks.
Ross, I tested my kernel with nmi_watchdog=1 and nmi_watchdog=2 getting only 2
to work.
output: nmi_watchdog=1
activating NMI Watchdog ... done.
testing NMI watchdog ... CPU#0: NMI appears to be stuck!
output: nmi_watchdog=2
testing NMI watchdog ... OK.
This is on 2.6.5-mm5-1.
Concerning the timer, well I tested it against my radio-controlled clock,
setting it with ntpdate first and letting the system run (with ntpd off) and
my system is kinda faster than my radio-clock. After about one hour my system
was off by +14s compared to the radio-clock. I don't know if that is pretty
shitty or simply normal for these bad pc-clocks...
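To put the +14 s per hour figure on a common scale, it can be converted to parts per million. A minimal sketch, with function names chosen only for this example:

```c
#include <assert.h>

/* Drift in parts per million: offset gained divided by elapsed time. */
double drift_ppm(double offset_seconds, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return offset_seconds / elapsed_seconds * 1e6;
}

/* Seconds gained (or lost) per 24 hours at a given drift rate. */
double drift_seconds_per_day(double ppm)
{
    return ppm * 86400.0 / 1e6;
}

/* The measurement reported above: +14 s after about one hour. */
void check_reported_drift(void)
{
    double ppm = drift_ppm(14.0, 3600.0);

    assert(ppm > 3888.0 && ppm < 3889.0);  /* roughly 3889 ppm */
    assert(drift_seconds_per_day(ppm) > 335.9 &&
           drift_seconds_per_day(ppm) < 336.1); /* roughly 336 s per day */
}
```

That works out to roughly 3889 ppm, or about 5.6 minutes per day. Quartz oscillators are usually quoted in the tens of ppm, so, assuming the one-hour measurement is representative, this looks more like extra timer interrupts being counted than ordinary crystal inaccuracy.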
I'm now compiling 2.6.6-rc1 with the nmi_watchdog=1 workaround. One question
about the C1idle-patch, does this add a new feature or is it just a
workaround for locked up nforce2-systems (since I never experienced lockups
on my system, I wouldn't need it then)?
thanks for now, christian.
On Wed, 2004-04-14 at 06:37, Maciej W. Rozycki wrote:
> On Wed, 14 Apr 2004, Ross Dickson wrote:
> > The clock skew is an interesting one, I think the clock uses tsc if available
> > to interpolate between timer ints and if so should it not also be used to
> > validate the timer ints in case of noise? Apparently the clock speeds up not
> > slows down in those cases?
>
> With real hardware perhaps it can be debugged. The interaction between
> the 8254, the 8259As and the APICs seems interesting in the chipset.
> Perhaps the override to INTIN2 is to tell the timer is really unavailable
> directly?
That would be way too subtle for a BIOS writer;-)
> I can't see a way to have an ACPI override that specifies an
> ISA interrupt is not connected to the I/O APIC (unlike with the MPS).
I agree. And I think the existence of this /proc/interrupts
entry on an ACPI-enabled system should probably go away.
CPU0 CPU1
2: 0 0 XT-PIC cascade
ACPI also doesn't support sharing more than 1 pin on an IRQ.
So if you see a construct like this below, it is also a bug:
IRQ to pin mappings:
IRQ23 -> 0:23-> 0:7
cheers,
-Len
On Thu, 2004-04-15 at 11:10, Ross Dickson wrote:
> On Wednesday 14 April 2004 11:02, Len Brown wrote:
> > Re: IRQ0 XT-PIC timer issue
> >
> > Since the hardware is connected to APIC pin0, it is a BIOS bug
> > that an ACPI interrupt source override from pin2 to IRQ0 exists.
> >
> > With this simple 2.6.5 patch you can specify "acpi_skip_timer_override"
> > to ignore that bogus BIOS directive. The result is with your
> > ACPI-enabled APIC-enabled kernel, you'll get IRQ0 IO-APIC-edge timer.
> >
> > Probably there is a more clever way to trigger this workaround
> > automatically instead of via boot parameter.
> Hi Len, I have updated my nforce2 patches for 2.6.5 to work with your patch.
> I have tested them only on one nforce2 board Epox 8Rga+ but as little has
> changed in core functionality from past releases I think all will be OK....
> Hopefully no clock skew. I saw none on my system but that's no guarantee.
While I don't want to get into the business of maintaining
a dmi_scan entry for every system with this issue, I think
it might be a good idea to add a couple of example entries
for high volume systems for which there is no BIOS fix available.
Got any opinions on which system to use as the example?
I'll need the output from dmidecode for them.
> I tried your above patch with the timer_ack on as is default in 2.6.5 and
> nmi_watchdog=1 failed as expected. I still think Maciej's 8259 ack patch
> is a more complete solution to the ack issue but this one gets watchdog going for
> nforce2. I cannot see anyone using your above patch without an integrated
> apic and tsc so I cannot see a problem triggering it off your kern arg.
"acpi_skip_timer_override" is specific to IOAPIC mode,
since that is the only place that the bogus interrupt
source override is used.
I'm not clued-in on the nmi_watchdog and 8259 ack issues.
My focus is primarily the ACPI issues involved in getting
these systems up and running in IOAPIC mode.
> The second patch is the C1halt update I suggested in another posting.
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-04/1707.html
Clearly this hang issue is more important than the timer issue.
I'm impressed that you built such a sophisticated patch without
any support from the vendors. But it would be a "really good thing"
if we got some input from the vendors before considering putting
a workaround into the upstream kernel -- for they may have
guidance which would either simplify it, or make it unnecessary.
Perhaps Allen Martin at nVidia can comment?
-Len
On Thu, 2004-04-15 at 22:17, Len Brown wrote:
> On Thu, 2004-04-15 at 11:10, Ross Dickson wrote:
> > On Wednesday 14 April 2004 11:02, Len Brown wrote:
> > > Re: IRQ0 XT-PIC timer issue
> > >
> > > Since the hardware is connected to APIC pin0, it is a BIOS bug
> > > that an ACPI interrupt source override from pin2 to IRQ0 exists.
> > >
> > > With this simple 2.6.5 patch you can specify "acpi_skip_timer_override"
> > > to ignore that bogus BIOS directive. The result is with your
> > > ACPI-enabled APIC-enabled kernel, you'll get IRQ0 IO-APIC-edge timer.
> > >
> > > Probably there is a more clever way to trigger this workaround
> > > automatically instead of via boot parameter.
>
> > Hi Len, I have updated my nforce2 patches for 2.6.5 to work with your patch.
> > I have tested them only on one nforce2 board Epox 8Rga+ but as little has
> > changed in core functionality from past releases I think all will be OK....
> > Hopefully no clock skew. I saw none on my system but that's no guarantee.
>
> While I don't want to get into the business of maintaining
> a dmi_scan entry for every system with this issue, I think
> it might be a good idea to add a couple of example entries
> for high volume systems for which there is no BIOS fix available.
>
> Got any opinions on which system to use as the example?
> I'll need the output from dmidecode for them.
I have an A7N8X Deluxe v2 BIOS v1007, and I can give you whatever numbers
you need. IOAPIC and APIC are on.
It's running gentoo-dev-sources 2.6.3-r1 plus the idlec1halt patch and
nmi patch from Ross. I guess the kernel doesn't matter too much if it's
just board details.
More details of my 2.6.1 days are at
http://atlas.et.tudelft.nl/verwei90/nforce2/
Craig
> > I tried your above patch with the timer_ack on as is default in 2.6.5 and
> > nmi_watchdog=1 failed as expected. I still think Maciej's 8259 ack patch
> > is a more complete solution to the ack issue but this one gets watchdog going for
> > nforce2. I cannot see anyone using your above patch without an integrated
> > apic and tsc so I cannot see a problem triggering it off your kern arg.
>
> "acpi_skip_timer_override" is specific to IOAPIC mode,
> since that is the only place that the bogus interrupt
> source override is used.
>
> I'm not clued-in on the nmi_watchdog and 8259 ack issues.
> My focus is primarily the ACPI issues involved in getting
> these systems up and running in IOAPIC mode.
>
> > The second patch is the C1halt update I suggested in another posting.
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-04/1707.html
>
> Clearly this hang issue is more important than the timer issue.
> I'm impressed that you built such a sophisticated patch without
> any support from the vendors. But it would be a "really good thing"
> if we got some input from the vendors before considering putting
> a workaround into the upstream kernel -- for they may have
> guidance which would either simplify it, or make it unnecessary.
> Perhaps Allen Martin at nVidia can comment?
>
> -Len
>
>
>
On 15 Apr 2004, Len Brown wrote:
> On Thu, 2004-04-15 at 11:10, Ross Dickson wrote:
> > On Wednesday 14 April 2004 11:02, Len Brown wrote:
> > > Re: IRQ0 XT-PIC timer issue
> > >
> > > Since the hardware is connected to APIC pin0, it is a BIOS bug
> > > that an ACPI interrupt source override from pin2 to IRQ0 exists.
> > >
> > > With this simple 2.6.5 patch you can specify "acpi_skip_timer_override"
> > > to ignore that bogus BIOS directive. The result is with your
> > > ACPI-enabled APIC-enabled kernel, you'll get IRQ0 IO-APIC-edge timer.
> > >
> > > Probably there is a more clever way to trigger this workaround
> > > automatically instead of via boot parameter.
>
> > Hi Len, I have updated my nforce2 patches for 2.6.5 to work with your patch.
> > I have tested them only on one nforce2 board Epox 8Rga+ but as little has
> > changed in core functionality from past releases I think all will be OK....
> > Hopefully no clock skew. I saw none on my system but that's no guarantee.
>
> While I don't want to get into the business of maintaining
> a dmi_scan entry for every system with this issue, I think
> it might be a good idea to add a couple of example entries
> for high volume systems for which there is no BIOS fix available.
>
> Got any opinions on which system to use as the example?
> I'll need the output from dmidecode for them.
>
> > I tried your above patch with the timer_ack on as is default in 2.6.5 and
> > nmi_watchdog=1 failed as expected. I still think Maciej's 8259 ack patch
> > is a more complete solution to the ack issue but this one gets watchdog going for
> > nforce2. I cannot see anyone using your above patch without an integrated
> > apic and tsc so I cannot see a problem triggering it off your kern arg.
>
> "acpi_skip_timer_override" is specific to IOAPIC mode,
> since that is the only place that the bogus interrupt
> source override is used.
>
> I'm not clued-in on the nmi_watchdog and 8259 ack issues.
> My focus is primarily the ACPI issues involved in getting
> these systems up and running in IOAPIC mode.
>
> > The second patch is the C1halt update I suggested in another posting.
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-04/1707.html
>
> Clearly this hang issue is more important than the timer issue.
> I'm impressed that you built such a sophisticated patch without
> any support from the vendors. But it would be a "really good thing"
> if we got some input from the vendors before considering putting
> a workaround into the upstream kernel -- for they may have
> guidance which would either simplify it, or make it unnecessary.
> Perhaps Allen Martin at nVidia can comment?
Yes, this sounds like a marvellous idea, since every board, except some
Shuttle boards after a BIOS update, suffers from these hangs.
Unfortunately, Allen Martin already commented on this once:
"Likely the root of the problem has to do with the way the Linux kernel is
using the ACPI methods to setup the interrupts which is different from win
9x/2k/XP. I can help track this down, unfortunately so far I've been
unable to reproduce the hangs on any of the boards I have."
-Allen
http://lkml.org/lkml/2003/12/5/156
Maybe he can find useful hints on how to crash his box with an nforce2
chipset here:
http://atlas.et.tudelft.nl/verwei90/nforce2/
Basically just enable APIC in the kernel and start pushing the HDD or
anything related to I/O really. The crashes come more regularly in 2.6
kernels because of the increased HZ value.
Regards,
Arjen
On Thu, 2004-04-15 at 17:04, Craig Bradney wrote:
> > While I don't want to get into the business of maintaining
> > a dmi_scan entry for every system with this issue, I think
> > it might be a good idea to add a couple of example entries
> > for high volume systems for which there is no BIOS fix available.
> >
> > Got any opinions on which system to use as the example?
> > I'll need the output from dmidecode for them.
>
> I have an A7N8X Deluxe v2 BIOS v1007, and I can give you whatever numbers
> you need. IOAPIC and APIC are on.
Please send me the output from dmidecode, available in /usr/sbin/, or
here:
http://www.nongnu.org/dmidecode/
or
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/
thanks,
-Len
On Wed, 2004-04-21 at 22:22, Len Brown wrote:
> On Thu, 2004-04-15 at 17:04, Craig Bradney wrote:
> > > Got any opinions on which system to use as the example?
> > > I'll need the output from dmidecode for them.
> >
> > I have an A7N8X Deluxe v2 BIOS v1007, and I can give you whatever numbers
> > you need. IOAPIC and APIC are on.
>
> Please send me the output from dmidecode, available in /usr/sbin/, or
I sent the dmidecode output from my ASUS A7N8X-X 2.xx (BIOS: 1007)
motherboard off-list.
I have also TRIED to send some complaints to ASUS, but that's harder than
you might expect... =P
--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
On Wed, 2004-04-21 at 22:22, Len Brown wrote:
> On Thu, 2004-04-15 at 17:04, Craig Bradney wrote:
>
> > > While I don't want to get into the business of maintaining
> > > a dmi_scan entry for every system with this issue, I think
> > > it might be a good idea to add a couple of example entries
> > > for high volume systems for which there is no BIOS fix available.
> > >
> > > Got any opinions on which system to use as the example?
> > > I'll need the output from dmidecode for them.
> >
> > I have an A7N8X Deluxe v2 BIOS v1007, and I can give you whatever numbers
> > you need. IOAPIC and APIC are on.
>
> Please send me the output from dmidecode, available in /usr/sbin/, or
> here:
> http://www.nongnu.org/dmidecode/
> or
> http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/
Enjoy :) & thanks
Craig
# dmidecode 2.3
SMBIOS 2.2 present.
37 structures occupying 981 bytes.
Table at 0x000F0800.
Handle 0x0000
DMI type 0, 19 bytes.
BIOS Information
Vendor: Phoenix Technologies, LTD
Version: 6.00 PG
Release Date: 03/24/2004
Address: 0xE0000
Runtime Size: 128 kB
ROM Size: 512 kB
Characteristics:
ISA is supported
PCI is supported
PNP is supported
APM is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/360 KB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 KB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
AGP is supported
LS-120 boot is supported
ATAPI Zip drive boot is supported
Handle 0x0001
DMI type 1, 25 bytes.
System Information
Manufacturer:
Product Name:
Version:
Serial Number:
UUID: 00000000-0000-0000-0000-00508DF1FBE3
Wake-up Type: Power Switch
Handle 0x0002
DMI type 2, 8 bytes.
Base Board Information
Manufacturer: http://www.abit.com.tw/
Product Name: NF7-S/NF7,NF7-V (nVidia-nForce2)
Version: 2.X,1.0
Serial Number:
Handle 0x0003
DMI type 3, 13 bytes.
Chassis Information
Manufacturer:
Type: Desktop
Lock: Not Present
Version:
Serial Number:
Asset Tag:
Boot-up State: Unknown
Power Supply State: Unknown
Thermal State: Unknown
Security Status: Unknown
Handle 0x0004
DMI type 4, 32 bytes.
Processor Information
Socket Designation: Socket A
Type: Central Processor
Family: Athlon
Manufacturer: AMD
ID: 81 06 00 00 FF FB 83 03
Signature: Type 0, Family 6, Model 8, Stepping 1
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
MMX (MMX technology supported)
FXSR (Fast floating-point save and restore)
SSE (Streaming SIMD extensions)
Version: AMD Athlon(tm) XP
Voltage: 1.6 V
External Clock: 200 MHz
Max Speed: 3000 MHz
Current Speed: 2100 MHz
Status: Populated, Enabled
Upgrade: ZIF Socket
L1 Cache Handle: 0x0009
L2 Cache Handle: 0x000A
L3 Cache Handle: No L3 Cache
Handle 0x0005
DMI type 5, 22 bytes.
Memory Controller Information
Error Detecting Method: 8-bit Parity
Error Correcting Capabilities:
None
Supported Interleave: One-way Interleave
Current Interleave: One-way Interleave
Maximum Memory Module Size: 1024 MB
Maximum Total Memory Size: 3072 MB
Supported Speeds:
Other
Supported Memory Types:
Other
DIMM
SDRAM
Memory Module Voltage: 2.9 V
Associated Memory Slots: 3
0x0006
0x0007
0x0008
Enabled Error Correcting Capabilities: None
Handle 0x0006
DMI type 6, 12 bytes.
Memory Module Information
Socket Designation: A0
Bank Connections: 0 1
Current Speed: 10 ns
Type: Other DIMM SDRAM
Installed Size: 512 MB (Double-bank Connection)
Enabled Size: 512 MB (Double-bank Connection)
Error Status: OK
Handle 0x0007
DMI type 6, 12 bytes.
Memory Module Information
Socket Designation: A1
Bank Connections: None
Current Speed: 10 ns
Type: Other DIMM SDRAM
Installed Size: Not Installed (Single-bank Connection)
Enabled Size: Not Installed (Single-bank Connection)
Error Status: OK
Handle 0x0008
DMI type 6, 12 bytes.
Memory Module Information
Socket Designation: A2
Bank Connections: 4 5
Current Speed: 10 ns
Type: Other DIMM SDRAM
Installed Size: 512 MB (Double-bank Connection)
Enabled Size: 512 MB (Double-bank Connection)
Error Status: OK
Handle 0x0009
DMI type 7, 19 bytes.
Cache Information
Socket Designation: Internal Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 128 KB
Maximum Size: 128 KB
Supported SRAM Types:
Synchronous
Installed SRAM Type: Synchronous
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Unknown
Handle 0x000A
DMI type 7, 19 bytes.
Cache Information
Socket Designation: External Cache
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: External
Installed Size: 256 KB
Maximum Size: 256 KB
Supported SRAM Types:
Synchronous
Installed SRAM Type: Synchronous
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Unknown
Handle 0x000B
DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: PRIMARY IDE
Internal Connector Type: On Board IDE
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x000C
DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: SECONDARY IDE
Internal Connector Type: On Board IDE
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other
Handle 0x000D
DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: FDD
Internal Connector Type: On Board Floppy
External Reference Designator: Not Specified
External Connector Type: None
Port Type: 8251 FIFO Compatible
Handle 0x000E
DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: COM1
Internal Connector Type: 9 Pin Dual Inline (pin 10 cut)
External Reference Designator:
External Connector Type: DB-9 male
Port Type: Serial Port 16450 Compatible
Handle 0x000F
DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: COM2
Internal Connector Type: 9 Pin Dual Inline (pin 10 cut)
External Reference Designator:
External Connector Type: DB-9 male
Port Type: Serial Port 16450 Compatible
Handle 0x0010
DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: LPT1
Internal Connector Type: DB-25 female
External Reference Designator:
External Connector Type: DB-25 female
Port Type: Parallel Port ECP/EPP
Handle 0x0011
DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: Keyboard
Internal Connector Type: PS/2
External Reference Designator:
External Connector Type: PS/2
Port Type: Keyboard Port
Handle 0x0012
DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: PS/2 Mouse
Internal Connector Type: PS/2
External Reference Designator:
External Connector Type: PS/2
Port Type: Mouse Port
Handle 0x0013
DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: Not Specified
Internal Connector Type: None
External Reference Designator: USB
External Connector Type: Other
Port Type: USB
Handle 0x0014
DMI type 9, 13 bytes.
System Slot Information
Designation: PCI0
Type: 32-bit PCI
Current Usage: In Use
Length: Long
ID: 1
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
Handle 0x0015
DMI type 9, 13 bytes.
System Slot Information
Designation: PCI1
Type: 32-bit PCI
Current Usage: Available
Length: Long
ID: 2
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
Handle 0x0016
DMI type 9, 13 bytes.
System Slot Information
Designation: PCI2
Type: 32-bit PCI
Current Usage: In Use
Length: Long
ID: 3
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
Handle 0x0017
DMI type 9, 13 bytes.
System Slot Information
Designation: PCI3
Type: 32-bit PCI
Current Usage: In Use
Length: Long
ID: 4
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
Handle 0x0018
DMI type 9, 13 bytes.
System Slot Information
Designation: PCI4
Type: 32-bit PCI
Current Usage: Available
Length: Long
ID: 5
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
Handle 0x0019
DMI type 9, 13 bytes.
System Slot Information
Designation: AGP
Type: 32-bit AGP
Current Usage: Available
Length: Long
ID: 240
Characteristics:
5.0 V is provided
3.3 V is provided
Handle 0x001A
DMI type 13, 22 bytes.
BIOS Language Information
Installable Languages: 3
n|US|iso8859-1
n|US|iso8859-1
r|CA|iso8859-1
Currently Installed Language: n|US|iso8859-1
Handle 0x001B
DMI type 16, 15 bytes.
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 1536 MB
Error Information Handle: Not Provided
Number Of Devices: 3
Handle 0x001C
DMI type 17, 21 bytes.
Memory Device
Array Handle: 0x001B
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: 512 MB
Form Factor: DIMM
Set: None
Locator: A0
Bank Locator: Bank0/1
Type: Unknown
Type Detail: None
Handle 0x001D
DMI type 17, 21 bytes.
Memory Device
Array Handle: 0x001B
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: DIMM
Set: None
Locator: A1
Bank Locator: Bank2/3
Type: Unknown
Type Detail: None
Handle 0x001E
DMI type 17, 21 bytes.
Memory Device
Array Handle: 0x001B
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: 512 MB
Form Factor: DIMM
Set: None
Locator: A2
Bank Locator: Bank4/5
Type: Unknown
Type Detail: None
Handle 0x001F
DMI type 19, 15 bytes.
Memory Array Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x0003FFFFFFF
Range Size: 1 GB
Physical Array Handle: 0x001B
Partition Width: 0
Handle 0x0020
DMI type 20, 19 bytes.
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x0001FFFFFFF
Range Size: 512 MB
Physical Device Handle: 0x001C
Memory Array Mapped Address Handle: 0x001F
Partition Row Position: 1
Handle 0x0021
DMI type 20, 19 bytes.
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x000000003FF
Range Size: 1 kB
Physical Device Handle: 0x001D
Memory Array Mapped Address Handle: 0x001F
Partition Row Position: 1
Handle 0x0022
DMI type 20, 19 bytes.
Memory Device Mapped Address
Starting Address: 0x00020000000
Ending Address: 0x0003FFFFFFF
Range Size: 512 MB
Physical Device Handle: 0x001E
Memory Array Mapped Address Handle: 0x001F
Partition Row Position: 1
Handle 0x0023
DMI type 32, 11 bytes.
System Boot Information
Status: No errors detected
Handle 0x0024
DMI type 127, 4 bytes.
End Of Table
> > Please send me the output from dmidecode, available in /usr/sbin/, or
> > here:
> > http://www.nongnu.org/dmidecode/
> > or
> > http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/
On Wed, 2004-04-21 at 17:28, Prakash K. Cheemplavam wrote:
> this is the output for Abit NF7-S Rev20 using bios d23. I have NOT
> activated APIC for this. Is it needed?
Yes, you need to enable ACPI and IOAPIC. The goal of this patch
is to address the XT-PIC timer issue in IOAPIC mode.
Here's the latest (vs 2.6.5).
I've got 1 Abit, 2 Asus, and 1 Shuttle DMI entry. Let me know if the
product names (1st line of each dmi_scan entry) are correct;
these are not from DMI, but are supposed to be human-readable titles.
I'm interested only in the latest BIOS -- if it is still broken.
The assumption is that if a fixed BIOS is available, the users
should upgrade.
thanks,
-Len
ps. latest BIOS on my shuttle has a C1 disconnect enable setting,
(curiously, it is disabled by default) so I'll try to reproduce the hang
on it...
===== Documentation/kernel-parameters.txt 1.44 vs edited =====
--- 1.44/Documentation/kernel-parameters.txt Mon Mar 22 16:03:22 2004
+++ edited/Documentation/kernel-parameters.txt Wed Apr 21 15:28:12 2004
@@ -122,6 +122,10 @@
acpi_serialize [HW,ACPI] force serialization of AML methods
+ acpi_skip_timer_override [HW,ACPI]
+ Recognize and ignore IRQ0/pin2 Interrupt Override.
+ For broken nForce2 BIOS resulting in XT-PIC timer.
+
ad1816= [HW,OSS]
Format: <io>,<irq>,<dma>,<dma2>
See also Documentation/sound/oss/AD1816.
===== arch/i386/kernel/dmi_scan.c 1.57 vs edited =====
--- 1.57/arch/i386/kernel/dmi_scan.c Fri Apr 16 22:03:06 2004
+++ edited/arch/i386/kernel/dmi_scan.c Wed Apr 21 18:29:35 2004
@@ -540,6 +540,19 @@
#endif
/*
+ * early nForce2 reference BIOS shipped with a
+ * bogus ACPI IRQ0 -> pin2 interrupt override -- ignore it
+ */
+static __init int ignore_timer_override(struct dmi_blacklist *d)
+{
+ extern int acpi_skip_timer_override;
+ printk(KERN_NOTICE "%s detected: BIOS IRQ0 pin2 override"
+ " will be ignored\n", d->ident);
+
+ acpi_skip_timer_override = 1;
+ return 0;
+}
+/*
* Process the DMI blacklists
*/
@@ -944,6 +957,37 @@
MATCH(DMI_BOARD_VENDOR, "IBM"),
MATCH(DMI_PRODUCT_NAME, "eserver xSeries 440"),
NO_MATCH, NO_MATCH }},
+
+/*
+ * Systems with nForce2 BIOS timer override bug
+ * add Albatron KM18G Pro
+ * add DFI NFII 400-AL
+ * add Epox 8RGA+
+ * add Shuttle AN35N
+ */
+ { ignore_timer_override, "Abit NF7-S v2", {
+ MATCH(DMI_BOARD_VENDOR, "http://www.abit.com.tw/"),
+ MATCH(DMI_BOARD_NAME, "NF7-S/NF7,NF7-V (nVidia-nForce2)"),
+ MATCH(DMI_BIOS_VERSION, "6.00 PG"),
+ MATCH(DMI_BIOS_DATE, "03/24/2004") }},
+
+ { ignore_timer_override, "Asus A7N8X v2", {
+ MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."),
+ MATCH(DMI_BOARD_NAME, "A7N8X2.0"),
+ MATCH(DMI_BIOS_VERSION, "ASUS A7N8X2.0 Deluxe ACPI BIOS Rev 1007"),
+ MATCH(DMI_BIOS_DATE, "10/06/2003") }},
+
+ { ignore_timer_override, "Asus A7N8X-X", {
+ MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."),
+ MATCH(DMI_BOARD_NAME, "A7N8X-X"),
+ MATCH(DMI_BIOS_VERSION, "ASUS A7N8X-X ACPI BIOS Rev 1007"),
+ MATCH(DMI_BIOS_DATE, "10/07/2003") }},
+
+ { ignore_timer_override, "Shuttle SN41G2", {
+ MATCH(DMI_BOARD_VENDOR, "Shuttle Inc"),
+ MATCH(DMI_BOARD_NAME, "FN41"),
+ MATCH(DMI_BIOS_VERSION, "6.00 PG"),
+ MATCH(DMI_BIOS_DATE, "01/14/2004") }},
#endif // CONFIG_ACPI_BOOT
#ifdef CONFIG_ACPI_PCI
===== arch/i386/kernel/setup.c 1.115 vs edited =====
--- 1.115/arch/i386/kernel/setup.c Fri Apr 2 07:21:43 2004
+++ edited/arch/i386/kernel/setup.c Wed Apr 21 15:28:12 2004
@@ -614,6 +614,9 @@
else if (!memcmp(from, "acpi_sci=low", 12))
acpi_sci_flags.polarity = 3;
+ else if (!memcmp(from, "acpi_skip_timer_override", 24))
+ acpi_skip_timer_override = 1;
+
#ifdef CONFIG_X86_LOCAL_APIC
/* disable IO-APIC */
else if (!memcmp(from, "noapic", 6))
===== arch/i386/kernel/acpi/boot.c 1.58 vs edited =====
--- 1.58/arch/i386/kernel/acpi/boot.c Tue Apr 20 20:54:03 2004
+++ edited/arch/i386/kernel/acpi/boot.c Wed Apr 21 15:28:13 2004
@@ -62,6 +62,7 @@
acpi_interrupt_flags acpi_sci_flags __initdata;
int acpi_sci_override_gsi __initdata;
+int acpi_skip_timer_override __initdata;
#ifdef CONFIG_X86_LOCAL_APIC
static u64 acpi_lapic_addr __initdata = APIC_DEFAULT_PHYS_BASE;
@@ -327,6 +328,12 @@
acpi_sci_ioapic_setup(intsrc->global_irq,
intsrc->flags.polarity, intsrc->flags.trigger);
return 0;
+ }
+
+ if (acpi_skip_timer_override &&
+ intsrc->bus_irq == 0 && intsrc->global_irq == 2) {
+ printk(PREFIX "BIOS IRQ0 pin2 override ignored.\n");
+ return 0;
}
mp_override_legacy_irq (
===== include/asm-i386/acpi.h 1.18 vs edited =====
--- 1.18/include/asm-i386/acpi.h Tue Mar 30 17:05:19 2004
+++ edited/include/asm-i386/acpi.h Wed Apr 21 15:28:14 2004
@@ -118,6 +118,7 @@
#ifdef CONFIG_X86_IO_APIC
extern int skip_ioapic_setup;
extern int acpi_irq_to_vector(u32 irq); /* deprecated in favor of acpi_gsi_to_irq */
+extern int acpi_skip_timer_override;
static inline void disable_ioapic_setup(void)
{
Physical Array Handle: 0x001B
Partition Width: 0
Handle 0x0020
DMI type 20, 19 bytes.
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x0001FFFFFFF
Range Size: 512 MB
Physical Device Handle: 0x001C
Memory Array Mapped Address Handle: 0x001F
Partition Row Position: 1
Handle 0x0021
DMI type 20, 19 bytes.
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x000000003FF
Range Size: 1 kB
Physical Device Handle: 0x001D
Memory Array Mapped Address Handle: 0x001F
Partition Row Position: 1
Handle 0x0022
DMI type 20, 19 bytes.
Memory Device Mapped Address
Starting Address: 0x00020000000
Ending Address: 0x0003FFFFFFF
Range Size: 512 MB
Physical Device Handle: 0x001E
Memory Array Mapped Address Handle: 0x001F
Partition Row Position: 1
Handle 0x0023
DMI type 32, 11 bytes.
System Boot Information
Status: No errors detected
Handle 0x0024
DMI type 127, 4 bytes.
End Of Table
On Thu, 2004-04-22 at 00:41, Len Brown wrote:
> > Please send me the output from dmidecode, available in /usr/sbin/, or
> > > here:
> > > http://www.nongnu.org/dmidecode/
> > > or
> > > http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/
>
> On Wed, 2004-04-21 at 17:28, Prakash K. Cheemplavam wrote:
>
> > this is the output for Abit NF7-S Rev20 using bios d23. I have NOT
> > activated APIC for this. Is it needed?
>
> Yes, you need to enable ACPI and IOAPIC. The goal of this patch
> is to address the XT-PIC timer issue in IOAPIC mode.
>
> Here's the latest (vs 2.6.5).
Do we need any other patch? eg the idlec1halt patch? My Athlon still has
2.6.3 on it.
> I've got 1 Abit, 2 Asus, and 1 Shuttle DMI entry. Let me know if the
> product names (1st line of dmidecode entry) are correct,
> these are not from DMI, but are supposed to be human-readable titles.
> + { ignore_timer_override, "Asus A7N8X v2", {
> + MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."),
> + MATCH(DMI_BOARD_NAME, "A7N8X2.0"),
> + MATCH(DMI_BIOS_VERSION, "ASUS A7N8X2.0 Deluxe ACPI BIOS Rev 1007"),
> + MATCH(DMI_BIOS_DATE, "10/06/2003") }},
my dmidecode output also shows (in the first BIOS information section):
Vendor: Phoenix Technologies, LTD
although the Manufacturer is ASUSTek Computer INC. from the Base Board
and System sections.
Not really sure about the code. If it has to match on all of the above then
it might not work. I'll try a new kernel later today and see the result.
> I'm interested only in the latest BIOS -- if it is still broken.
> The assumption is that if a fixed BIOS is available, the users
> should upgrade.
>
Yes, I just checked yesterday and there was nothing new.
thanks
Craig
> thanks,
> -Len
>
> ps. latest BIOS on my shuttle has a C1 disconnect enable setting,
> (curiously, it is disabled by default) so I'll try to reproduce the hang
> on it...
>
> ===== Documentation/kernel-parameters.txt 1.44 vs edited =====
> --- 1.44/Documentation/kernel-parameters.txt Mon Mar 22 16:03:22 2004
> +++ edited/Documentation/kernel-parameters.txt Wed Apr 21 15:28:12 2004
> @@ -122,6 +122,10 @@
>
> acpi_serialize [HW,ACPI] force serialization of AML methods
>
> + acpi_skip_timer_override [HW,ACPI]
> + Recognize and ignore IRQ0/pin2 Interrupt Override.
> + For broken nForce2 BIOS resulting in XT-PIC timer.
> +
> ad1816= [HW,OSS]
> Format: <io>,<irq>,<dma>,<dma2>
> See also Documentation/sound/oss/AD1816.
> ===== arch/i386/kernel/dmi_scan.c 1.57 vs edited =====
> --- 1.57/arch/i386/kernel/dmi_scan.c Fri Apr 16 22:03:06 2004
> +++ edited/arch/i386/kernel/dmi_scan.c Wed Apr 21 18:29:35 2004
> @@ -540,6 +540,19 @@
> #endif
>
> /*
> + * early nForce2 reference BIOS shipped with a
> + * bogus ACPI IRQ0 -> pin2 interrupt override -- ignore it
> + */
> +static __init int ignore_timer_override(struct dmi_blacklist *d)
> +{
> + extern int acpi_skip_timer_override;
> + printk(KERN_NOTICE "%s detected: BIOS IRQ0 pin2 override"
> + " will be ignored\n", d->ident);
> +
> + acpi_skip_timer_override = 1;
> + return 0;
> +}
> +/*
> * Process the DMI blacklists
> */
>
> @@ -944,6 +957,37 @@
> MATCH(DMI_BOARD_VENDOR, "IBM"),
> MATCH(DMI_PRODUCT_NAME, "eserver xSeries 440"),
> NO_MATCH, NO_MATCH }},
> +
> +/*
> + * Systems with nForce2 BIOS timer override bug
> + * add Albatron KM18G Pro
> + * add DFI NFII 400-AL
> + * add Epox 8RGA+
> + * add Shuttle AN35N
> + */
> + { ignore_timer_override, "Abit NF7-S v2", {
> + MATCH(DMI_BOARD_VENDOR, "http://www.abit.com.tw/"),
> + MATCH(DMI_BOARD_NAME, "NF7-S/NF7,NF7-V (nVidia-nForce2)"),
> + MATCH(DMI_BIOS_VERSION, "6.00 PG"),
> + MATCH(DMI_BIOS_DATE, "03/24/2004") }},
> +
> + { ignore_timer_override, "Asus A7N8X v2", {
> + MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."),
> + MATCH(DMI_BOARD_NAME, "A7N8X2.0"),
> + MATCH(DMI_BIOS_VERSION, "ASUS A7N8X2.0 Deluxe ACPI BIOS Rev 1007"),
> + MATCH(DMI_BIOS_DATE, "10/06/2003") }},
> +
> + { ignore_timer_override, "Asus A7N8X-X", {
> + MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."),
> + MATCH(DMI_BOARD_NAME, "A7N8X-X"),
> + MATCH(DMI_BIOS_VERSION, "ASUS A7N8X-X ACPI BIOS Rev 1007"),
> + MATCH(DMI_BIOS_DATE, "10/07/2003") }},
> +
> + { ignore_timer_override, "Shuttle SN41G2", {
> + MATCH(DMI_BOARD_VENDOR, "Shuttle Inc"),
> + MATCH(DMI_BOARD_NAME, "FN41"),
> + MATCH(DMI_BIOS_VERSION, "6.00 PG"),
> + MATCH(DMI_BIOS_DATE, "01/14/2004") }},
> #endif // CONFIG_ACPI_BOOT
>
> #ifdef CONFIG_ACPI_PCI
> ===== arch/i386/kernel/setup.c 1.115 vs edited =====
> --- 1.115/arch/i386/kernel/setup.c Fri Apr 2 07:21:43 2004
> +++ edited/arch/i386/kernel/setup.c Wed Apr 21 15:28:12 2004
> @@ -614,6 +614,9 @@
> else if (!memcmp(from, "acpi_sci=low", 12))
> acpi_sci_flags.polarity = 3;
>
> + else if (!memcmp(from, "acpi_skip_timer_override", 24))
> + acpi_skip_timer_override = 1;
> +
> #ifdef CONFIG_X86_LOCAL_APIC
> /* disable IO-APIC */
> else if (!memcmp(from, "noapic", 6))
> ===== arch/i386/kernel/acpi/boot.c 1.58 vs edited =====
> --- 1.58/arch/i386/kernel/acpi/boot.c Tue Apr 20 20:54:03 2004
> +++ edited/arch/i386/kernel/acpi/boot.c Wed Apr 21 15:28:13 2004
> @@ -62,6 +62,7 @@
>
> acpi_interrupt_flags acpi_sci_flags __initdata;
> int acpi_sci_override_gsi __initdata;
> +int acpi_skip_timer_override __initdata;
>
> #ifdef CONFIG_X86_LOCAL_APIC
> static u64 acpi_lapic_addr __initdata = APIC_DEFAULT_PHYS_BASE;
> @@ -327,6 +328,12 @@
> acpi_sci_ioapic_setup(intsrc->global_irq,
> intsrc->flags.polarity, intsrc->flags.trigger);
> return 0;
> + }
> +
> + if (acpi_skip_timer_override &&
> + intsrc->bus_irq == 0 && intsrc->global_irq == 2) {
> + printk(PREFIX "BIOS IRQ0 pin2 override ignored.\n");
> + return 0;
> }
>
> mp_override_legacy_irq (
> ===== include/asm-i386/acpi.h 1.18 vs edited =====
> --- 1.18/include/asm-i386/acpi.h Tue Mar 30 17:05:19 2004
> +++ edited/include/asm-i386/acpi.h Wed Apr 21 15:28:14 2004
> @@ -118,6 +118,7 @@
> #ifdef CONFIG_X86_IO_APIC
> extern int skip_ioapic_setup;
> extern int acpi_irq_to_vector(u32 irq); /* deprecated in favor of
> acpi_gsi_to_irq */
> +extern int acpi_skip_timer_override;
>
> static inline void disable_ioapic_setup(void)
> {
>
>
Len,
Please bear in mind that the people from Shuttle are the only ones that
have seemingly fixed it, allegedly, late in December. I only have data
for one Shuttle board, but I assume that if they (Shuttle) fixed it, they
would have fixed it for all boards. For Shuttle AN35N rev 1.1 there is a
BIOS update from 05-Dec-2003 that has probably addressed this issue.
So if you are looking to reproduce this hang, don't update your BIOS :)
Regards,
Arjen
On Thu, 2004-04-22 at 03:26, Prakash K. Cheemplavam wrote:
> Len Brown wrote:
> > Yes, you need to enable ACPI and IOAPIC. The goal of this patch
> > is to address the XT-PIC timer issue in IOAPIC mode.
>
> Ok, I recompiled using your (former) patch and Ross' apic tack patch. I
> attached the new dmidecode Text.
Actually dmidecode dumps hard-coded BIOS data, so it will not change
unless you upgrade your BIOS.
> > I've got 1 Abit, 2 Asus, and 1 Shuttle DMI entry. Let me know if the
> > product names (1st line of dmidecode entry) are correct,
> > these are not from DMI, but are supposed to be human-readable titles.
>
> Are you referring to (as the first line doesn't say much):
>
> Product Name: NF7-S/NF7,NF7-V (nVidia-nForce2)
> Version: 2.X,1.0
+ { ignore_timer_override, "Abit NF7-S v2", {
This one is for humans and anything can be in the string.
+ MATCH(DMI_BOARD_VENDOR, "http://www.abit.com.tw/"),
+ MATCH(DMI_BOARD_NAME, "NF7-S/NF7,NF7-V (nVidia-nForce2)"),
+ MATCH(DMI_BIOS_VERSION, "6.00 PG"),
+ MATCH(DMI_BIOS_DATE, "03/24/2004") }},
These are keys in the DMI table, and have to match the BIOS (as seen in
dmidecode) exactly.
> Seems pretty much OK, though I don't understand, why 1.0 is in the
> Version string. Durthermore I don't understand, why "Phoenix" appears as
> bios vendor. It should be Award, AFAIK.
Phoenix and Award merged.
It doesn't really matter what it says; it is just a string compare to
Linux. Also, I chose not to look at the BIOS vendor in this example
because it adds no value; here we're just looking at BOARD vendor & name,
plus BIOS version and date.
Thanks for confirming that the entry matched your system and that the
patch triggered automatically.
-Len
On Thu, 2004-04-22 at 04:45, Craig Bradney wrote:
> > Yes, you need to enable ACPI and IOAPIC. The goal of this patch
> > is to address the XT-PIC timer issue in IOAPIC mode.
> >
> > Here's the latest (vs 2.6.5).
>
>
> Do we need any other patch? eg the idlec1halt patch? My Athlon still has
> 2.6.3 on it.
If you needed idlec1halt before, you still need it.
This patch just addresses the XT-PIC timer issue.
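For readers who want the gist of the patch without reading the diff, here is a minimal user-space sketch of the check it adds in the ACPI interrupt source override handling. The struct and the helper names (`int_src_override`, `should_skip_override`, `effective_pin`) are made up for illustration and are not kernel code; only the condition `bus_irq == 0 && global_irq == 2` and the flag come from the patch. It also assumes that, once the override is dropped, the timer falls back to the default identity mapping (IRQ0 on pin 0).

```c
#include <stdbool.h>

/* Hypothetical stand-in for an ACPI interrupt source override entry;
 * only the two fields the patch tests are modeled here. */
struct int_src_override {
    unsigned bus_irq;     /* legacy ISA IRQ number */
    unsigned global_irq;  /* IO-APIC pin (GSI) the BIOS claims it maps to */
};

/* Mirrors acpi_skip_timer_override from the patch: set either by a DMI
 * blacklist hit or by the acpi_skip_timer_override boot parameter. */
static int skip_timer_override;

/* True when the entry is the IRQ0 -> pin2 override and the workaround is on. */
static bool should_skip_override(const struct int_src_override *e)
{
    return skip_timer_override && e->bus_irq == 0 && e->global_irq == 2;
}

/* Pin the legacy IRQ ends up on. Assumption: a skipped override leaves
 * the identity mapping (ISA IRQ n -> pin n) in place. */
static unsigned effective_pin(const struct int_src_override *e)
{
    if (should_skip_override(e))
        return e->bus_irq;
    return e->global_irq;
}
```

With the flag clear the BIOS override is honoured as before, so boards that are not blacklisted and not booted with the parameter should see no change.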
>
> > I've got 1 Abit, 2 Asus, and 1 Shuttle DMI entry. Let me know if the
> > product names (1st line of dmidecode entry) are correct,
> > these are not from DMI, but are supposed to be human-readable titles.
>
> + { ignore_timer_override, "Asus A7N8X v2", {
> > + MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."),
> > + MATCH(DMI_BOARD_NAME, "A7N8X2.0"),
> > + MATCH(DMI_BIOS_VERSION, "ASUS A7N8X2.0 Deluxe ACPI BIOS Rev 1007"),
> > + MATCH(DMI_BIOS_DATE, "10/06/2003") }},
>
> my dmidecode output also shows (in the first BIOS information section):
> Vendor: Phoenix Technologies, LTD
> although the Manufacturer is ASUSTek Computer INC. form the Base Board
> and System sections.
Right, DMI has separate sections for System, Board, BIOS, and we're
using two pieces from the BOARD and two pieces from the BIOS sections.
> Not really sure about the code. If it matches on all of above then it
> might not work. Ill try a new kernel later today and see the result.
The workaround is triggered only if all the MATCH()'s above match.
If it doesn't trigger, then either I munged it on copy out of dmidecode
or you've got a different BIOS and we need a new dmidecode...
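To make the "all MATCH()'s must match" rule concrete, here is a small stand-alone sketch. The types and the function `entry_matches` are hypothetical simplifications, not the real dmi_scan.c structures, and the use of `strstr` (substring comparison) is my assumption about how the comparison is done; the strings are the ones from the Asus A7N8X v2 entry in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a DMI blacklist entry. */
enum dmi_slot { BOARD_VENDOR, BOARD_NAME, BIOS_VERSION, BIOS_DATE, DMI_SLOTS };

struct dmi_match {
    enum dmi_slot slot;
    const char *substr;   /* NULL means "key not used" */
};

struct blacklist_entry {
    const char *ident;            /* human-readable title, never compared */
    struct dmi_match matches[4];
};

/* Strings copied from the "Asus A7N8X v2" entry in the patch. */
static const struct blacklist_entry asus_a7n8x_v2 = {
    "Asus A7N8X v2", {
        { BOARD_VENDOR, "ASUSTeK Computer INC." },
        { BOARD_NAME,   "A7N8X2.0" },
        { BIOS_VERSION, "ASUS A7N8X2.0 Deluxe ACPI BIOS Rev 1007" },
        { BIOS_DATE,    "10/06/2003" },
    }
};

/* Returns 1 only if every used key is found in the system's DMI strings.
 * Assumption: comparison is a substring search. */
static int entry_matches(const struct blacklist_entry *e,
                         const char *const dmi[DMI_SLOTS])
{
    for (size_t i = 0; i < 4; i++) {
        const struct dmi_match *m = &e->matches[i];
        if (m->substr == NULL)
            continue;
        if (dmi[m->slot] == NULL || strstr(dmi[m->slot], m->substr) == NULL)
            return 0;   /* one miss is enough to reject the entry */
    }
    return 1;
}
```

Note that the BIOS vendor string ("Phoenix Technologies, LTD") is never consulted in this model, which is why it does not matter what it says.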
> > I'm interested only in the latest BIOS -- if it is still broken.
> > The assumption is that if a fixed BIOS is available, the users
> > should upgrade.
> >
>
> Yes, I just checked yesterday and there was nothing new.
thanks,
-Len
On Wed, Apr 21, 2004 at 06:41:38PM -0400, Len Brown wrote:
> > Please send me the output from dmidecode, available in /usr/sbin/, or
>
> I've got 1 Abit, 2 Asus, and 1 Shuttle DMI entry. Let me know if the
> product names (1st line of dmidecode entry) are correct,
> these are not from DMI, but are supposed to be human-readable titles.
>
> I'm interested only in the latest BIOS -- if it is still broken.
> The assumption is that if a fixed BIOS is available, the users
> should upgrade.
>
> thanks,
> -Len
>
> ps. latest BIOS on my shuttle has a C1 disconnect enable setting,
> (curiously, it is disabled by default) so I'll try to reproduce the hang
> on it...
>
On the Shuttle AN35N, the C1 disconnect option default is auto. If you're
talking about this board, or another board Shuttle seemingly fixed, then I
can tell you that I haven't been able to get mine to hang with vanilla kernels.
As for your patch, I get a fast timer, and gain about 1 sec per 5 minutes.
The only patch that seemed to work without a fast timer so far was the one
removed by Linus in a testing version. The AN35N has the timer override
bug.
Attached is the dmidecode for the AN35N. Note: onboard sound may be disabled.
Jesse
On Thu, 2004-04-22 at 12:39, Jesse Allen wrote:
> On the Shuttle AN35N, the C1 disconnect option default is auto. If you're
> talking about this board, or another board Shuttle seemingly fixed, then I
> can tell you that I haven't been able to get my to hang with vanilla kernels.
Have you been able to hang the AN35N under any conditions?
Old BIOS, non-vanilla kernel?
> As for your patch, I get a fast timer, and gain about 1 sec per 5 minutes.
> The only patch that seemed to work without a fast timer so far was the one
> removed by Linus in a testing version. The AN35N has the timer override
> bug.
Hmm, I didn't notice fast time on my FN41, I'll look for it.
I'm not familiar with the "one removed by Linus in a testing version",
perhaps you could point me to that?
> Attached is the dmidecode for the AN35N.
applied.
thanks,
-Len
On Thu, 2004-04-22 at 17:03, Len Brown wrote:
> On Thu, 2004-04-22 at 04:45, Craig Bradney wrote:
>
> > > Yes, you need to enable ACPI and IOAPIC. The goal of this patch
> > > is to address the XT-PIC timer issue in IOAPIC mode.
> > >
> > > Here's the latest (vs 2.6.5).
> >
> >
> > Do we need any other patch? eg the idlec1halt patch? My Athlon still has
> > 2.6.3 on it.
>
> If you needed idlec1halt before, you still need it.
> This patch just addresses the XT-PIC timer issue.
>
> >
> > > I've got 1 Abit, 2 Asus, and 1 Shuttle DMI entry. Let me know if the
> > > product names (1st line of dmidecode entry) are correct,
> > > these are not from DMI, but are supposed to be human-readable titles.
> >
> > + { ignore_timer_override, "Asus A7N8X v2", {
> > > + MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."),
> > > + MATCH(DMI_BOARD_NAME, "A7N8X2.0"),
> > > + MATCH(DMI_BIOS_VERSION, "ASUS A7N8X2.0 Deluxe ACPI BIOS Rev 1007"),
> > > + MATCH(DMI_BIOS_DATE, "10/06/2003") }},
> >
> > my dmidecode output also shows (in the first BIOS information section):
> > Vendor: Phoenix Technologies, LTD
> > although the Manufacturer is ASUSTek Computer INC. form the Base Board
> > and System sections.
>
> Right, DMI has separate sections for System, Board, BIOS, and we're
> using two pieces from the BOARD and two pieces from the BIOS sections.
>
> > Not really sure about the code. If it matches on all of above then it
> > might not work. Ill try a new kernel later today and see the result.
>
> The workaround is triggered only if all the MATCH()'s above match.
> If it doesn't trigger, then either I munged it on copy out of dmidecode
> or you've got a different BIOS and we need a new dmidecode...
>
> > > I'm interested only in the latest BIOS -- if it is still broken.
> > > The assumption is that if a fixed BIOS is available, the users
> > > should upgrade.
> > >
> >
> > Yes, I just checked yesterday and there was nothing new.
[I have sent this email with attachments directly to Len; the attachments
are just /proc/interrupts and dmesg output. If someone is interested, please
ask for them]
Hi Len
Please find attached /proc/interrupts and dmesg from 3 boots, 2 with new
kernel.
263 : gentoo-dev-sources-r1 2.6.3 kernel with Ross Dickson's idleC1halt
and IOAPIC patches.
265: gentoo-dev-sources-r1 2.6.5 kernel with Ross Dickson's idleC1halt
for 2.6.5 kernel only. Note in 265pi (/proc/interrupts):
0: 54821 XT-PIC timer
265-lb: gentoo-dev-sources-r1 2.6.5 kernel with Ross Dickson's
idleC1halt for 2.6.5 kernel and your patch for the interrupt.
Note in 265pi-lb (/proc/interrupts):
0: 51144 IO-APIC-edge timer
so.. looks good here. :) I was surprised to see this effect with no boot
kernel option though. Having read the code I see you set the value to 1
and therefore on. Seems fine to me.
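In case anyone wants to check this from a script rather than by eye, here is a rough sketch of parsing the IRQ 0 line. It assumes the uniprocessor /proc/interrupts layout shown above (one count column, then controller name, then device name); the struct and function names are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one /proc/interrupts line (UP layout assumed). */
struct irq_line {
    int irq;
    unsigned long count;
    char controller[32];
    char device[32];
};

/* Parses e.g. "0: 54821 XT-PIC timer". Returns 0 on success, -1 otherwise. */
static int parse_irq_line(const char *line, struct irq_line *out)
{
    int n = sscanf(line, " %d: %lu %31s %31s",
                   &out->irq, &out->count, out->controller, out->device);
    return n == 4 ? 0 : -1;
}

/* Returns 1 if the line is IRQ 0 and it is routed through the IO-APIC. */
static int timer_uses_ioapic(const char *line)
{
    struct irq_line l;
    if (parse_irq_line(line, &l) != 0 || l.irq != 0)
        return 0;
    return strncmp(l.controller, "IO-APIC", 7) == 0;
}
```

On an SMP kernel there is one count column per CPU, so the format string would need adjusting there.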
regards
Craig
On Thu, 2004-04-22 at 13:21, Len Brown wrote:
> > As for your patch, I get a fast timer, and gain about 1 sec per 5 minutes.
> > The only patch that seemed to work without a fast timer so far was the one
> > removed by Linus in a testing version. The AN35N has the timer override
> > bug.
>
> Hmm, I didn't notice fast time on my FN41, i'll look for it.
>
> I'm not familiar with the "one removed by Linux in a testing version",
> perhaps you could point me to that?
date seems to gain 9 sec/hour on my Shuttle/SN41G2/FN41 when using the
IOAPIC timer.
Booted with "noapic" for the XT-PIC timer, it stays locked
onto my wristwatch after an hour. If the workaround is disabled,
and the XT-PIC timer is used, it matches the "noapic" behaviour -- no drift.
I can't explain it. I think it is a timer problem independent of the
IRQ routing.
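To put the two drift reports on the same scale, a trivial helper is enough. This is plain arithmetic, nothing kernel-specific, and the function names are made up: 9 seconds gained per hour is 9/3600, i.e. about 2500 ppm or 216 seconds per day, while 1 second per 5 minutes is 1/300, i.e. about 3333 ppm or 288 seconds per day.

```c
/* Clock drift in parts per million: seconds gained per second elapsed. */
static double drift_ppm(double gained_s, double elapsed_s)
{
    return gained_s / elapsed_s * 1e6;
}

/* The same drift expressed as seconds gained per day. */
static double drift_s_per_day(double gained_s, double elapsed_s)
{
    return gained_s / elapsed_s * 86400.0;
}
```

So the two boards are in the same ballpark, which at least is consistent with a common cause.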
-Len
ps. when I ran in XT-PIC mode there were lots of ERR's registered in
/proc/interrupts -- doesn't look healthy.
Len Brown wrote:
> On Thu, 2004-04-22 at 13:21, Len Brown wrote:
>
>
>>>As for your patch, I get a fast timer, and gain about 1 sec per 5 minutes.
>>>The only patch that seemed to work without a fast timer so far was the one
>>>removed by Linus in a testing version. The AN35N has the timer override
>>>bug.
>>
>>Hmm, I didn't notice fast time on my FN41, i'll look for it.
>>
>>I'm not familiar with the "one removed by Linux in a testing version",
>>perhaps you could point me to that?
>
>
> date seems to gain 9sec/hour on my Shuttle/SN41G2/FN41 when using IOAPIC
> timer.
Do you get lock-ups without the timer_ack/C1halt patch? If yes, this may
be the cause. I remember someone finding out that Ross' patch actually made
the timer slower, which resulted in stable operation. Maciej found
out that not connecting the timer at all made it stable as well. So is there
a possibility to sync both timers?
According to a recent post, building the kernel with SMP makes it stable
as well, but I haven't tested that.
> booted with "noapic" for XT-PIC timer, it stays locked
> onto my wristwatch after an hour. If the workaround is disabled,
> and XT-PIC timer is used, it matches the "noapic" behaviour -- no drift.
>
> I can't explain it. I think it is a timer problem independent of the
> IRQ routing.
>
> -Len
>
> ps. when i ran in XT-PIC mode there were lots of ERR's registered in
> /proc/interrupts -- doesn't look healthy.
>
>
>
>
He even filed a bug report:
http://bugme.osdl.org/show_bug.cgi?id=2552
I don't have access to my box atm, but I will certainly be trying a
vanilla kernel built with SMP to see what's going on.
Regards,
Arjen
Arjen Verweij wrote:
> He even filed a bug report:
>
> http://bugme.osdl.org/show_bug.cgi?id=2552
>
> I don't have access to my box atm, but I will certainly be trying a
> vanilla kernel built with SMP to see what's going on.
Hmm, well, I just tried it with 2.6.6-rc2-mm1 and it did NOT succeed, i.e.
it locked up. Maybe I need to use the exact kernel version and
configuration to find out what's going on.
Prakash
Arjen Verweij wrote:
> He even filed a bug report:
>
> http://bugme.osdl.org/show_bug.cgi?id=2552
>
> I don't have access to my box atm, but I will certainly be trying a
> vanilla kernel built with SMP to see what's going on.
Ok, I read the bug report, so it seems it will still lock up from my
Silicon Image SATA controller, but not from the internal PATA IDE. Well, I
only tried the SATA, but I don't quite understand what makes the
difference... at least it is a no go for me.
Prakash
On Thu, 22 Apr 2004, Len Brown wrote:
> date seems to gain 9sec/hour on my Shuttle/SN41G2/FN41 when using IOAPIC
> timer.
>
> booted with "noapic" for XT-PIC timer, it stays locked
> onto my wristwatch after an hour. If the workaround is disabled,
> and XT-PIC timer is used, it matches the "noapic" behaviour -- no drift.
>
> I can't explain it. I think it is a timer problem independent of the
> IRQ routing.
>
> -Len
>
> ps. when i ran in XT-PIC mode there were lots of ERR's registered in
> /proc/interrupts -- doesn't look healthy.
It looks like noise on the timer IRQ line causing spurious interrupt
edges. In the XT-PIC mode it gets ignored -- at the time the CPU issues
an ack, the request is already gone and the PIC signals a spurious
interrupt. In the APIC mode the interrupt is delivered as a regular one
as edge interrupt events are persistent for the APICs -- if a falling edge
happens before an interrupt is acked it's not assumed to be gone and is
delivered as a real one.
Another possibility is there's a bug in our APIC interrupt setup, leading
to the timer interrupt being enabled both in the APIC and in the PIC.
You can verify that by calling debug functions for dumping states of the
controllers from io_apic.c. They are print_IO_APIC(), print_local_APIC()
and print_PIC() -- you may call them from an ad-hoc written small module,
although the first one is (accidentally?) marked __init, so you'd have to
remove the mark first. You need to call all of them to get a complete
view.
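To illustrate the first hypothesis, here is a toy model only: it is not a description of the real 8259A or APIC logic, and every name in it is invented for the example. It assumes a glitch is an edge whose request has already gone away by the time the CPU acknowledges, and that a genuine tick is still asserted at that point.

```c
/* Toy model of the hypothesis above; not real controller behaviour. */
enum delivery { DELIVERED_TIMER, DELIVERED_SPURIOUS };

/* PIC: if the request is gone when the CPU acks, a spurious interrupt
 * is signalled instead of a timer interrupt. */
static enum delivery pic_deliver(int line_high_at_ack)
{
    return line_high_at_ack ? DELIVERED_TIMER : DELIVERED_SPURIOUS;
}

/* APIC: the edge has been latched, so it is delivered as a real
 * interrupt whatever the line does afterwards. */
static enum delivery apic_deliver(int line_high_at_ack)
{
    (void)line_high_at_ack;
    return DELIVERED_TIMER;
}

/* Number of timer ticks the kernel would count for a given number of
 * genuine ticks plus short noise glitches on the line. */
static unsigned ticks_counted(unsigned real_ticks, unsigned glitches,
                              int use_apic)
{
    unsigned ticks = 0;
    unsigned i;

    for (i = 0; i < real_ticks; i++)
        if ((use_apic ? apic_deliver(1) : pic_deliver(1)) == DELIVERED_TIMER)
            ticks++;
    for (i = 0; i < glitches; i++)
        if ((use_apic ? apic_deliver(0) : pic_deliver(0)) == DELIVERED_TIMER)
            ticks++;
    return ticks;
}
```

In this model the PIC case counts only the genuine ticks and turns each glitch into a spurious interrupt (which would fit the ERR count), while the APIC case counts every glitch as an extra tick (which would fit a clock that runs fast).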
Maciej
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +
On Friday 23 April 2004 03:21, Len Brown wrote:
> On Thu, 2004-04-22 at 12:39, Jesse Allen wrote:
>
> > On the Shuttle AN35N, the C1 disconnect option default is auto. If you're
> > talking about this board, or another board Shuttle seemingly fixed, then I
> > can tell you that I haven't been able to get my to hang with vanilla kernels.
>
> Have you been able to hang the AN35N under any conditions?
> Old BIOS, non-vanilla kernel?
>
> > As for your patch, I get a fast timer, and gain about 1 sec per 5 minutes.
> > The only patch that seemed to work without a fast timer so far was the one
> > removed by Linus in a testing version. The AN35N has the timer override
> > bug.
>
> Hmm, I didn't notice fast time on my FN41, i'll look for it.
>
> I'm not familiar with the "one removed by Linux in a testing version",
> perhaps you could point me to that?
This is Maciej's patch - latest posting of it that I have seen,
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-04/3174.html
His fix-up of the 8259 ack issue (when used without routing the 8254 PIT into
IO-APIC INTIN0) successfully establishes a virtual wire mode input of the timer
which the nforce2 seems happy with, albeit without being able to use
"nmi_debug=1".
It is that timer ack issue tied up with the integrated APIC.
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-04/2143.html
This refers to when it was in the 2.6.3-rc1-mm1
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-02/2658.html
Regards
Ross.
>
> > Attached is the dmidecode for the AN35N.
>
> applied.
>
> thanks,
> -Len
>
>
>
>
>
Hi,
We discussed this subject a bit once before, but it doesn't seem to be fully
resolved. It is still about the C1 halt state. Perhaps you remember me
having trouble getting low idle temps with my nforce2 and Athlon XP.
With a previous kernel I could get them back using agpgart and the nvidia
binary driver. But now (2.6.6-rc2-mm1), even using the open source nvidia
driver, idle temps seem to do whatever they like (no matter whether PIC or
APIC is used). I really think that the C1 state isn't called properly.
(CPU disconnect is activated.)
cat /proc/acpi/processor/CPU0/power
active state: C1
default state: C1
bus master activity: 00000000
states:
*C1: promotion[--] demotion[--] latency[000] usage[00000000]
C2: <not supported>
C3: <not supported>
You said that the usage probably stays at 0 because it is not counted. But this
makes me wonder: yesterday I tried acpi=force on a board with a VIA
MVP3 chipset. The BIOS is from 2000 and, guess what, here C1 and even C2
seem to be used properly and the usage is even counted. ACPI seems to
work better there than on my nforce2...
So I wonder why C1 usage isn't counted on nforce2. I now have the strong
feeling that it isn't properly entered under some circumstances.
Should I open a bug report? If yes, what files do you need?
Thanks,
Prakash
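A minimal sketch for checking whether the C1 usage counter ever moves, assuming the /proc/acpi/processor/CPU0/power layout shown above. The helper names are hypothetical, and treating the usage field as a decimal counter is an assumption; the sample text is embedded so the sketch runs anywhere.

```python
import re

# Sample in the layout shown above (the real source would be
# /proc/acpi/processor/CPU0/power on the affected machine).
SAMPLE = """\
active state: C1
default state: C1
bus master activity: 00000000
states:
*C1: promotion[--] demotion[--] latency[000] usage[00000000]
C2: <not supported>
C3: <not supported>
"""

# \s+ between fields also tolerates the usage[] field being wrapped onto
# the next line, as happened in the mail above.
STATE_RE = re.compile(
    r"\*?(C\d+):\s+promotion\[[^\]]*\]\s+demotion\[[^\]]*\]"
    r"\s+latency\[(\d+)\]\s+usage\[(\d+)\]"
)


def parse_usage(text):
    """Return {state_name: usage_count} for every supported C-state."""
    # Assumption: the usage field is a zero-padded decimal counter.
    return {m.group(1): int(m.group(3)) for m in STATE_RE.finditer(text)}


def usage_delta(before, after):
    """How much each state's usage counter grew between two snapshots."""
    return {state: after[state] - before.get(state, 0) for state in after}


print(parse_usage(SAMPLE))
```

Taking two snapshots a few seconds apart on an idle machine and passing them to usage_delta would show whether C1 is being counted at all.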
Hello,
I'm sorry for the small interlude in this thread, but I just want to get
something clear.
Basically we have a problem that is all around, except for (some) Shuttle
boards. No one really knows what's going on, or at least if they know they
are not vocal about it.
In comes Ross Dickson. He starts poking at the problem until he comes up
with two patches. Near the end of 2003, an NVIDIA engineer (Allen Martin)
states that he (or maybe NVIDIA as a whole?) has been unable to reproduce
this weird problem with hard locks, seemingly related to APIC and IO.
He can tell us there was a bug in a reference BIOS that NVIDIA sent out
into the world, but that it has been fixed in a follow-up. Somewhere at
the start of December, Shuttle updates its BIOS for the AN35. Jesse Allen
flashes the new BIOS into his board and for reasons unknown his hard lock
problem has vanished. The importance of the update of NVIDIA's reference
BIOS in relation to the Shuttle update of the BIOS for their product(s) is
unknown as well.
Meanwhile, Ross Dickson drops requests for support tickets at AMD and
NVIDIA. Until this day, no reply yet. Unaffected by the deafening silence
he keeps improving his patches which seem to work(tm).
Without Ross' hard labor one can avoid the hard locks by banning APIC
support from the kernel, or turn off the C1 disconnect feature in the
BIOS, which is misinterpreted by one ACPI developer as running the CPU
"out of spec."
Recently Len Brown, the ACPI Linux kernel maintainer and Intel employee -
can you spot the irony? - agrees to attempt to reproduce the problem.
After having his box run cat /dev/hda > /dev/null for a night
straight, no lockup has occurred. The brand of his motherboard is Shuttle.
Did I mention irony...?
Although this topic is primarily about nforce2 chipsets, similar problems
have been reported with SiS chipsets for AMD CPUs. Other chipsets capable
of having the CPU disconnect include VIA KT266(A), KT333 and KT400. For
Linux, a tool like athcool can set the bits for the disconnect and the HLT
instruction. It is unconfirmed that these chipsets suffer from the same
symptoms as nforce2 chipsets.
Does anyone have some input on how to tackle this problem? The only things
I can come up with are mailing all the motherboard manufacturers I can
think of, harassing NVIDIA and/or AMD some more through proper channels (i.e.
filing a "bug report", but I don't expect much from this, sorry Allen) or
buying Len the cheapest broken nforce2 board I can find at pricewatch.com and
having it shipped to his house :)
Best regards,
Arjen
On Tue, 2004-04-27 at 19:02, Arjen Verweij wrote:
> Hello,
>
> I'm sorry for the small interlude in this thread, but I just want to get
> something clear.
IMHO it was a nice summary of the situation, and it might be welcome
for people who have just started reading about this.
> Basically we have a problem that is all around, except for (some) Shuttle
> boards. No one really knows what's going on, or at least if they know they
> are not vocal about it.
Yep, and Asus seems to only add support for new RAM manufacturers in dual DDR
mode.
> In comes Ross Dickson. He starts poking at the problem until he comes up
> with two patches. Near the end of 2003, an NVIDIA engineer (Allen Martin)
> states that he (or maybe NVIDIA as a whole?) has been unable to reproduce
> this weird problem with hard locks, seemingly related to APIC and IO.
> He can tell us there was a bug in a reference BIOS that NVIDIA sent out
> into the world, but that it has been fixed in a follow-up. Somewhere at
> the start of December, Shuttle updates its BIOS for the AN35. Jesse Allen
> flashes the new BIOS into his board and for reasons unknown his hard lock
> problem has vanished. The importance of the update of NVIDIA's reference
> BIOS in relation to the Shuttle update of the BIOS for their product(s) is
> unknown as well.
> Meanwhile, Ross Dickson drops requests for support tickets at AMD and
> NVIDIA. Until this day, no reply yet. Unaffected by the deafening silence
> he keeps improving his patches which seem to work(tm).
Yep, and we are all grateful for that =), thanks Ross.
> Without Ross' hard labor one can avoid the hard locks by banning APIC
> support from the kernel, or turn off the C1 disconnect feature in the
> BIOS, which is misinterpreted by one ACPI developer as running the CPU
> "out of spec."
Well, it gets hot... like hell.
> Recently Len Brown, the ACPI Linux kernel maintainer and Intel employee -
> can you spot the irony? - agrees to attempt to reproduce the problem.
> After having his box run with cat /dev/hda > /dev/null for a night
> straight no lockup has occurred. The brand of his motherboard is Shuttle.
> Did I mention irony...?
Heh.
> Although this topic is primarily about nforce2 chipsets, similar problems
> have been reported with SiS chipsets for AMD cpus. Other chipsets capable
> of having the CPU disconnect include VIA KT266(A), KT333 and KT400. For
> linux a tool like athcool can set the bits for the disconnect and the HLT
> instruction. It is unconfirmed that these chipsets suffer from the same
> symptoms as nforce2 chipsets.
There are several other things that can nuke machines, though.
A friend has problems with DMA on an Intel chipset (I keep monitoring the
changelogs for fixes); he has trouble getting more than 20 days of uptime
(it crashes faster with DMA enabled).
My firewall, a VIA Samuel 2 (microitx), dies after a few hours if you
enable cpufreq. But it also seems like it changes CPU speed too often.
The common denominator between my firewall and my desktop is 'too often', which
leads me to suspect that the HZ change from 100 -> 1000 could be
somewhat responsible. Could it be that we just run it too often and thus
worsen the impact? C1 disconnect shouldn't be run that often, IMHO.
Neither should cpufreq.
Perhaps some throttling would have about the same effect as Ross' patches
(which is what his original patches did, though not to the C1 disconnect or
the HLT instruction). Could it be that some kernel code isn't well
adapted to the 100 -> 1000 change?
Anyway, that's my 0.2 eur.
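To put numbers on the HZ 100 -> 1000 change, here is a rough sketch. It assumes the nominal 8254 input clock of 1193182 Hz and a rounded divisor, and it assumes an otherwise idle CPU is woken from HLT at least once per timer tick, so the tick rate is a lower bound on halt/resume cycles. The function name is made up for illustration.

```python
PIT_HZ = 1_193_182  # nominal 8254 PIT input clock in Hz (assumption stated above)


def tick_stats(hz):
    """Tick period, rounded PIT divisor and minimum idle wakeups for a given HZ."""
    latch = (PIT_HZ + hz // 2) // hz  # divisor rounded to the nearest integer
    return {
        "period_us": 1_000_000 / hz,   # time between timer interrupts
        "latch": latch,                # PIT reload value
        "wakeups_per_min": hz * 60,    # lower bound on HLT exits while idle
    }


for hz in (100, 1000):
    stats = tick_stats(hz)
    print(hz, stats["period_us"], stats["latch"], stats["wakeups_per_min"])
```

Under these assumptions an idle machine goes from at least 6,000 to at least 60,000 halt/resume cycles per minute, which is the kind of tenfold increase in "how often" being suspected here.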
> Does anyone have some input on how to tackle this problem? The only things
> I can come up with is mailing all the motherboard manufacturers I can
> think of, harass NVIDIA and/or AMD some more through proper channels (i.e.
> file a "bug report", but I don't expect much from this, sorry Allen) or
> buy Len the cheapest broken nforce2 board I can find at pricewatch.com and
> have it shipped to his house :)
Heh, that would be fun if he's willing to do the work/research =).
PS. Please CC me, since I'm not on this list.
--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
On Tue, 2004-04-27 at 13:02, Arjen Verweij wrote:
> After having his box run with cat /dev/hda > /dev/null for a night
> straight no lockup has occurred. The brand of his motherboard is Shuttle.
My Shuttle is an FN41 board in an SN41G2 system.
I found "rev 1.0" BIOS (FN41S00X of 12/18/2002) on Shuttle's ftp site
and downgraded to that, but still no hang.
It may be this board never hangs no matter what,
or perhaps C1 disconnect was simply disabled in that BIOS
b/c there was no option for it in Advanced Chipset Features
like there is for the most recent BIOS.
Other things about my board.
I run "optimized defaults", I don't overclock anything.
Processor is an AMD XP 2200+
Does anybody else see the hang with this processor model?
I wonder if the hang is processor model or speed dependent?
> Does anyone have some input on how to tackle this problem?
Unfortunately I don't have tools for debugging nvidia + amd hardware.
I would expect that those companies do, however. So encouraging them
to reproduce the hang internally may be the best way to go.
> buy Len the cheapest broken nforce2 board I can find at pricewatch.com and
> have it shipped to his house :)
I got tangled in this b/c this board (actually, the reference BIOS for
this chipset) had some unusual ACPI related failures. If the failures
turn out to be related to ACPI, I'll do what I can to help. But I
expect that hardware debugging tools may be necessary before the
hang issue is completely explained and solved.
-Len
Len,
I don't think that the CPU model has much to do with anything, it is
pretty much chipset related. My last remark about buying you a mobo for
debugging purposes was a vain attempt at humor.
I'm just surprised by the lack of support the vendors and NVIDIA/AMD are
giving. I realise that Linux may be only a marginal part of the market for
those companies, so it is not commercially justifiable to invest a lot of
time in this.
We all appreciate whatever input you may have, because in-depth knowledge
of how the ACPI/APIC code handles things is probably needed
to tackle this issue.
All I can do is gather info, and I'm currently thinking of a plan to get
the info we need.
Regards,
Arjen
On 27 Apr 2004, Len Brown wrote:
> On Tue, 2004-04-27 at 13:02, Arjen Verweij wrote:
>
> > After having his box run with cat /dev/hda > /dev/null for a night
> > straight no lockup has occurred. The brand of his motherboard is Shuttle.
>
> My shuttle is a FN41 board in a SN41G2 system.
>
> I found "rev 1.0" BIOS (FN41S00X of 12/18/2002) on Shuttle's ftp site
> and downgraded to that, but still no hang.
>
> It may be this board never hangs no matter what,
> or perhaps C1 disconnect was simply disabled in that BIOS
> b/c there was no option for it in Advanced Chipset Features
> like there is for the most recent BIOS.
>
> Other things about my board.
> I run "optimized defaults", I don't overclock anything.
> Processor is an AMD XP 2200+
> Does anybody else see the hang with this processor model?
> I wonder if the hang is processor model or speed dependent?
>
> > Does anyone have some input on how to tackle this problem?
>
> Unfortunately I don't have tools for debugging nvidia + amd hardware.
> I would expect that those companies do, however. So encouraging them
> to reproduce the hang internally may be the best way to go.
>
> > buy Len the cheapest broken nforce2 board I can find at pricewatch.com and
> > have it shipped to his house :)
>
> I got tangled in this b/c this board (actually, the reference BIOS for
> this chipset) had some unusual ACPI related failures. If the failures
> turn out to be related to ACPI, I'll do what I can to help. But I
> expect that hardware debugging tools may be necessary before the
> hang issue is completely explained and solved.
>
> -Len
On Tue, 2004-04-27 at 21:00, Len Brown wrote:
> I run "optimized defaults", I don't overclock anything.
> Processor is an AMD XP 2200+
> Does anybody else see the hang with this processor model?
> I wonder if the hang is processor model or speed dependent?
Have people run memtest86 over a weekend without errors?
I found nForce2 boards (newer ones, rev 2? A7N8X rev 2 and K7N2 Delta) to be
very picky about DDR400 RAM. I was able to find 2 properly working memory
modules out of 6, tested across several different brands. However, the
A7N8X rev 1 runs fine without any need to carefully pick memory modules.
The memory test may need to run for over 48 hours to detect errors, but the
time required may be lower when running the Linux kernel.
--
Jussi Laako <[email protected]>
Hi all,
I have just had an interesting experience. It seems Len's timer
routing patch (or whatever you want to call it) stabilizes my system to a
certain degree, or NOT using AGP stabilizes it to some degree...
The whole story: I am using Ross' C1halt patch to make the system stable
in APIC mode, but due to a recent change I borked my kernel parameters
and had idle=halt instead of idle=C1halt as the parameter, so by accident
I had not activated Ross' patch. Nevertheless, the system survived
a whole day! Usually it locks up within minutes, but not this time. I
even did some heavy copying from DVD to HD and from one HD to another
with peaks of about 40 MB/s. Finally the system crashed when I recorded
from DVB to HD (but only after 20 minutes). Then after a reboot (still
NOT using Ross' patch) it survived DVB recording for about 30 minutes.
I only manage to lock the system instantly when running hdparm (or rather a
second hdparm; the first one gives just about 20 MB/s, hello Jeff? What
is libata doing here?), which goes up to >60 MB/s.
So Len, maybe try using a faster HD to crash your Shuttle if it has one
of the borked BIOSes...
As I used the open source nv driver all the time, AGP probably wasn't in
use (or someone tell me how to use AGP with the nv driver...), as usually
without Ross' patches I get fast lock-ups when using AGP, or, as stated above,
Len's patch makes it a bit better. In fact I would need to try Len's
patch with AGP on (with the nvidia binary driver) to find out whether AGP or
Len's patch makes the difference. But currently I am too tired and not in the
mood to further patch the current mm-kernel to get Nvidia's binary driver
running again...
Does anybody know a tool to generate a certain amount of traffic on the PCI
bus? Then I could test at which point the system locks up now.
The only idea I have right now is to put an older HD into the system and test.
bye,
Prakash
Prakash K. Cheemplavam wrote:
> Hi all,
>
> I have just made some interesting experience. It seems Len's timer
> routing patch (or whatever you wanna call it) stabilizes my system to a
> certain amount or NOT using AGP stabilizes it to an amount...
[snip]
Btw, I found another possible reason for this behaviour, which would fit
the idle temp problem I am experiencing again with the 2.6.6-rc2-mm1
kernel (unless, it seems, I use Ross' C1halt idle patch): perhaps this
kernel uses the disconnect feature less often, so the probability of a
lock-up goes down. That would explain my higher temps...
Prakash
On Wednesday 28 April 2004 04:00, Len Brown wrote:
> On Tue, 2004-04-27 at 13:02, Arjen Verweij wrote:
>
> > After having his box run with cat /dev/hda > /dev/null for a night
> > straight no lockup has occurred. The brand of his motherboard is Shuttle.
>
> My shuttle is a FN41 board in a SN41G2 system.
>
I have had 3 Albatron KM18G Pro boards and one Epox 8RGA+.
> I found "rev 1.0" BIOS (FN41S00X of 12/18/2002) on Shuttle's ftp site
> and downgraded to that, but still no hang.
My Albatrons hang with BIOS R1.01, R1.01a and R1.04, which is the latest; they
probably also hang with earlier BIOSes but I have not tried. I have emailed
Albatron in the last couple of weeks re Allen's comments on the Nvidia reference
BIOS and about the lockups but have had no response as yet.
My Epox hangs but does not have the latest BIOS - I don't have a floppy hooked
up in that box to flash it to the latest BIOS as yet.
>
> It may be this board never hangs no matter what,
> or perhaps C1 disconnect was simply disabled in that BIOS
> b/c there was no option for it in Advanced Chipset Features
> like there is for the most recent BIOS.
Maybe other MOBO manufacturers skimp on filter caps and regulator damping
ability and a resonance occurs in the on-board supply rails? Do Shuttle make
any claims to using an improved on board regulator? Or Shuttle may have
always programmed more time in C1 cycle handshakes if such is
configurable?
>
> Other things about my board.
> I run "optimized defaults", I don't overclock anything.
> Processor is an AMD XP 2200+
> Does anybody else see the hang with this processor model?
> I wonder if the hang is processor model or speed dependent?
I have tried XP2200, XP2400, XP2500, I know I get lockups with both t'bred
and barton cores. Epox mobo has been tried with both Aopen H-500A and
Elanvital full size case and power supplies. My albatron are all in Aopen m-atx
H-400A cases.
>
> > Does anyone have some input on how to tackle this problem?
>
> Unfortunately I don't have tools for debugging nvidia + amd hardware.
> I would expect that those companies do, however. So encouraging them
> to reproduce the hang internally may be the best way to go.
Ditto. I figured out early on that it could do with an emulator or bond-out
CPU/chipset and tried to draw in Nvidia and AMD starting in December last year.
http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/2549.html
It was cc'd to Mr Allen Martin of Nvidia as were other emails on topic.
His reply was "\0" so I assumed he was on a long holiday or no longer worked
there. It has been good to hear from him on the topic some 4 months later.
Don't scare him off! -we appear to be making some progress.
I also spoke to Mr Michael Apthorpe of AMD in Australia in December and
forwarded the support request email who replied "Thanks Ross I will forwards
it on and see what comes back." But nothing has to date.
In January I spotted Mr Richard Brunner of AMD had previously corresponded
with the LKML so I emailed him and he was interested at the time but said
whilst he could not promise anything he would forward my query to the hardware
certification labs. And guess what - he was right to promise nothing as I have
received "\0" to date.
I followed up with the AMD guys in February this year but again received "\0".
>
> > buy Len the cheapest broken nforce2 board I can find at pricewatch.com and
> > have it shipped to his house :)
>
> I got tangled in this b/c this board (actually, the reference BIOS for
> this chipset) had some unusual ACPI related failures. If the failures
> turn out to be related to ACPI, I'll do what I can to help. But I
> expect that hardware debugging tools may be necessary before the
> hang issue is completely explained and solved.
I have had good (100%) success in reproducing the fault with the Albatron
KM18G pro MOBO. I needed m-atx form factor and distributor was local to me.
It makes a very nice, cheap and stable system, but only with the lockup workaround.
I also recollect that Windows had lockups with nforce2 for a while depending
whether you ran the Nvidia or Microsoft driver.
http://lkml.org/lkml/2003/12/13/5
Anybody got the inside running on that one and what was different between the
two drivers?
Regards
Ross.
>
> -Len
On Wed, Apr 28, 2004 at 09:33:34PM +1000, Ross Dickson wrote:
> >
> > It may be this board never hangs no matter what,
> > or perhaps C1 disconnect was simply disabled in that BIOS
> > b/c there was no option for it in Advanced Chipset Features
> > like there is for the most recent BIOS.
>
> Maybe other MOBO manufacturers skimp on filter caps and regulator damping
> ability and a resonance occurs in the on-board supply rails? Do Shuttle make
> any claims to using an improved on board regulator? Or Shuttle may have
> always programmed more time in C1 cycle handshakes if such is
> configurable?
Do you really think so? I think there may be a resonance occurring, even with
this new BIOS. I plugged new headphones into my nforce2 onboard sound, and
get a high pitched noise. Now here is where it gets weird: this noise does
not occur on boot until sometime after the IDE driver is loaded. I also
believe it varies under a high load. If you disable C1 disconnect, it's gone.
Also I've heard a high pitched noise at certain times coming right from the
computer (very faint, but I do have very good hearing, I can even hear a hush
sound from my router. My brother was quite astonished when I pointed that
out). I try to distinguish what's doing it. It could be the hard drive. But
when I found the other sound in the headphones, I found that the sound varies
almost in unison with the sound coming from the computer. Maybe the IDE or
hard drive is related, but it is too closely related to C1 disconnect.
Whether it is really possible that my board can generate this sound, I
don't know. Though, I once determined that resonance was occurring in an
old system, causing unstable CPU operation. It wasn't that I heard a sound
coming from it =). But I thought the case was causing it, and pulled
the board out of the case. I ran it on the table and found it to be stable. That
was the only thing wrong. I've also studied resonance a bit before. I know
resonance can break systems. But to think that my board is emitting
noise like that is pretty bizarre.
It may be true that this Shuttle board may have resonance problems. So that
would indicate that they did something much like you describe by changing the
C1 handshake time? Isn't that much like what your patch does?
>
> > hang issue is completely explained and solved.
>
> I have had good (100%) success in reproducing the fault with the Albatron
> KM18G pro MOBO. I needed m-atx form factor and distributor was local to me.
> Makes very nice - cheap and stable system but only with the lockup workaround.
>
> I also recollect that Windows had lockups with nforce2 for a while depending
> whether you ran the Nvidia or Microsoft driver.
> http://lkml.org/lkml/2003/12/13/5
> Anybody got the inside running on that one and what was different between the
> two drivers?
>
Yeah, unfortunately, I didn't save a link to the message board that I found
that on. But the issue is pretty common. I'm sure more info can be found on
the Windows side.
Jesse
On Thursday 29 April 2004 06:59, Jesse Allen wrote:
> On Wed, Apr 28, 2004 at 09:33:34PM +1000, Ross Dickson wrote:
> > >
> > > It may be this board never hangs no matter what,
> > > or perhaps C1 disconnect was simply disabled in that BIOS
> > > b/c there was no option for it in Advanced Chipset Features
> > > like there is for the most recent BIOS.
> >
> > Maybe other MOBO manufacturers skimp on filter caps and regulator damping
> > ability and a resonance occurs in the on-board supply rails? Do Shuttle make
> > any claims to using an improved on board regulator? Or Shuttle may have
> > always programmed more time in C1 cycle handshakes if such is
> > configurable?
>
> Do you really think so? I think there may be a resonance occurring, even with
> this new BIOS. I plugged in new headphones into my nforce2 onboard sound, and
> get a high pitched noise. Now here is where it gets weird: This noise does
> not occur on boot until sometime after the IDE driver is loaded. I also
> believe it varies under a high load. If you disable C1 disconnect, it's gone.
> Also I've heard a high pitched noise at certain times coming right from the
> computer (very faint, but I do have very good hearing, I can even hear a hush
> sounding from my router. my brother was quite astonished when I pointed that
> out) I try to distinguish whats doing it. It could be the hard drive. But
> when I found the other sound in the head phones, I found that the sound varies
> almost in unison with the sound coming from the computer. Maybe the IDE or
> hard drive is related, but it is too much related to C1 disconnect.
I think I might break out my oscilloscope this weekend and have a look at how
clean the supply rails are around the cpu and northbridge and southbridge.
Who knows I might get lucky and see some unexpected ripple or spikes.
>
> Whether it is really possible that my board can really generate this sound, I
> don't know. Though, I have once determined that resonance was occurring in an
> old system, causing unstable CPU operation. It wasn't that I heard a sound
> coming from it =). But what I thought was the case was causing it, and pulled
> it out of the case. I ran it on the table and found it to be stable. That
> was the only thing wrong. I've also studied resonance before a bit. I know
> resonance can break systems. But to think that my board is emitting
> noise like that is pretty bizarre.
Not as bizarre as you may think. I have heard coils and even capacitors "sing"
in years past whilst servicing electronics.
>
> It may be true that this Shuttle board may have resonance problems. So that
> would indicate that they did something much like you describe by changing the
> C1 handshake time? Isn't that much like what your patch does?
I had not really thought about it from that perspective. Whilst my patch cannot
alter the handshake times, it does prevent consecutive C1 cycles from occurring
too close together, "too close together" being, I think, less than about 800ns. I
guess I could look at that with a CRO too - use an appropriate pin as the trigger
source and see if the supply rails have load dump voltage rises when going into
disconnect. Maybe the rail voltage rings for about 700ns and might be out of
tolerance inside the Athlon during that time. It would be very interesting to see
whether a few hundred picofarads of low-ESR decoupling cap placed on a supply rail
near a chip makes a difference. A pinout of the nforce2 chipset would help a great
deal here but I do not have one. Can anyone oblige me?
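The idea of keeping consecutive C1 cycles from occurring too close together can be illustrated with a toy model. This is not the actual patch, only a sketch of the spacing rule; the 800ns figure comes from the estimate above, and the function name and timestamps are made up for illustration.

```python
MIN_GAP_NS = 800  # assumption: the "about 800ns" estimate mentioned above


def filter_c1_entries(request_times_ns, min_gap_ns=MIN_GAP_NS):
    """Return the idle requests that would be allowed to enter C1.

    A request is allowed only if at least min_gap_ns has passed since the
    previously allowed C1 entry; closer requests are skipped (the CPU would
    simply stay connected for that idle period).
    """
    allowed = []
    last_entry = None
    for t in request_times_ns:
        if last_entry is None or t - last_entry >= min_gap_ns:
            allowed.append(t)
            last_entry = t
    return allowed


# Hypothetical idle-request timestamps in nanoseconds.
print(filter_c1_entries([0, 300, 900, 1000, 1800]))  # [0, 900, 1800]
```

The requests at 300ns and 1000ns are dropped because they fall within 800ns of the previous allowed entry; everything else goes through unchanged.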
>
>
> >
> > > hang issue is completely explained and solved.
> >
> > I have had good (100%) success in reproducing the fault with the Albatron
> > KM18G pro MOBO. I needed m-atx form factor and distributor was local to me.
> > Makes very nice - cheap and stable system but only with the lockup workaround.
> >
> > I also recollect that Windows had lockups with nforce2 for a while depending
> > whether you ran the Nvidia or Microsoft driver.
> > http://lkml.org/lkml/2003/12/13/5
> > Anybody got the inside running on that one and what was different between the
> > two drivers?
> >
>
> Yeah, unfortunately, I didn't save a link to the message board that I found
> that on. But the issue is pretty common. I'm sure more info can be found on
> the Windows side.
No tech info, but this link shows a user who had lockups with Nvidia's IDE driver
but was OK with the MS one.
http://club.cdfreaks.com/showthread/t-91381.html
-Ross
>
> Jesse
On Thu, 29 Apr 2004, Ross Dickson wrote:
> > Do you really think so? I think there may be a resonance occurring, even with
> > this new BIOS. I plugged in new headphones into my nforce2 onboard sound, and
> > get a high pitched noise. Now here is where it gets weird: This noise does
> > not occur on boot until sometime after the IDE driver is loaded. I also
> > believe it varies under a high load. If you disable C1 disconnect, it's gone.
> > Also I've heard a high pitched noise at certain times coming right from the
> > computer (very faint, but I do have very good hearing, I can even hear a hush
> > sounding from my router. my brother was quite astonished when I pointed that
> > out) I try to distinguish whats doing it. It could be the hard drive. But
> > when I found the other sound in the head phones, I found that the sound varies
> > almost in unison with the sound coming from the computer. Maybe the IDE or
> > hard drive is related, but it is too much related to C1 disconnect.
>
> I think I might break out my oscilloscope this weekend and have a look at how
> clean the supply rails are around the cpu and northbridge and southbridge.
> Who knows I might get lucky and see some unexpected ripple or spikes.
Not necessarily related to the PSU, but the noise may actually be the
cause of spurious timer interrupts.
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +
Ross Dickson wrote:
> Not as bizarre as you may think. I have heard coils and even capacitors "sing"
> in years past whilst servicing electronics.
See the thread "Increasing HZ (patch for HZ > 1000)" for something
along these lines. The change of HZ from 100 to 1000 causes some
notebooks to make a noise.
(Mine makes a noise with both, though).
-- Jamie
Maciej W. Rozycki wrote:
> Not necessarily related to the PSU, but the noise may actually be the
> reason of spurious timer interrupts.
With most device interrupts, additional spurious ones don't cause any
malfunction because the driver's handler checks whether the device
actually has a condition pending.
This is the basis of shared interrupts, of course.
Is there any way we can check the timer itself to see whether an
interrupt was caused by it, so that spurious timer interrupts are ignored?
-- Jamie
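The point about shared interrupts can be sketched with a toy model. This is not kernel code; the class and function names are hypothetical, and the "pending" flag stands in for a device's interrupt status bit.

```python
class Device:
    """Toy device with a status bit saying whether it raised the interrupt."""

    def __init__(self, name):
        self.name = name
        self.pending = False  # models the device's interrupt status bit

    def handle_irq(self):
        """Return True if this device had work pending, False otherwise."""
        if not self.pending:
            return False      # not ours (or spurious): do nothing
        self.pending = False  # acknowledge the condition in the device
        return True


def dispatch(devices):
    """Call every handler sharing the line; return names that claimed the IRQ."""
    return [d.name for d in devices if d.handle_irq()]


nic, disk = Device("nic"), Device("disk")
disk.pending = True
print(dispatch([nic, disk]))  # ['disk']
print(dispatch([nic, disk]))  # [] : a spurious interrupt is harmlessly ignored
```

The second dispatch models a spurious interrupt: no device reports a pending condition, so no handler does any work. The question above is whether the timer offers anything comparable to that status check.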
On Thu, 2004-04-29 at 13:44, Ross Dickson wrote:
> On Thursday 29 April 2004 06:59, Jesse Allen wrote:
> > On Wed, Apr 28, 2004 at 09:33:34PM +1000, Ross Dickson wrote:
> > > >
> > > > It may be this board never hangs no matter what,
> > > > or perhaps C1 disconnect was simply disabled in that BIOS
> > > > b/c there was no option for it in Advanced Chipset Features
> > > > like there is for the most recent BIOS.
> > >
> > > Maybe other MOBO manufacturers skimp on filter caps and regulator damping
> > > ability and a resonance occurs in the on-board supply rails? Do Shuttle make
> > > any claims to using an improved on board regulator? Or Shuttle may have
> > > always programmed more time in C1 cycle handshakes if such is
> > > configurable?
> >
> > > Do you really think so? I think there may be a resonance occurring, even with
> > this new BIOS. I plugged in new headphones into my nforce2 onboard sound, and
> > get a high pitched noise. Now here is where it gets weird: This noise does
> > not occur on boot until sometime after the IDE driver is loaded. I also
> > believe it varies under a high load. If you disable C1 disconnect, it's gone.
> > Also I've heard a high pitched noise at certain times coming right from the
> > > computer (very faint, but I do have very good hearing, I can even hear a hush
> > sounding from my router. my brother was quite astonished when I pointed that
> > out) I try to distinguish whats doing it. It could be the hard drive. But
> > when I found the other sound in the head phones, I found that the sound varies
> > almost in unison with the sound coming from the computer. Maybe the IDE or
> > hard drive is related, but it is too much related to C1 disconnect.
>
> I think I might break out my oscilloscope this weekend and have a look at how
> clean the supply rails are around the cpu and northbridge and southbridge.
> Who knows I might get lucky and see some unexpected ripple or spikes.
>
> >
> > Whether it is really possible that my board can really generate this sound, I
> > don't know. Though, I have once determined that resonance was occurring in an
> > old system, causing unstable CPU operation. It wasn't that I heard a sound
> > coming from it =). But what I thought was the case was causing it, and pulled
> > it out of the case. I ran it on the table and found it to be stable. That
> > was the only thing wrong. I've also studied resonance before a bit. I know
> > resonance can break systems. But to think that my board is emitting
> > noise like that is pretty bizarre.
>
> Not as bizarre as you may think. I have heard coils and even capacitors "sing"
> in years past whilst servicing electronics.
>
> >
> > It may be true that this Shuttle board may have resonance problems. So that
> > would indicate that they did something much like you describe by changing the
> > C1 handshake time? Isn't that much like what your patch does?
>
> I had not really thought about it from that perspective. Whilst my patch cannot
> alter the handshake times it does prevent consecutive C1 cycles from occurring
> too close together. Too close together I think being less than about 800ns. I
> guess I could look at that with a cro too - use an appropriate pin as the trigger
> source and see if supply rails have load dump voltage rises when going into
> disconnect. Maybe rail voltage rings for about 700ns and might be out of
> tolerance inside Athlon during that time. Would be very interesting if a
> few hundred picofarad of low esr decoupling cap placed on a supply rail near a
> chip makes a difference? A pinout of the nforce2 chipset would help a great deal
> here but I do not have one. Can anyone oblige me?
>
> >
> >
> > >
> > > > hang issue is completely explained and solved.
> > >
> > > I have had good (100%) success in reproducing the fault with the Albatron
> > > KM18G pro MOBO. I needed m-atx form factor and distributor was local to me.
> > > Makes very nice - cheap and stable system but only with the lockup workaround.
> > >
> > > I also recollect that Windows had lockups with nforce2 for a while depending
> > > whether you ran the Nvidia or Microsoft driver.
> > > http://lkml.org/lkml/2003/12/13/5
> > > Anybody got the inside running on that one and what was different between the
> > > two drivers?
> > >
> >
> > Yeah, unfortunately, I didn't save a link to the message board that I found
> > that on. But the issue is pretty common. I'm sure more info can be found on
> > the Windows side.
>
> No tech info but this link shows user had Lockups with Nvidia's ide driver but
> OK with MS one.
> http://club.cdfreaks.com/showthread/t-91381.html
>
This has become a rather interesting problem to watch from afar. The
Athlon here seems to have no issues with the NForce driver under Windows
(I don't burn a lot of DVDs on it, though). Whenever it's in Linux, it's mainly
a testing machine these days.
It will be interesting to see if there's a real hardware problem and then
if it can be worked around in software (can't imagine a single product
recall happening).
On Thu, 29 Apr 2004, Jamie Lokier wrote:
> > Not necessarily related to the PSU, but the noise may actually be the
> > reason of spurious timer interrupts.
>
> With most device interrupts, additional spurious ones don't cause any
> malfunction because the driver's handler checks whether the device
> actually has a condition pending.
Note the 8254 timer uses edge-triggered interrupts and is just a square
wave signal. There's no acking to deassert the interrupt -- it goes away
spontaneously after a predefined time.
> This is the basis of shared interrupts, of course.
Yep, but the timer is non-shareable by definition.
> Is there any way we can check the timer itself to see whether an
> interrupt was caused by it, so that spurious timer interrupts are ignored?
This may be possible, but complicated and likely unreliable -- an I/O
APIC may deliver a spurious interrupt at the time a real one would be
probable and you can't check if a period between two consecutive timer
interrupts is appropriate without an additional time reference, which may
be unavailable (like the TSC).
Note the timer is special -- we don't really do any device handling, but
we want to get periodic interrupts at the right times to have a time
reference. Coalescing interrupts or discarding spurious ones, which is
normal and acceptable for regular devices, doesn't work here.
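
A small sketch may make the difficulty concrete. It is only an illustration
under stated assumptions, not kernel code: filter_ticks, PERIOD_US and
TOLERANCE_US are hypothetical names, the 1000 us period is an assumed tick
rate, and the arrival times are assumed to come from an independent reference
clock, which is exactly what may be missing on real hardware.

```python
# Hypothetical illustration only; not kernel code.
PERIOD_US = 1000     # assumed tick period in microseconds
TOLERANCE_US = 100   # assumed acceptance window around the period


def filter_ticks(arrivals_us, period_us=PERIOD_US, tol_us=TOLERANCE_US):
    """Return the arrival times that look like real timer ticks.

    arrivals_us holds interrupt arrival times read from an independent
    reference clock. An interrupt is accepted only if it arrives about
    one period after the previously accepted one.
    """
    accepted = []
    for t in arrivals_us:
        if not accepted:
            # First interrupt: there is nothing to compare it against.
            accepted.append(t)
            continue
        if abs((t - accepted[-1]) - period_us) <= tol_us:
            accepted.append(t)
    return accepted


# A spurious interrupt far from the expected time (400) is rejected.
early = filter_ticks([0, 400, 1000, 2000])

# A spurious interrupt close to the expected time (950) is accepted,
# and the real tick at 1000 is the one that gets thrown away.
masquerading = filter_ticks([0, 950, 1000, 2000])
```

In the second call the filter keeps the spurious interrupt and drops the real
one, so timekeeping is now off by the difference. The sketch also never
recovers if a real tick is lost, and it cannot run at all without the
reference clock.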
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +
On Thu, Apr 29, 2004 at 09:44:37PM +1000, Ross Dickson wrote:
> On Thursday 29 April 2004 06:59, Jesse Allen wrote:
> > almost in unison with the sound coming from the computer. Maybe the IDE or
> > hard drive is related, but it is too much related to C1 disconnect.
>
> I think I might break out my oscilloscope this weekend and have a look at how
> clean the supply rails are around the cpu and northbridge and southbridge.
> Who knows I might get lucky and see some unexpected ripple or spikes.
I'd be interested in knowing the results.
> > resonance can break systems. But to think that my board is emitting
> > noise like that is pretty bizarre.
>
> Not as bizarre as you may think. I have heard coils and even capacitors "sing"
> in years past whilst servicing electronics.
Yes, I know that these things can theoretically happen. But when it happens
to me, it's a surprise. An electronics genius probably encounters it
more often. =)
> > C1 handshake time? Isn't that much like what your patch does?
>
> I had not really thought about it from that perspective. Whilst my patch cannot
> alter the handshake times it does prevent consecutive C1 cycles from occurring
> too close together. Too close together I think being less than about 800ns. I
ah, ok.
> guess I could look at that with a cro too - use an appropriate pin as the
> trigger source and see if supply rails have load dump voltage rises when
> going into disconnect. Maybe rail voltage rings for about 700ns and might be
> out of tolerance inside Athlon during that time. Would be very interesting if
> a few hundred picofarad of low esr decoupling cap placed on a supply rail
> near a chip makes a difference? A pinout of the nforce2 chipset would help a
> great deal here but I do not have one. Can anyone oblige me?
What I'd like to know is where the sound chip really is on my board. I've
tried looking before, but found myself confused.
A pic:
http://us.shuttle.com/images/productimages/AN35.jpg
According to a diagram that I have, the AC'97 6-CH AUDIO is a chip
near the top of the board in the image that I link to, above the 2nd PCI slot,
left of the AGP. But I'm also left wondering how the NForce2 MCP comes
into play. Specs would help. Maybe if we can figure out how the sound is
wired on the board, we could also trace the source of the noise to the exact
component.
Jesse
Jesse Allen wrote:
> What I'd like to know is where the sound chip is really at on my board. I've
> tried looking before, but find myself confused.
>
> A pic:
> http://us.shuttle.com/images/productimages/AN35.jpg
>
> According to a diagram that I have, it points to an AC'97 6-CH AUDIO as a chip
> near the top of the board in the image that I link to, above 2nd PCI slot
> left of the AGP. But I'm also left thinking, how does the NForce2 MCP come
> into play. Specs would help. Maybe if we can figure out how the sound is
> wired on the board, we could also trace the source of noise to the exact
> component.
Yes, I also think the chip above the 2nd PCI slot is the right one. You can
see the Realtek logo. It is only an AC97 codec (basically not more than a
DAC and ADC) and Linux currently only has drivers for this. The MCP-T
has an APU, which could do DSP stuff in hardware, but there are still no
drivers (Hello Nvidia?), so all of this is done in software. (The APU has even
more functionality, like DD5.1 realtime encoding, fx, and whatever.) In
our case, the APU shouldn't cause any trouble, as it is not used. With
the APU, the nforce2 chipset behaves like a "real" soundcard. Without it, its
sound abilities are no better than the average mainboard's onboard sound.
Prakash