Subject: Crash on reading the whole PCI config of PIIX4 SMBus

Hi there,

At boot time, my system (Wincor/Nixdorf Beetle D1) sometimes crashes while loading the i2c-piix4 driver.
I found that I can always trigger the crash as root with any one of these:

# hexdump -C /proc/bus/pci/00/07.3
# hexdump -C /sys/bus/pci/devices/0000:00:07.3/config
# lspci -s 07.3 -xxx

During initialization, the i2c-piix4 driver performs two config-space reads, at 0xd2 and 0xd6,
in relatively quick succession. That sometimes triggers the crash,
but not as reliably as the commands above.
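The two back-to-back reads can be reproduced from user space with a sketch like the one below. It assumes the sysfs config file of the SMBus function (for example /sys/bus/pci/devices/0000:00:07.3/config) and root privileges, since unprivileged readers only see the first 64 bytes. The helper names read_cfg_byte() and read_smbus_cfg_pair() are hypothetical and not part of the driver. On the affected machine this may crash the system just like the commands above.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Config-space offsets the driver reads during initialization. */
#define SMB_CFG_OFF_A 0xd2
#define SMB_CFG_OFF_B 0xd6

/* Read a single byte at 'off'; returns 0..255, or -1 on error or EOF. */
static int read_cfg_byte(int fd, off_t off)
{
    unsigned char b;

    if (pread(fd, &b, 1, off) != 1)
        return -1;
    return b;
}

/* Hypothetical helper: open the config file and read the two bytes back
 * to back, with no delay in between. The path (and needing root to get
 * past the first 64 bytes) are assumptions. Returns 0 on success. */
static int read_smbus_cfg_pair(const char *path, int *val_a, int *val_b)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    *val_a = read_cfg_byte(fd, SMB_CFG_OFF_A);
    *val_b = read_cfg_byte(fd, SMB_CFG_OFF_B);
    close(fd);
    return (*val_a < 0 || *val_b < 0) ? -1 : 0;
}
```

Opening the file with a plain open() and reading with pread() at fixed offsets keeps the two accesses as close together as user space allows, which is what the driver does in kernel space.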

During my investigation I put a printk() between those two reads and had no more crashes
on module loading. I tested that with a script doing insmod/rmmod 100 times in a row.

But printk() can't be the solution, so I tried msleep(1) and udelay(250) instead,
but my system crashed with each of these.
The read plus one printk() takes ~100 us on my machine,
so both delays should have been more than enough if timing were the cause.

Does anyone have an idea what the driver should do between those two reads to avoid crashing?
Can somebody with the same device trigger this crash too (I grepped LKML 2001-2008 and found nothing)?
I have another box with this device, and I'll test this in 4 hours.
Or is my hardware just broken, and should I blacklist the module?

I used 2.6.26-2 (deb/lenny) and 2.6.31 (vanilla) with the same results.

Btw, I also tried:

$ for i in `seq 300`; do sensors; done

which brought the same machine down (only with 2.6.31) with:

do_IRQ: 0.66 No irq handler for vector (irq -1)


#lspci -s 07.3 -vvvn

00:07.3 0680: 8086:7113 (rev 02)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin ? routed to IRQ 9
Kernel driver in use: piix4_smbus
Kernel modules: i2c-piix4


2009-09-22 17:04:18

by Jean Delvare

Subject: Re: Crash on reading the whole PCI config of PIIX4 SMBus

On Tue, 22 Sep 2009 17:46:10 +0200, Henrik Kretzschmar wrote:
> Hi there,
>
> At boot time, my system (Wincor/Nixdorf Beetle D1) sometimes crashes while loading the i2c-piix4 driver.
> I found that I can always trigger the crash as root with any one of these:
>
> # hexdump -C /proc/bus/pci/00/07.3
> # hexdump -C /sys/bus/pci/devices/0000:00:07.3/config
> # lspci -s 07.3 -xxx

Does this crash happen even if i2c-piix4 hasn't been loaded at all since
the last cold boot of the machine?

>
> During initialization, the i2c-piix4 driver performs two config-space reads, at 0xd2 and 0xd6,
> in relatively quick succession. That sometimes triggers the crash,
> but not as reliably as the commands above.
>
> During my investigation I put a printk() between those two reads and had no more crashes
> on module loading. I tested that with a script doing insmod/rmmod 100 times in a row.
>
> But printk() can't be the solution, so I tried msleep(1) and udelay(250) instead,
> but my system crashed with each of these.
> The read plus one printk() takes ~100 us on my machine,
> so both delays should have been more than enough if timing were the cause.
>
> Does anyone have an idea what the driver should do between those two reads to avoid crashing?

I doubt we can do anything reliable in the i2c-piix4 driver itself, as
this seems to be a rare hardware issue.

> Can somebody with the same device trigger this crash too (I grepped LKML 2001-2008 and found nothing)?

You should search the lm-sensors list archives instead. I don't remember
anything like this, but OTOH the hardware must be so old that I doubt
many people are still using it.

> I have another box with this device, and I'll test this in 4 hours.
> Or is my hardware just broken, and should I blacklist the module?

That would be my bet, yes.

>
> I used 2.6.26-2 (deb/lenny) and 2.6.31 (vanilla) with the same results.
>

Is there _any_ kernel that did not exhibit the problem?

> Btw, I also tried:
>
> $ for i in `seq 300`; do sensors; done
>
> which brought the same machine down (only with 2.6.31) with:
>
> do_IRQ: 0.66 No irq handler for vector (irq -1)
>
>
> #lspci -s 07.3 -vvvn
>
> 00:07.3 0680: 8086:7113 (rev 02)

That's a pretty old chip... I have one in a 1997 machine. Presumably
you can easily find a replacement machine, ten times more powerful, for
cheap. It might be a more constructive approach than trying to keep the
old thing running.

> Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Interrupt: pin ? routed to IRQ 9

This "?" looks rather suspicious. Smells like an IRQ routing issue?

> Kernel driver in use: piix4_smbus
> Kernel modules: i2c-piix4

--
Jean Delvare

2009-09-22 18:07:57

by Jean Delvare

Subject: Re: Crash on reading the whole PCI config of PIIX4 SMBus

On Tue, 22 Sep 2009 17:46:10 +0200, Henrik Kretzschmar wrote:
> Hi there,
>
> At boot time, my system (Wincor/Nixdorf Beetle D1) sometimes crashes while loading the i2c-piix4 driver.
> I found that I can always trigger the crash as root with any one of these:
>
> # hexdump -C /proc/bus/pci/00/07.3
> # hexdump -C /sys/bus/pci/devices/0000:00:07.3/config
> # lspci -s 07.3 -xxx
>
> During initialization, the i2c-piix4 driver performs two config-space reads, at 0xd2 and 0xd6,
> in relatively quick succession. That sometimes triggers the crash,
> but not as reliably as the commands above.
>
> During my investigation I put a printk() between those two reads and had no more crashes
> on module loading. I tested that with a script doing insmod/rmmod 100 times in a row.
>
> But printk() can't be the solution, so I tried msleep(1) and udelay(250) instead,
> but my system crashed with each of these.
> The read plus one printk() takes ~100 us on my machine,
> so both delays should have been more than enough if timing were the cause.
>
> Does anyone have an idea what the driver should do between those two reads to avoid crashing?
> Can somebody with the same device trigger this crash too (I grepped LKML 2001-2008 and found nothing)?

And to answer this question: with the same device (except that it's at
01.3 instead of 07.3 here), none of the 3 commands above crashes my
system (kernel 2.6.29).

--
Jean Delvare

2009-09-22 23:19:26

by Wolfram Sang

Subject: Re: Crash on reading the whole PCI config of PIIX4 SMBus

On Tue, Sep 22, 2009 at 05:46:10PM +0200, Henrik Kretzschmar wrote:

> At boot time, my system (Wincor/Nixdorf Beetle D1) sometimes crashes while loading the i2c-piix4 driver.

Have you checked if you have the latest BIOS for this machine? Sometimes
PCI problems get fixed there...

Regards,

Wolfram

--
Pengutronix e.K. | Wolfram Sang |
Industrial Linux Solutions | http://www.pengutronix.de/ |


Subject: Re: Crash on reading the whole PCI config of PIIX4 SMBus

Wolfram Sang wrote:
> On Tue, Sep 22, 2009 at 05:46:10PM +0200, Henrik Kretzschmar wrote:
>
>
>> At boot time, my system (Wincor/Nixdorf Beetle D1) sometimes crashes while loading the i2c-piix4 driver.
>>
>
> Have you checked if you have the latest BIOS for this machine? Sometimes
> PCI problems get fixed there...
>
>
Thanks for your fast replies,

The latest BIOS (05/08) is already installed, so no fix can be expected from that side.


I tested my killer commands on the other box at home: no crashes.

The commands _work_ with a cold-started Linux and i2c-piix4 not loaded,
so the only thing I can do is blacklist it and give up sensors support,
which gives me a good argument for acquiring new hardware. :)

For completeness, the message with the SMBus revision (which is not the PCI revision) is:

SMBus Host Controller at 0x1040, revision 0


The "?" in the Interrupt line of lspci is a little strange:

>> Interrupt: pin ? routed to IRQ 9

Looking at lspci -x -s 7.3, I can see that PCI_INTERRUPT_PIN == 0.
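As far as we can tell from the lspci sources, the "?" simply means that the Interrupt Pin register (offset 0x3d in the standard PCI header) reads 0, i.e. the function uses no INTx pin, while an IRQ number is still reported for the device; lspci prints the Interrupt line when either value is non-zero. The sketch below imitates that formatting. The function names are hypothetical, and the IRQ is passed in separately because we believe lspci reports the IRQ the kernel assigned rather than the raw Interrupt Line byte.

```c
#include <stddef.h>
#include <stdio.h>

/* Offset of the Interrupt Pin register in the standard PCI header. */
#define PCI_INTERRUPT_PIN 0x3d

/* Map the Interrupt Pin value to a letter: 1..4 are INTA#..INTD#,
 * 0 means "no interrupt pin" and is shown as '?'. Values above 4 are
 * invalid per the PCI spec; this sketch also shows them as '?'. */
static char irq_pin_letter(unsigned pin)
{
    return (pin >= 1 && pin <= 4) ? (char)('A' + pin - 1) : '?';
}

/* Hypothetical helper: build an lspci-like "Interrupt:" line from the
 * raw 64-byte config header and the IRQ number reported for the device.
 * Returns the snprintf() result (length of the full line). */
static int format_irq_line(const unsigned char cfg[64], unsigned irq,
                           char *buf, size_t len)
{
    return snprintf(buf, len, "Interrupt: pin %c routed to IRQ %u",
                    irq_pin_letter(cfg[PCI_INTERRUPT_PIN]), irq);
}
```

With an all-zero pin register and IRQ 9 this yields "Interrupt: pin ? routed to IRQ 9", the same line as in the lspci output above, which on its own does not indicate a routing fault.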

Judging by /proc/irq/9/spurious, I can't see i2c-piix4 generating any interrupts.

The driver says:

Using Interrupt SMI# for SMBus

I'm not sure, but my box at home shows the same lspci output with the "?".

So long, and thanks for the help.


2009-09-23 13:35:33

by Jean Delvare

Subject: Re: Crash on reading the whole PCI config of PIIX4 SMBus

On Wed, 23 Sep 2009 14:59:08 +0200, Henrik Kretzschmar wrote:
> Wolfram Sang wrote:
> > On Tue, Sep 22, 2009 at 05:46:10PM +0200, Henrik Kretzschmar wrote:
> >
> >
> >> At boot time, my system (Wincor/Nixdorf Beetle D1) sometimes crashes while loading the i2c-piix4 driver.
> >>
> >
> > Have you checked if you have the latest BIOS for this machine? Sometimes
> > PCI problems get fixed there...
> >
> >
> Thanks for your fast replies,
>
> The latest BIOS (05/08) is already installed, so no fix can be expected from that side.
>
>
> I tested my killer commands on the other box at home: no crashes.
>
> The commands _work_ with a cold-started Linux and i2c-piix4 not loaded,
> so the only thing I can do is blacklist it and give up sensors support,
> which gives me a good argument for acquiring new hardware. :)

That's really odd, considering that the i2c-piix4 driver doesn't change
the PCI device configuration, it only reads from it.

If you trigger some transactions (for example by running "sensors -s"
at boot time?) then you also write to the I/O ports. But this hardly
explains how subsequently reading the PCI config space would crash.

You might still want to check if maybe ACPI is interfering with the
i2c-piix4 driver. This isn't the kind of result I'd expect, but who
knows.

>
> For completeness, the message with the SMBus revision (which is not the PCI revision) is:
>
> SMBus Host Controller at 0x1040, revision 0

Revision 0 here as well, I suspect there never was any other revision
of this chip.

>
>
> The "?" in the Interrupt line of lspci is a little strange:
>
> >> Interrupt: pin ? routed to IRQ 9
>
> Looking at lspci -x -s 7.3, I can see that PCI_INTERRUPT_PIN == 0.
>
> Judging by /proc/irq/9/spurious, I can't see i2c-piix4 generating any interrupts.
>
> The driver says:
>
> Using Interrupt SMI# for SMBus

This is only a debug message, saying how the interrupt line is
configured, but you are right that the driver doesn't use interrupts
(much to our shame.)
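For reference, here is a sketch of how such a message can be derived from the byte at 0xd2, the first of the two config reads discussed earlier in the thread. The bit layout used here (bits 3:1 equal to 000 for SMI#, 100 for IRQ 9) is our reading of the i2c-piix4 source and the PIIX4 datasheet, not something stated in this thread, so please check it against your kernel tree. The enum and function names are hypothetical.

```c
/* Interrupt routing choices for the SMBus host, as assumed here. */
enum smb_irq_mode {
    SMB_IRQ_SMI,     /* bits 3:1 == 000: SMBus interrupt delivered as SMI# */
    SMB_IRQ_9,       /* bits 3:1 == 100: SMBus interrupt delivered on IRQ 9 */
    SMB_IRQ_UNKNOWN  /* anything else: not recognised by this sketch */
};

/* Decode the interrupt-select field (bits 3:1, mask 0x0e) of the
 * host configuration byte read from config offset 0xd2. Bit 0, which
 * we understand to be the host enable bit, is masked out. */
static enum smb_irq_mode smb_irq_mode_from_cfg(unsigned char hostcfg)
{
    switch (hostcfg & 0x0e) {
    case 0x00:
        return SMB_IRQ_SMI;
    case 0x08:
        return SMB_IRQ_9;
    default:
        return SMB_IRQ_UNKNOWN;
    }
}
```

Under this assumption, a value of 0x01 at 0xd2 (host enabled, interrupt select 000) would produce the "Using Interrupt SMI# for SMBus" message seen above; the driver only reports this setting and then polls, as noted.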

>
> I'm not sure, but my box at home shows the same lspci output with the "?".
>
> So long, and thanks for the help.

--
Jean Delvare
http://khali.linux-fr.org/wishlist.html

Subject: Re: Crash on reading the whole PCI config of PIIX4 SMBus

Jean Delvare wrote:

> On Wed, 23 Sep 2009 14:59:08 +0200, Henrik Kretzschmar wrote:
>
>> The commands _work_ with a cold-started Linux and i2c-piix4 not loaded,
>> so the only thing I can do is blacklist it and give up sensors support,
>> which gives me a good argument for acquiring new hardware. :)
>>
>
> That's really odd, considering that the i2c-piix4 driver doesn't change
> the PCI device configuration, it only reads from it.
>
> If you trigger some transactions (for example by running "sensors -s"
> at boot time?) then you also write to the I/O ports. But this hardly
> explains how subsequently reading the PCI config space would crash.
>
> You might still want to check if maybe ACPI is interfering with the
> i2c-piix4 driver. This isn't the kind of result I'd expect, but who
> knows.
>
This machine doesn't even have ACPI. :) Just APM.

But reading the config space can be dangerous, according to the lspci man page:
"
-xxx Show hexadecimal dump of the whole PCI configuration space. It
is available only to root as several PCI devices crash when you
try to read some parts of the config space (this behavior probably
doesn't violate the PCI standard, but it's at least very
stupid). However, such devices are rare, so you needn't worry
much.
"

I seem to have stumbled over one of those stupidities.
That is the reason why non-root users are only allowed to
read the first 64 bytes of the config space.
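The restriction can be modelled as a small clamping function. This is a sketch of the idea, loosely based on our understanding of the PCI procfs/sysfs read handlers (privileged readers see the device's whole config space, others only the 64-byte standard header, or 128 bytes for CardBus bridges); the function names are hypothetical and this is not the kernel's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Header type value of a CardBus bridge in the standard PCI header. */
#define HDR_TYPE_CARDBUS 2

/* Number of config-space bytes a reader may see (assumed policy). */
static size_t visible_cfg_size(bool privileged, int hdr_type, size_t cfg_size)
{
    if (privileged)
        return cfg_size;            /* e.g. 256 for conventional PCI */
    if (hdr_type == HDR_TYPE_CARDBUS)
        return 128;
    return 64;                      /* standard header only */
}

/* Clamp a read of 'count' bytes at offset 'pos' to the visible window,
 * so that offsets beyond it are never passed to the hardware. */
static size_t clamp_cfg_read(size_t pos, size_t count, size_t visible)
{
    if (pos >= visible)
        return 0;
    return count < visible - pos ? count : visible - pos;
}
```

Under this model an unprivileged read at 0xd2 is clamped to zero bytes and never reaches the device, which fits the observation that only root can trigger the crash with the commands above.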

So IMHO it's generally a good idea to run lspci -xxx on every machine you can,
to save some time searching in the wrong places.

2009-09-23 14:15:18

by Jean Delvare

Subject: Re: Crash on reading the whole PCI config of PIIX4 SMBus

On Wed, 23 Sep 2009 16:11:45 +0200, Henrik Kretzschmar wrote:
> Jean Delvare wrote:
> > You might still want to check if maybe ACPI is interfering with the
> > i2c-piix4 driver. This isn't the kind of result I'd expect, but who
> > knows.
> >
> This machine doesn't even have ACPI. :) Just APM.
>
> But reading the config space can be dangerous, according to the lspci man page:
> "
> -xxx Show hexadecimal dump of the whole PCI configuration space. It
> is available only to root as several PCI devices crash when you
> try to read some parts of the config space (this behavior probably
> doesn't violate the PCI standard, but it's at least very
> stupid). However, such devices are rare, so you needn't worry
> much.
> "
>
> I seem to have stumbled over one of those stupidities.
> That is the reason why non-root users are only allowed to
> read the first 64 bytes of the config space.

That's right, but it doesn't explain why i2c-piix4 crashes in the first
place, nor why merely loading it causes subsequent lspci -xxx runs to crash
when they did not beforehand. I admit I am totally clueless.

>
> So IMHO it's generally a good idea to run lspci -xxx on every machine you can,
> to save some time searching in the wrong places.


--
Jean Delvare

Subject: Re: Crash on reading the whole PCI config of PIIX4 SMBus

Jean Delvare wrote:

> That's right, but it doesn't explain why i2c-piix4 crashes in the first
> place, nor why merely loading it causes subsequent lspci -xxx runs to crash
> when they did not beforehand. I admit I am totally clueless.
>
Sorry, I expressed myself a bit unclearly.

By _work_ I meant that the system crashed (that's what killer commands are for).

lspci -xxx (and co.) brings this system down in every case, module loaded or not.
Obviously the crash occurs when the config space is read repeatedly within a short period.

lspci (or rather procfs and sysfs) does that, and i2c-piix4 sometimes does it too.

Looking at the read() handler in drivers/pci/proc.c, I had the idea of homing in on the critical area with:

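#!/bin/sh
# Read the first N bytes ($1) of the config space, one byte per read(),
# 100 times in a row.
n=${1:?usage: $0 N}
for i in $(seq 100); do
    dd if=/proc/bus/pci/00/07.3 of=/dev/null bs=1 count="$n" 2>/dev/null
done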


I got no crashes with n == 192, but with n == 193 there's no reaction from the system.
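Since dd with bs=1 issues one read() per byte, count=n touches offsets 0 through n-1: with n = 192 the last byte read is at offset 191 (0xbf), with n = 193 it is at offset 192 (0xc0). That suggests that, on this machine, including offset 0xc0 in the repeated reads is what makes the difference. A C equivalent of the loop might look like the sketch below; the function names are hypothetical, and pread() at increasing offsets stands in for dd's sequential one-byte read() calls.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the first n bytes of an open file one byte per call, like
 * "dd bs=1 count=n". Returns how many bytes were actually read. */
static size_t read_prefix_bytewise(int fd, size_t n)
{
    unsigned char b;
    size_t done = 0;

    while (done < n && pread(fd, &b, 1, (off_t)done) == 1)
        done++;
    return done;
}

/* Repeat the prefix read 'passes' times, reopening the file on each pass
 * as the shell loop does. Returns 0 if every pass read all n bytes. */
static int stalk_cfg_prefix(const char *path, size_t n, int passes)
{
    for (int i = 0; i < passes; i++) {
        int fd = open(path, O_RDONLY);
        size_t got;

        if (fd < 0)
            return -1;
        got = read_prefix_bytewise(fd, n);
        close(fd);
        if (got != n)
            return -1;
    }
    return 0;
}
```

Calling stalk_cfg_prefix("/proc/bus/pci/00/07.3", 193, 100) as root would mirror the crashing run; with 192 it would mirror the run that survived.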

Maybe it's interesting that after every crash the screen
(in my case the console with a blinking cursor) is still visible,
but there is no reaction to keystrokes.

Also strange: the device works well IF those two read accesses have not caused the crash.
I'll test tomorrow without the second read access, just to see whether it works.




2009-09-23 16:30:29

by Jean Delvare

Subject: Re: Crash on reading the whole PCI config of PIIX4 SMBus

On Wed, 23 Sep 2009 17:49:22 +0200, Henrik Kretzschmar wrote:
> Jean Delvare wrote:
>
> > That's right, but it doesn't explain why i2c-piix4 crashes in the first
> > place, nor why merely loading it causes subsequent lspci -xxx runs to crash
> > when they did not beforehand. I admit I am totally clueless.
> >
> Sorry, I expressed myself a bit unclearly.
>
> By _work_ I meant that the system crashed (that's what killer commands are for).
>
> lspci -xxx (and co.) brings this system down in every case, module loaded or not.
> Obviously the crash occurs when the config space is read repeatedly within a short period.

Ah, OK, thanks for the clarification.

> lspci (or rather procfs and sysfs) does that, and i2c-piix4 sometimes does it too.
>
> Looking at the read() handler in drivers/pci/proc.c, I had the idea of homing in on the critical area with:
>
> #!/bin/sh
> # Read the first N bytes ($1) of the config space, one byte per read(),
> # 100 times in a row.
> n=${1:?usage: $0 N}
> for i in $(seq 100); do
>     dd if=/proc/bus/pci/00/07.3 of=/dev/null bs=1 count="$n" 2>/dev/null
> done
>
>
> I got no crashes with n == 192, but with n == 193 there's no reaction from the system.
>
> Maybe it's interesting that after every crash the screen
> (in my case the console with a blinking cursor) is still visible,
> but there is no reaction to keystrokes.

I think it is typical of IRQ routing gone out to lunch.

> Also strange: the device works well IF those two read accesses have not caused the crash.
> I'll test tomorrow without the second read access, just to see whether it works.

--
Jean Delvare