2006-03-31 08:25:59

by Jurgen Kramer

Subject: Non-Fatal Error PCI Express messages

With 2.6.16 (from FC5s 2.6.16-1.2080_FC5smp) I am getting a lot of

Mar 31 09:35:16 paragon kernel: Non-Fatal Error PCI Express B
Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
Mar 31 09:35:39 paragon kernel: Non-Fatal Error PCI Express B

messages which presumably come from

Mar 31 09:17:15 paragon kernel: MC: drivers/edac/edac_mc.c version
edac_mc Ver: 2.0.0 Mar 28 2006
Mar 31 09:17:15 paragon kernel: EDAC MC0: Giving out device to
"e752x_edac" E7525: PCI 0000:00:00.0

Is there really something broken here or just a noisy driver?

BTW this is on an Asus NCT-D mobo with Intel E7525 chipset.

Jurgen



2006-03-31 18:23:13

by Dave Peterson

Subject: Re: Non-Fatal Error PCI Express messages

On Friday 31 March 2006 00:25, Jurgen Kramer wrote:
> With 2.6.16 (from FC5s 2.6.16-1.2080_FC5smp) I am getting a lot of
>
> Mar 31 09:35:16 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:39 paragon kernel: Non-Fatal Error PCI Express B
>
> messages which presumably come from
>
> Mar 31 09:17:15 paragon kernel: MC: drivers/edac/edac_mc.c version
> edac_mc Ver: 2.0.0 Mar 28 2006
> Mar 31 09:17:15 paragon kernel: EDAC MC0: Giving out device to
> "e752x_edac" E7525: PCI 0000:00:00.0
>
> Is there really something broken here or just a noisy driver?
>
> BTW this is on an Asus NCT-D mobo with Intel E7525 chipset.
>
> Jurgen

Hi Jurgen,

I haven't seen this particular error before, and I can't say for sure
whether it's a genuine problem that should be dealt with or just a
minor annoyance that can be safely ignored. EDAC is a relatively new
piece of code, and still very much a work in progress. If this is in
fact a benign type of error, EDAC should provide a mechanism by which
a sysadmin can silence it. This is an area of future work.

I'm forwarding your message to the bluesmoke mailing list just in
case anyone who reads that list has seen instances of this error in
the past and can provide more info on it.

Dave

2006-03-31 19:18:28

by Jurgen Kramer

Subject: Re: Non-Fatal Error PCI Express messages

On Fri, 2006-03-31 at 10:22 -0800, Dave Peterson wrote:
> On Friday 31 March 2006 00:25, Jurgen Kramer wrote:
> > With 2.6.16 (from FC5s 2.6.16-1.2080_FC5smp) I am getting a lot of
> >
> > Mar 31 09:35:16 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:39 paragon kernel: Non-Fatal Error PCI Express B
> >
> > messages which presumably come from
> >
> > Mar 31 09:17:15 paragon kernel: MC: drivers/edac/edac_mc.c version
> > edac_mc Ver: 2.0.0 Mar 28 2006
> > Mar 31 09:17:15 paragon kernel: EDAC MC0: Giving out device to
> > "e752x_edac" E7525: PCI 0000:00:00.0
> >
> > Is there really something broken here or just a noisy driver?
> >
> > BTW this is on an Asus NCT-D mobo with Intel E7525 chipset.
> >
> > Jurgen
>
> Hi Jurgen,
>
> I haven't seen this particular error before, and I can't say for sure
> whether it's a genuine problem that should be dealt with or just a
> minor annoyance that can be safely ignored. EDAC is a relatively new
> piece of code, and still very much a work in progress. If this is in
> fact a benign type of error, EDAC should provide a mechanism by which
> a sysadmin can silence it. This is an area of future work.
>
> I'm forwarding your message to the bluesmoke mailing list just in
> case anyone who reads that list has seen instances of this error in
> the past and can provide more info on it.
>
> Dave

Hi Dave,

So far the system is running just fine. For reference, I have found 92
"Non-Fatal Error PCI Express B" messages since the system was booted 8
hours ago.

BTW Dave Jones reported similar problems on the LKML:
http://lkml.org/lkml/2006/1/26/381

Cheers,

Jurgen
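
For anyone who wants to reproduce the count above, here is a minimal
sketch in C. It assumes the syslog text has already been read into a
string; the helper names count_matching_lines and messages_per_hour are
made up for illustration and are not part of any kernel or EDAC tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the lines of `log` that contain `needle`.
 * A line is counted at most once, even if the needle appears twice in it.
 * (Hypothetical helper, not taken from any existing tool.)
 */
static int count_matching_lines(const char *log, const char *needle)
{
    int count = 0;
    const char *line = log;
    size_t needle_len;

    assert(log != NULL && needle != NULL);
    needle_len = strlen(needle);

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *hit = strstr(line, needle);

        /* Only accept a hit that lies entirely inside the current line. */
        if (hit != NULL && (size_t)(hit - line) + needle_len <= len)
            count++;

        line += len;
        if (*line == '\n')
            line++;
    }
    return count;
}

/* Average message rate over an uptime given in hours. */
static double messages_per_hour(int count, double uptime_hours)
{
    assert(uptime_hours > 0.0);
    return (double)count / uptime_hours;
}
```

Calling count_matching_lines() with the log text and the string
"Non-Fatal Error PCI Express B" gives the message count, and
messages_per_hour(92, 8.0) puts the figures above at 11.5 messages per
hour.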


2006-03-31 20:03:07

by Doug Thompson

Subject: Re: Non-Fatal Error PCI Express messages

On Fri, 2006-03-31 at 18:22 +0000, Dave Peterson wrote:
> On Friday 31 March 2006 00:25, Jurgen Kramer wrote:
> > With 2.6.16 (from FC5s 2.6.16-1.2080_FC5smp) I am getting a lot of
> >
> > Mar 31 09:35:16 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
> > Mar 31 09:35:39 paragon kernel: Non-Fatal Error PCI Express B
> >
> > messages which presumably come from
> >
> > Mar 31 09:17:15 paragon kernel: MC: drivers/edac/edac_mc.c version
> > edac_mc Ver: 2.0.0 Mar 28 2006
> > Mar 31 09:17:15 paragon kernel: EDAC MC0: Giving out device to
> > "e752x_edac" E7525: PCI 0000:00:00.0
> >
> > Is there really something broken here or just a noisy driver?
> >
> > BTW this is on an Asus NCT-D mobo with Intel E7525 chipset.
> >
> > Jurgen
>
> Hi Jurgen,
>
> I haven't seen this particular error before, and I can't say for sure
> whether it's a genuine problem that should be dealt with or just a
> minor annoyance that can be safely ignored. EDAC is a relatively new
> piece of code, and still very much a work in progress. If this is in
> fact a benign type of error, EDAC should provide a mechanism by which
> a sysadmin can silence it. This is an area of future work.
>
> I'm forwarding your message to the bluesmoke mailing list just in
> case anyone who reads that list has seen instances of this error in
> the past and can provide more info on it.
>
> Dave

It is benign, just too verbose. It needs to be silenced.

The code takes the error status from the chip. That status can contain
true BAD errors and non-fatal status. The code is generic in nature and
does not special case the non-fatal error status.

This does need looking into.

doug t
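
To illustrate the point about special-casing the non-fatal status, here
is a rough sketch in C. It is NOT the e752x_edac code: the bit layout,
the table contents, the report_errors() helper and the log_nonfatal
switch are all assumptions made up for this example. The idea is only
that the reporting path checks the severity before printing, so a
sysadmin could silence the benign messages.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical status layout, one bit per error source.
 * These names and bit positions are illustrative only; they are not
 * taken from the E7525 datasheet or from the e752x_edac driver.
 */
#define NUM_SOURCES 4

static const char *const source_name[NUM_SOURCES] = {
    "PCI Express A", "PCI Express B", "System Bus", "DRAM Controller"
};

enum severity { SEV_NONFATAL = 0, SEV_FATAL = 1 };

static const char *const severity_name[] = { "Non-Fatal", "Fatal" };

/*
 * Print one message per set bit in `status`.
 * Non-fatal status is skipped entirely unless `log_nonfatal` is set,
 * which is the special case the generic code path lacks.
 * Returns the number of messages written to `out`.
 */
static int report_errors(uint32_t status, enum severity sev,
                         int log_nonfatal, FILE *out)
{
    int reported = 0;

    if (sev == SEV_NONFATAL && !log_nonfatal)
        return 0;

    for (int i = 0; i < NUM_SOURCES; i++) {
        if (status & (UINT32_C(1) << i)) {
            fprintf(out, "%s Error %s\n",
                    severity_name[sev], source_name[i]);
            reported++;
        }
    }
    return reported;
}
```

With log_nonfatal set to 0, a non-fatal status word produces no output,
while fatal status is always reported. A real fix would presumably
expose such a switch as a module parameter or sysfs control, but that is
a design guess rather than something settled in this thread.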


2006-03-31 20:03:52

by Dave Jiang

Subject: Re: Non-Fatal Error PCI Express messages

Jurgen Kramer wrote:
> On Fri, 2006-03-31 at 10:22 -0800, Dave Peterson wrote:
>> On Friday 31 March 2006 00:25, Jurgen Kramer wrote:
>>> With 2.6.16 (from FC5s 2.6.16-1.2080_FC5smp) I am getting a lot of
>>>
>>> Mar 31 09:35:16 paragon kernel: Non-Fatal Error PCI Express B
>>> Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
>>> Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
>>> Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
>>> Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
>>> Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
>>> Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
>>> Mar 31 09:35:39 paragon kernel: Non-Fatal Error PCI Express B
>>>
>>> messages which presumably come from
>>>
>>> Mar 31 09:17:15 paragon kernel: MC: drivers/edac/edac_mc.c version
>>> edac_mc Ver: 2.0.0 Mar 28 2006
>>> Mar 31 09:17:15 paragon kernel: EDAC MC0: Giving out device to
>>> "e752x_edac" E7525: PCI 0000:00:00.0
>>>
>>> Is there really something broken here or just a noisy driver?
>>>
>>> BTW this is on an Asus NCT-D mobo with Intel E7525 chipset.
>>>
>>> Jurgen
>> Hi Jurgen,
>>
>> I haven't seen this particular error before, and I can't say for sure
>> whether it's a genuine problem that should be dealt with or just a
>> minor annoyance that can be safely ignored. EDAC is a relatively new
>> piece of code, and still very much a work in progress. If this is in
>> fact a benign type of error, EDAC should provide a mechanism by which
>> a sysadmin can silence it. This is an area of future work.
>>
>> I'm forwarding your message to the bluesmoke mailing list just in
>> case anyone who reads that list has seen instances of this error in
>> the past and can provide more info on it.
>>
>> Dave
>
> Hi Dave,
>
> So far the system is running just fine. For reference, I have found 92
> "Non-Fatal Error PCI Express B" messages since the system was booted 8
> hours ago.
>
> BTW Dave Jones reported similar problems on the LKML:
> http://lkml.org/lkml/2006/1/26/381
>
> Cheers,
>
> Jurgen
>

Do you have the sysbus message patch in the e752x driver? I've seen this before
on certain Intel ATCA boards, where it was actually the system bus complaining
rather than PCIe, because the code was referencing the wrong printout messages.
If you do have the patch, then it's probably PCIe.

2006-03-31 20:20:05

by Dave Jones

Subject: Re: Non-Fatal Error PCI Express messages

On Fri, Mar 31, 2006 at 10:25:50AM +0200, Jurgen Kramer wrote:
> With 2.6.16 (from FC5s 2.6.16-1.2080_FC5smp) I am getting a lot of
>
> Mar 31 09:35:16 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:17 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:18 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:20 paragon kernel: Non-Fatal Error PCI Express B
> Mar 31 09:35:39 paragon kernel: Non-Fatal Error PCI Express B
>
> messages which presumably come from
>
> Mar 31 09:17:15 paragon kernel: MC: drivers/edac/edac_mc.c version
> edac_mc Ver: 2.0.0 Mar 28 2006
> Mar 31 09:17:15 paragon kernel: EDAC MC0: Giving out device to
> "e752x_edac" E7525: PCI 0000:00:00.0
>
> Is there really something broken here or just a noisy driver?

really noisy driver.
http://lkml.org/lkml/2006/1/26/381

Dave

--
http://www.codemonkey.org.uk