2010-08-16 17:59:58

by Gábor Stefanik

Subject: [RFT] BCM4312 users with DMA errors, please test!

Hello Everyone!

If you are experiencing DMA errors on a BCM4312, please test the
attached patch. It implements the PCI-E SERDES workaround, which the
hybrid driver is applying during early init to LP-PHY cards, and which
is a good candidate for the cause of the DMA error.
Note that this is not a final patch & it may cause collateral damage
for non-4312 cards; if it helps the 4312 problem, I will submit a
cleaned-up version.

Thanks,
Gábor


Attachments:
pcie_serdes_workaround.diff (3.34 kB)

2010-08-17 20:34:54

by Chris Vine

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On Tue, 17 Aug 2010 13:23:37 +0100
Chris Vine <[email protected]> wrote:
> Reverting to 2.6.35.1 solves this. Probably 2.6.36-rc1 would be OK as
> well. I might later today try the ssb patch on the 2.6.35.1 kernel,
> but it doesn't look as if it solves the DMA errors. However,
> yesterday's tests clearly aren't conclusive about anything.

I have applied the patch to 2.6.35.1, and it does not appear to have
any effect on the DMA errors.

Chris



2010-08-16 22:36:05

by Chris Vine

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On Mon, 16 Aug 2010 20:41:54 +0100
Chris Vine <[email protected]> wrote:

> On Mon, 16 Aug 2010 19:59:36 +0200
> Gábor Stefanik <[email protected]> wrote:
> > Hello Everyone!
> >
> > If you are experiencing DMA errors on a BCM4312, please test the
> > attached patch. It implements the PCI-E SERDES workaround, which the
> > hybrid driver is applying during early init to LP-PHY cards, and
> > which is a good candidate for the cause of the DMA error.
> > Note that this is not a final patch & it may cause collateral damage
> > for non-4312 cards; if it helps the 4312 problem, I will submit a
> > cleaned-up version.
>
> This applies to 2.6.35.2, but does not compile:
>
> drivers/ssb/driver_pcicore.c: In function 'ssb_pcie_mdio_set_block':
> drivers/ssb/driver_pcicore.c:457: error: 'i' undeclared (first use in this function)
> drivers/ssb/driver_pcicore.c:457: error: (Each undeclared identifier is reported only once
> drivers/ssb/driver_pcicore.c:457: error: for each function it appears in.)
> drivers/ssb/driver_pcicore.c: In function 'ssb_pcie_mdio_read':
> drivers/ssb/driver_pcicore.c:503: error: expected ';' before 'pcicore_write32'
>
> With the obvious fixes (providing a variable 'i' of int type as the
> count variable and terminating the line which had no terminating
> semi-colon), it compiled OK and the first time I booted up, booted up
> OK but didn't fix the DMA error. Subsequent attempts to boot up gave
> me a slew of errors on boot up so I am not sure what is going on
> there.
>
> You might want to check that the obvious fixes to the patch are
> complete.

I have more or less tracked down what is happening.

The first thing to report is that this does not fix the DMA error.

The second thing is that with the patch applied, and with _all_
wireless/ssb modules blacklisted (ssb, b43, wl), if the wl module is
present the kernel bizarrely still tries to load it on boot up and
shortly thereafter hangs with a number of errors reported, which differ
on different boots (there is no particular pattern to them). That in
turn causes some file system corruption which affects further boots
even if the wl module is removed. The corruption is not that serious
and appears only to affect the kernel image and/or the wl module.
Reinstalling both solved the problem, so far. e2fsck -f only reported
that the time stamps were wrong, but inodes and directory structures
were reported as OK.

But caveat testor so far as the corruption is concerned. I might have
been lucky.

Chris



2010-08-16 23:53:59

by Larry Finger

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On 08/16/2010 05:35 PM, Chris Vine wrote:
> On Mon, 16 Aug 2010 20:41:54 +0100
> Chris Vine <[email protected]> wrote:
>
>> On Mon, 16 Aug 2010 19:59:36 +0200
>> Gábor Stefanik <[email protected]> wrote:
>>> Hello Everyone!
>>>
>>> If you are experiencing DMA errors on a BCM4312, please test the
>>> attached patch. It implements the PCI-E SERDES workaround, which the
>>> hybrid driver is applying during early init to LP-PHY cards, and
>>> which is a good candidate for the cause of the DMA error.
>>> Note that this is not a final patch & it may cause collateral damage
>>> for non-4312 cards; if it helps the 4312 problem, I will submit a
>>> cleaned-up version.
>>
>> This applies to 2.6.35.2, but does not compile:
>>
>> drivers/ssb/driver_pcicore.c: In function 'ssb_pcie_mdio_set_block':
>> drivers/ssb/driver_pcicore.c:457: error: 'i' undeclared (first use in this function)
>> drivers/ssb/driver_pcicore.c:457: error: (Each undeclared identifier is reported only once
>> drivers/ssb/driver_pcicore.c:457: error: for each function it appears in.)
>> drivers/ssb/driver_pcicore.c: In function 'ssb_pcie_mdio_read':
>> drivers/ssb/driver_pcicore.c:503: error: expected ';' before 'pcicore_write32'
>>
>> With the obvious fixes (providing a variable 'i' of int type as the
>> count variable and terminating the line which had no terminating
>> semi-colon), it compiled OK and the first time I booted up, booted up
>> OK but didn't fix the DMA error. Subsequent attempts to boot up gave
>> me a slew of errors on boot up so I am not sure what is going on
>> there.
>>
>> You might want to check that the obvious fixes to the patch are
>> complete.
>
> I have more or less tracked down what is happening.
>
> The first thing to report is that this does not fix the DMA error.
>
> The second thing is that with the patch applied, and with _all_
> wireless/ssb modules blacklisted (ssb, b43, wl), if the wl module is
> present the kernel bizarrely still tries to load it on boot up and
> shortly thereafter hangs with a number of errors reported, which differ
> on different boots (there is no particular pattern to them). That in
> turn causes some file system corruption which affects further boots
> even if the wl module is removed. The corruption is not that serious
> and appears only to affect the kernel image and/or the wl module.
> Reinstalling both solved the problem, so far. e2fsck -f only reported
> that the time stamps were wrong, but inodes and directory structures
> were reported as OK.
>
> But caveat testor so far as the corruption is concerned. I might have
> been lucky.

I tried to duplicate your boot problems without anything unexpected happening.

I don't know what happened, but I doubt that this ssb patch was responsible.

Larry


2010-08-17 00:14:59

by Chris Vine

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On Tue, 17 Aug 2010 01:09:23 +0100
Chris Vine <[email protected]> wrote:

> On Mon, 16 Aug 2010 18:53:41 -0500
> Larry Finger <[email protected]> wrote:
> > I tried to duplicate your boot problems without anything unexpected
> > happening.
> >
> > I don't know what happened, but I doubt that this ssb patch was
> > responsible.
>
> I have reproduced it twice. It is definitely the patch, in the sense
> that it happens with the patch included (on two separate tests that I
> have conducted) and never without.
>
> I do not doubt that you do not experience this effect. However, you
> don't experience the DMA bug either.

Out of interest, which version of the wl module do you have installed,
as I wonder if that makes a difference? (As I said in my e-mail on
this, this problem only occurs if the wl module is available to the
kernel.)

Chris



2010-08-17 00:18:29

by Chris Vine

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On Mon, 16 Aug 2010 18:53:41 -0500
Larry Finger <[email protected]> wrote:
> I tried to duplicate your boot problems without anything unexpected
> happening.
>
> I don't know what happened, but I doubt that this ssb patch was
> responsible.

I have reproduced it twice. It is definitely the patch, in the sense
that it happens with the patch included (on two separate tests that I
have conducted) and never without.

I do not doubt that you do not experience this effect. However, you
don't experience the DMA bug either.

Chris



2010-08-17 00:19:15

by Chris Vine

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On Tue, 17 Aug 2010 01:09:23 +0100
Chris Vine <[email protected]> wrote:

> On Mon, 16 Aug 2010 18:53:41 -0500
> Larry Finger <[email protected]> wrote:
> > I tried to duplicate your boot problems without anything unexpected
> > happening.
> >
> > I don't know what happened, but I doubt that this ssb patch was
> > responsible.
>
> I have reproduced it twice. It is definitely the patch, in the sense
> that it happens with the patch included (on two separate tests that I
> have conducted) and never without.
>
> I do not doubt that you do not experience this effect. However, you
> don't experience the DMA bug either.

I probably also ought to say that this is with a stock 2.6.35.2
kernel. You may be using wireless testing, although I doubt that makes
a difference.

Chris



2010-08-16 19:16:54

by Gábor Stefanik

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

2010/8/16 Larry Finger <[email protected]>:
> On 08/16/2010 12:59 PM, Gábor Stefanik wrote:
>> Hello Everyone!
>>
>> If you are experiencing DMA errors on a BCM4312, please test the
>> attached patch. It implements the PCI-E SERDES workaround, which the
>> hybrid driver is applying during early init to LP-PHY cards, and which
>> is a good candidate for the cause of the DMA error.
>> Note that this is not a final patch & it may cause collateral damage
>> for non-4312 cards; if it helps the 4312 problem, I will submit a
>> cleaned-up version.
>
> The patch that you distributed had a couple of errors in compiling, namely:
>
> CC [M] drivers/ssb/driver_pcicore.o
> drivers/ssb/driver_pcicore.c: In function ‘ssb_pcie_mdio_set_block’:
> drivers/ssb/driver_pcicore.c:457:7: error: ‘i’ undeclared (first use in this function)
> drivers/ssb/driver_pcicore.c:457:7: note: each undeclared identifier is reported only once for each function it appears in
> drivers/ssb/driver_pcicore.c: In function ‘ssb_pcie_mdio_read’:
> drivers/ssb/driver_pcicore.c:503:2: error: expected ‘;’ before ‘pcicore_write32’
> make[2]: *** [drivers/ssb/driver_pcicore.o] Error 1
> make[1]: *** [drivers/ssb] Error 2
> make[1]: *** Waiting for unfinished jobs....
>
> Did you forget a quilt refresh?
>
> My machine does not have the DMA error, but I will be testing.
>
> Larry
>

Oops... yes, two nasty typos. I have no idea why it compiled for me...
Schrödinbug?

With that said, here is the corrected version.

--
Vista: [V]iruses, [I]ntruders, [S]pyware, [T]rojans and [A]dware. :-)


Attachments:
pcie_serdes_workaround.diff (3.35 kB)

2010-08-16 19:42:15

by Chris Vine

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On Mon, 16 Aug 2010 19:59:36 +0200
Gábor Stefanik <[email protected]> wrote:
> Hello Everyone!
>
> If you are experiencing DMA errors on a BCM4312, please test the
> attached patch. It implements the PCI-E SERDES workaround, which the
> hybrid driver is applying during early init to LP-PHY cards, and which
> is a good candidate for the cause of the DMA error.
> Note that this is not a final patch & it may cause collateral damage
> for non-4312 cards; if it helps the 4312 problem, I will submit a
> cleaned-up version.

This applies to 2.6.35.2, but does not compile:

drivers/ssb/driver_pcicore.c: In function 'ssb_pcie_mdio_set_block':
drivers/ssb/driver_pcicore.c:457: error: 'i' undeclared (first use in this function)
drivers/ssb/driver_pcicore.c:457: error: (Each undeclared identifier is reported only once
drivers/ssb/driver_pcicore.c:457: error: for each function it appears in.)
drivers/ssb/driver_pcicore.c: In function 'ssb_pcie_mdio_read':
drivers/ssb/driver_pcicore.c:503: error: expected ';' before 'pcicore_write32'

With the obvious fixes (providing a variable 'i' of int type as the
count variable and terminating the line which had no terminating
semi-colon), it compiled OK and the first time I booted up, booted up
OK but didn't fix the DMA error. Subsequent attempts to boot up gave me
a slew of errors on boot up so I am not sure what is going on there.

You might want to check that the obvious fixes to the patch are
complete.
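
For anyone else applying the fixes by hand, the two changes are of the following kind. This is only a user-space sketch with made-up names (fake_core, fake_read32, fake_write32, fake_mdio_wait and FAKE_MDIO_DONE are all hypothetical); it is not the code from the patch or from drivers/ssb/driver_pcicore.c, and the bit value is an assumption chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a driver's register accessors; NOT the ssb API. */
#define FAKE_MDIO_DONE 0x100u   /* assumed "transaction complete" bit */

struct fake_core {
	uint32_t mdio_control;  /* simulated control register */
	int polls_until_done;   /* reads that return "busy" before the done bit appears */
};

/* Simulated register read: reports "busy" until the countdown reaches zero. */
static uint32_t fake_read32(struct fake_core *pc)
{
	if (pc->polls_until_done > 0)
		pc->polls_until_done--;
	else
		pc->mdio_control |= FAKE_MDIO_DONE;
	return pc->mdio_control;
}

/* Simulated register write. */
static void fake_write32(struct fake_core *pc, uint32_t value)
{
	pc->mdio_control = value;
}

/* Returns true if the done bit was seen within max_polls reads. */
static bool fake_mdio_wait(struct fake_core *pc, int max_polls)
{
	uint32_t v;
	int i;  /* fix 1: the loop counter has to be declared */

	fake_write32(pc, 0);  /* fix 2: the statement is terminated with ';' */
	for (i = 0; i < max_polls; i++) {
		v = fake_read32(pc);
		if (v & FAKE_MDIO_DONE)
			return true;
	}
	return false;
}
```

With a fake_core whose polls_until_done is 3, fake_mdio_wait() succeeds when given 10 polls and gives up when given only 3. Neither fix changes what the loop does; they only make it compile.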

Chris



2010-08-16 19:06:20

by Larry Finger

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On 08/16/2010 12:59 PM, Gábor Stefanik wrote:
> Hello Everyone!
>
> If you are experiencing DMA errors on a BCM4312, please test the
> attached patch. It implements the PCI-E SERDES workaround, which the
> hybrid driver is applying during early init to LP-PHY cards, and which
> is a good candidate for the cause of the DMA error.
> Note that this is not a final patch & it may cause collateral damage
> for non-4312 cards; if it helps the 4312 problem, I will submit a
> cleaned-up version.

The patch that you distributed had a couple of errors in compiling, namely:

CC [M] drivers/ssb/driver_pcicore.o
drivers/ssb/driver_pcicore.c: In function ‘ssb_pcie_mdio_set_block’:
drivers/ssb/driver_pcicore.c:457:7: error: ‘i’ undeclared (first use in this function)
drivers/ssb/driver_pcicore.c:457:7: note: each undeclared identifier is reported only once for each function it appears in
drivers/ssb/driver_pcicore.c: In function ‘ssb_pcie_mdio_read’:
drivers/ssb/driver_pcicore.c:503:2: error: expected ‘;’ before ‘pcicore_write32’
make[2]: *** [drivers/ssb/driver_pcicore.o] Error 1
make[1]: *** [drivers/ssb] Error 2
make[1]: *** Waiting for unfinished jobs....

Did you forget a quilt refresh?

My machine does not have the DMA error, but I will be testing.

Larry

2010-08-16 19:30:52

by Larry Finger

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On 08/16/2010 02:16 PM, Gábor Stefanik wrote:
>
> Oops... yes, two nasty typos. I have no idea why it compiled for me...
> Schrödinbug?
>
> With that said, here is the corrected version.

Hmmm, a new application of the "Uncertainty Principle".

The patch did not break my 14e4:4315 device, which already worked.

Larry



2010-08-17 12:23:51

by Chris Vine

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On Mon, 16 Aug 2010 20:01:45 -0500
Larry Finger <[email protected]> wrote:
> My wl came from hybrid-portsrc-x86_32-v5.60.48.36.tar.gz. My kernel is
> 2.6.36-rc1 from mainline.
>
> Very strange that a patch to ssb, which is a module that cannot even
> load due to being blacklisted, can cause this kind of problem. Are
> you warm or cold rebooting?

I am using the same version of wl.

On booting up my netbook this morning, I found that I get a wholesome
set of bugs reported by dmesg, even without wl available to the
kernel and with the ssb patch reverted. I think the ssb patch is an
innocent bystander which just happened first to reveal a bug in the
2.6.35.2 kernel. A lot seems to depend on the starting state of the
hardware - a cold boot in a stock 2.6.35.2 kernel will trigger it
(what I got this morning). A warm boot on a 2.6.35.2 kernel with the ssb
patch will also trigger it (what I got yesterday).

The fact that something tries to load a blacklisted wl module would
seem to show there is something fairly fundamentally amiss, possibly
with the ACPI code. In any event, the conclusion I have reached is that
the 2.6.35.2 kernel is hopelessly broken on my netbook hardware. I do
not now think there is any file system corruption: it just gave that
appearance because of the saving of the state of the machine between
warm boots.

Reverting to 2.6.35.1 solves this. Probably 2.6.36-rc1 would be OK as
well. I might later today try the ssb patch on the 2.6.35.1 kernel,
but it doesn't look as if it solves the DMA errors. However,
yesterday's tests clearly aren't conclusive about anything.

Chris



2010-08-16 19:32:48

by Gábor Stefanik

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

2010/8/16 Larry Finger <[email protected]>:
> On 08/16/2010 02:16 PM, Gábor Stefanik wrote:
>>
>> Oops... yes, two nasty typos. I have no idea why it compiled for me...
>> Schrödinbug?
>>
>> With that said, here is the corrected version.
>
> Hmmm, a new application of the "Uncertainty Principle".
>
> The patch did not break my 14e4:4315 device, which already worked.
>
> Larry
>

That's expected, given that the hybrid driver also does this. :-) The
question is whether it fixes the DMA error.

2010-08-17 01:02:05

by Larry Finger

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On 08/16/2010 07:14 PM, Chris Vine wrote:
>>
>> I have reproduced it twice. It is definitely the patch, in the sense
>> that it happens with the patch included (on two separate tests that I
>> have conducted) and never without.
>>
>> I do not doubt that you do not experience this effect. However, you
>> don't experience the DMA bug either.
>
> Out of interest, which version of the wl module do you have installed,
> as I wonder if that makes a difference? (As I said in my e-mail on
> this, this problem only occurs if the wl module is available to the
> kernel.)

My wl came from hybrid-portsrc-x86_32-v5.60.48.36.tar.gz. My kernel is
2.6.36-rc1 from mainline.

Very strange that a patch to ssb, which is a module that cannot even load due to
being blacklisted, can cause this kind of problem. Are you warm or cold rebooting?

Larry

2010-08-17 20:39:15

by Gábor Stefanik

Subject: Re: [RFT] BCM4312 users with DMA errors, please test!

On Tue, Aug 17, 2010 at 10:28 PM, Chris Vine
<[email protected]> wrote:
> On Tue, 17 Aug 2010 13:23:37 +0100
> Chris Vine <[email protected]> wrote:
>> Reverting to 2.6.35.1 solves this. Probably 2.6.36-rc1 would be OK as
>> well. I might later today try the ssb patch on the 2.6.35.1 kernel,
>> but it doesn't look as if it solves the DMA errors. However,
>> yesterday's tests clearly aren't conclusive about anything.
>
> I have applied the patch to 2.6.35.1, and it does not appear to have
> any effect on the DMA errors.
>
> Chris
>
>
>

Thanks! That's what I was ultimately interested in. So, we now know
it's not the SERDES workaround. (Or maybe it needs to be performed at
a different time.)