2008-09-29 18:32:11

by Larry Finger

[permalink] [raw]
Subject: [RFC/T] b43: to few loop tries in do_dummy_tx

Michael,

I have started modifying my copy of b43 to see if I can locate any
coding problems that might be causing the PHY transmission errors. To
start with, I am checking all the sections that use udelay. The main
reason is that when a bug in the CPU timer cut all delays in half, I
got these errors in b43legacy. I also noted the report this morning
that compiler and toolchain versions seem to be important.

One of the first places I looked was in do_dummy_tx. When I checked
the spin-on-condition loops in this routine, I found that the 3rd
always used the maximum number of loops and exited before the
condition was met. I increased the number of possible loops from 10 to
20 and found that it always takes 17 or 18 passes for the condition to
be met on my machine.

The existing code matches specs - I even checked the code in
4.150.10.5. It is too early to tell if this has anything to do with
the errors, but I suggest the following change:

diff --git a/drivers/net/wireless/b43/main.c
b/drivers/net/wireless/b43/main.c
index 7205a93..af60122 100644
--- a/drivers/net/wireless/b43/main.c
+++ b/drivers/net/wireless/b43/main.c
@@ -814,7 +814,7 @@ void b43_dummy_transmission(struct b43_wldev *dev)
break;
udelay(10);
}
- for (i = 0x00; i < 0x0A; i++) {
+ for (i = 0x00; i < 0x19; i++) {
value = b43_read16(dev, 0x0690);
if (!(value & 0x0100))
break;

Using 25 passes gives a little margin for CPUs that are much faster
than mine. Perhaps the margin should even be bigger.

Larry




2008-09-30 14:11:29

by Holger Schurig

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

> The b43 and b43-legacy driver are _based_ on these
> specifications. There are no other specs available.

Can it be the case that the binary windows driver the spec
producer analyzed used a different PCI settings? And that
therefore the delays/access-patterns in the binary driver work
as expected *in their environment*, but that Linux might have
programmed the PCI bus/controller differently and that therefore
the identical approach fails (sometimes) ???

2008-09-30 14:22:50

by Michael Büsch

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

On Tuesday 30 September 2008 16:11:18 Holger Schurig wrote:
> > The b43 and b43-legacy driver are _based_ on these
> > specifications. There are no other specs available.
>
> Can it be the case that the binary windows driver the spec

They didn't use a windows driver.

> producer analyzed used a different PCI settings? And that
> therefore the delays/access-patterns in the binary driver work
> as expected *in their environment*, but that Linux might have
> programmed the PCI bus/controller differently and that therefore
> the identical approach fails (sometimes) ???

Unlikely.

--
Greetings Michael.

2008-09-30 20:02:26

by Gábor Stefanik

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

On Tue, Sep 30, 2008 at 6:34 PM, Artem Antonov
<[email protected]> wrote:
> I've tested it with my bcm4318(asus wl-500gP). But there are still PHY
> transmission errors. So there should be some more reasons of these
> errors :)
>
> Best regards/

It would be good to know whether the Windows driver also suffers from
PHY transmission errors... The "hybrid" Linux driver likely does have
them, possibly even more than us, judging from its poor performance.

--
Vista: [V]iruses, [I]ntruders, [S]pyware, [T]rojans and [A]dware. :-)

2008-09-30 14:21:54

by Michael Büsch

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

On Tuesday 30 September 2008 16:13:53 Peter Stuge wrote:
> At this point, if there are only/mostly benefits, I don't see why
> deviating from the specs is bad - after all they "only" document
> another driver, right?

The past taught us that deviations from the specs are almost certainly
hidden bugs somewhere else. We had that dozens of times.
In this case, however, I think it's not the case.

--
Greetings Michael.

2008-09-30 13:55:29

by Michael Büsch

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

On Tuesday 30 September 2008 15:46:04 [email protected] wrote:
> If I understand him correctly he's suggesting that there could be BETTER
> values than those used by the reference driver. In other words, yes,
> B43/B43-Legacy are based on the RE of the Windows driver but perhaps
> there are better values that improve behavior beyond that of the
> original driver.

I doubt that very much. I think it's as simple as the udelay of the original
driver taking 1.2 microseconds and ours taking only 1.05 microseconds.
Both udelay implementations are valid, but ours uncovers the hidden bug.

--
Greetings Michael.

2008-09-30 13:26:36

by Michael Büsch

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

On Monday 29 September 2008 23:16:20 Peter Stuge wrote:
> Michael Buesch wrote:
> > Actually, I do know since the very first days of bcm43xx that the
> > loop counts are not big enough in some of these loops.
>
> Would it make sense to double check the conditions after the loop?

No it wouldn't. Accessing the registers has side-effects.

> > But I didn't change it, as it was said the current counts match the
> > specs.
>
> Which specs?

Our specs?

--
Greetings Michael.

2008-09-29 18:39:06

by Michael Büsch

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

On Monday 29 September 2008 20:31:58 Larry Finger wrote:
> Michael,
>
> I have started modifying my copy of b43 to see if I can locate any
> coding problems that might be causing the PHY transmission errors. To
> start with, I am checking all the sections that use udelay. The main
> reason is that when a bug in the CPU timer cut all delays in half, I
> got these errors in b43legacy. I also noted the report this morning
> that compiler and toolchain versions seem to be important.
>
> One of the first places I looked was in do_dummy_tx. When I checked
> the spin-on-condition loops in this routine, I found that the 3rd
> always used the maximum number of loops and exited before the
> condition was met. I increased the number of possible loops from 10 to
> 20 and found that it always takes 17 or 18 passes for the condition to
> be met on my machine.
>
> The existing code matches specs - I even checked the code in
> 4.150.10.5. It is too early to tell if this has anything to do with
> the errors, but I suggest the following change:
>
> diff --git a/drivers/net/wireless/b43/main.c
> b/drivers/net/wireless/b43/main.c
> index 7205a93..af60122 100644
> --- a/drivers/net/wireless/b43/main.c
> +++ b/drivers/net/wireless/b43/main.c
> @@ -814,7 +814,7 @@ void b43_dummy_transmission(struct b43_wldev *dev)
> break;
> udelay(10);
> }
> - for (i = 0x00; i < 0x0A; i++) {
> + for (i = 0x00; i < 0x19; i++) {
> value = b43_read16(dev, 0x0690);
> if (!(value & 0x0100))
> break;
>
> Using 25 passes gives a little margin for CPUs that are much faster
> than mine. Perhaps the margin should even be bigger.


Yeah, feel free to submit this to John.
Dummy-tx _can_ for sure trigger PHY TX errors. I already saw that
in my experiments, too. It also makes sense, as dummy-tx is poking
with the lowlevel TX hardware.
So if a loop exists early it may leave TX hardware in an inconsistent state.

Actually, I do know since the very first days of bcm43xx that the
loop counts are not big enough in some of these loops. But I didn't
change it, as it was said the current counts match the specs.
However, I think it's perfectly valid to raise the values.
Please do so, thanks a lot.

--
Greetings Michael.

2008-09-29 21:23:03

by Peter Stuge

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

Michael Buesch wrote:
> Actually, I do know since the very first days of bcm43xx that the
> loop counts are not big enough in some of these loops.

Would it make sense to double check the conditions after the loop?


> But I didn't change it, as it was said the current counts match the
> specs.

Which specs?


//Peter

2008-09-30 14:13:57

by Peter Stuge

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

Michael Buesch wrote:
> > Nice work, but as it's a spec of another driver implementation rather
> > than hardware (or even the firmware API) I don't think it should be
> > so authoritative. If other values are clearly better why not use
> > them?
>
> What crap are you smoking?

Maybe we just miscommunicated, or maybe I misunderstood something
about the reverse engineering process. I am very new to this project.


> The b43 and b43-legacy driver are _based_ on these specifications.
> There are no other specs available.

I know. Allow me to clarify my train of thought:

* Specs are created by reverse engineering of a binary blob.

* After development of b43*, many can happily use the drivers. \o/

* As it turns out, b43* can be made even more reliable by doing
things slightly different than what the specs say. Case in point
looping a little longer.

At this point, if there are only/mostly benefits, I don't see why
deviating from the specs is bad - after all they "only" document
another driver, right?


//Peter

2008-09-29 21:28:27

by Larry Finger

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

Peter Stuge wrote:
>
> Which specs?

The ones generated by the reverse engineers. See
http://bcm-v4.sipsolutions.net/.

Larry

2008-09-30 13:46:08

by Ehud Gavron

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx



Michael Buesch wrote:
> On Tuesday 30 September 2008 07:50:34 Peter Stuge wrote:
>
>> Larry Finger wrote:
>>
>>>> Which specs?
>>>>
>>> The ones generated by the reverse engineers. See
>>> http://bcm-v4.sipsolutions.net/.
>>>
>> Nice work, but as it's a spec of another driver implementation rather
>> than hardware (or even the firmware API) I don't think it should be
>> so authoritative. If other values are clearly better why not use
>> them?
>>
>
> What crap are you smoking?
> The b43 and b43-legacy driver are _based_ on these specifications.
> There are no other specs available.
>
>
If I understand him correctly he's suggesting that there could be BETTER
values than those used by the reference driver. In other words, yes,
B43/B43-Legacy are based on the RE of the Windows driver but perhaps
there are better values that improve behavior beyond that of the
original driver.

He didn't say the following but I will: It's also true that there are
edge cases that RE won't catch without repeated arduous testing in
adverse conditions, and there may be code in the reference driver that
will therefore won't end up in the specs. This means that behavioral
improvements and/or performance gains in B43/B43-Legacy that can be
gained without getting into those edge cases are worthy of consideration
(or maybe specially labeled code).

Just my two farthings worth.

E

--
Legal Disclaimer that you are now contractually bound to under all laws with no recourse:
http://attrition.org/security/rants/z/disclaimers.html


2008-09-30 05:50:38

by Peter Stuge

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

Larry Finger wrote:
> > Which specs?
>
> The ones generated by the reverse engineers. See
> http://bcm-v4.sipsolutions.net/.

Nice work, but as it's a spec of another driver implementation rather
than hardware (or even the firmware API) I don't think it should be
so authoritative. If other values are clearly better why not use
them?


//Peter

2008-09-30 16:34:08

by Artem Antonov

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

I've tested it with my bcm4318(asus wl-500gP). But there are still PHY
transmission errors. So there should be some more reasons of these
errors :)

Best regards/

2008/9/29, Larry Finger <[email protected]>:
> Michael,
>
..........

2008-09-30 13:28:48

by Michael Büsch

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

On Tuesday 30 September 2008 07:50:34 Peter Stuge wrote:
> Larry Finger wrote:
> > > Which specs?
> >
> > The ones generated by the reverse engineers. See
> > http://bcm-v4.sipsolutions.net/.
>
> Nice work, but as it's a spec of another driver implementation rather
> than hardware (or even the firmware API) I don't think it should be
> so authoritative. If other values are clearly better why not use
> them?

What crap are you smoking?
The b43 and b43-legacy driver are _based_ on these specifications.
There are no other specs available.

--
Greetings Michael.

2008-09-30 22:11:52

by Larry Finger

[permalink] [raw]
Subject: Re: [RFC/T] b43: to few loop tries in do_dummy_tx

Stefanik G=E1bor wrote:
>=20
> It would be good to know whether the Windows driver also suffers from
> PHY transmission errors... The "hybrid" Linux driver likely does have
> them, possibly even more than us, judging from its poor performance.
>=20

We didn't see the PHY transmission errors in b43 until fairly
recently. Unfortunately, they are too intermittent to tell if it is a
regression, and more importantly, it would be a bitch to bisect the
code as one would need many weeks of testing before saying that a
particular kernel was good.

When I first got 64-bit dma working on a BCM4311/2, there was always
one such error. That was removed when Michael changed the dma code.
Now I do not get any on most runs, but every once in a while (3 times
since July!), I get a storm of these.

Unlike most of you, I have seen the Broadcom code. There is almost no
error checking in their routines, which is good for us - it makes the
code smaller and the decompilation easier. I will never RE the binary
blob of the hybrid driver for a number of reasons; however, I doubt
that their culture has changed. As to why their driver has such poor
performance, your guess is as good as mine.

Larry