2002-02-15 12:59:15

by Frank Elsner

Subject: Broadcom 5700/5701 Gigabit Ethernet Adapters


I'm currently using kernel 2.4.9-21 from RH, after updating my RH 7.2.

I want to build a custom kernel 2.4.17 but
the "Broadcom 5700/5701 Gigabit Ethernet Adapter" (which I need)
isn't in the source. Obviously an addon from RH.

Why isn't the driver in the main kernel tree?

Kind regards _______________________________________________________________
Frank Elsner / c/o Technische Universitaet Berlin |
____________/ ZRZ, Sekr. E-N 50 |
| Einsteinufer 17 |
| Voice: +49 30 314 23897 D-10587 Berlin |
| SMTP : [email protected] Germany ________________________|
|____________________________________________________| Und das ist auch gut so



2002-02-15 13:06:22

by Jeff Garzik

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

Frank Elsner wrote:
>
> I'm currently using kernel 2.4.9-21 from RH, after updating my RH 7.2.
>
> I want to build a custom kernel 2.4.17 but
> the "Broadcom 5700/5701 Gigabit Ethernet Adapter" (which I need)
> isn't in the source. Obviously an addon from RH.
>
> Why isn't the driver in the main kernel tree?

Cuz the driver is a piece of crap, and Broadcom isn't interested in
working with the open source community to fix up the issues.

DaveM and I should have something eventually, which will make the
RH-shipped driver irrelevant.

Jeff


--
Jeff Garzik | "I went through my candy like hot oatmeal
Building 1024 | through an internally-buttered weasel."
MandrakeSoft | - goats.com

2002-02-15 14:36:27

by Thomas Langås

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

Jeff Garzik:
> DaveM and I should have something eventually, which will make the
> RH-shipped driver irrelevant.

How's this coming along? Do you have specs which are free? Ie. could
others get the specs too? (Contacting broadcom doesn't help, I've tried
that).

--
Thomas

2002-02-15 14:43:57

by Joe

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

You could grab the RH Rawhide kernel source
RPM, which is 2.4.17 + many patches.

Joe

Frank Elsner wrote:

>I'm currently using kernel 2.4.9-21 from RH, after updating my RH 7.2.
>
>I want to build a custom kernel 2.4.17 but
>the "Broadcom 5700/5701 Gigabit Ethernet Adapter" (which I need)
>isn't in the source. Obviously an addon from RH.
>
>Why isn't the driver in the main kernel tree?


2002-02-15 14:46:18

by Jeff Garzik

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

Thomas Langås wrote:
>
> Jeff Garzik:
> > DaveM and I should have something eventually, which will make the
> > RH-shipped driver irrelevant.
>
> How's this coming along? Do you have specs which are free? Ie. could
> others get the specs too? (Contacting broadcom doesn't help, I've tried
> that).

I wish. The only info source we have is their latest GPL'd driver.

It's coming along slowly at the moment... I haven't had time to mess
with it for a few months, and I, not DaveM, was originally supposed to be
filling in the rx/tx dma stuff, and h/w init. DaveM jumped in recently
and played a bit with the h/w init stage.

My guess is maybe another month or two until others can play with
"tg3"...

Jeff


--
Jeff Garzik | "I went through my candy like hot oatmeal
Building 1024 | through an internally-buttered weasel."
MandrakeSoft | - goats.com

2002-02-15 14:55:49

by Thomas Langås

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

Jeff Garzik:
> It's coming along slowly at the moment... I haven't had time to mess
> with it for a few months, and I, not DaveM, was originally supposed to be
> filling in the rx/tx dma stuff, and h/w init. DaveM jumped in recently
> and played a bit with the h/w init stage.

Is it possible for others to get access to the work you guys have already done?

--
Thomas

2002-02-15 15:00:39

by Jeff Garzik

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

Thomas Langås wrote:
>
> Jeff Garzik:
> > It's coming along slowly at the moment... I haven't had time to mess
> > with it for a few months, and I, not DaveM, was originally supposed to be
> > filling in the rx/tx dma stuff, and h/w init. DaveM jumped in recently
> > and played a bit with the h/w init stage.
>
> Is it possible for others to get access to the work you guys have already done?

Everything is checked into vger CVS.

--
Jeff Garzik | "I went through my candy like hot oatmeal
Building 1024 | through an internally-buttered weasel."
MandrakeSoft | - goats.com

2002-02-15 15:04:09

by Jason Lunz

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

In mlist.linux-kernel, you wrote:
> Cuz the driver is a piece of crap, and Broadcom isn't interested in
> working with the open source community to fix up the issues.

Can you elaborate? What are the issues? I've found the Broadcom driver
to be more robust than the in-kernel one for acenic cards. With acenic,
I've had a null-pointer deref on SMP and other lockups where I wasn't
lucky enough to get an oops.

Also, Broadcom-driven cards can be put in a bridge. An acenic/bridge
combination will crash the kernel hard when tcp traverses the bridge.

> DaveM and I should have something eventually, which will make the
> RH-shipped driver irrelevant.

That would be oh-so-nice. Do you need cards to play with? I've got a
couple of 3Com Broadcom-chipset cards I probably won't be needing.

--
Jason Lunz Trellis Network Security
[email protected] http://www.trellisinc.com/

2002-02-15 15:36:25

by Thomas Langås

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

Jeff Garzik:
> > Is it possible for others to get access to the work you guys have already done?
> Everything is checked into vger cvs

Ok, found it :)

--
Thomas

2002-02-15 20:23:02

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

   From: Thomas Langås <[email protected]>
   Date: Fri, 15 Feb 2002 15:36:04 +0100

   How's this coming along? Do you have specs which are free? Ie. could
   others get the specs too? (Contacting broadcom doesn't help, I've tried
   that).

No, we've been reverse engineering the hardware using the sources of
Broadcom's driver. This is why the work is taking so long.

2002-02-19 00:18:45

by Thomas Langås

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

David S. Miller:
> No, we've been reverse engineering the hardware using the sources of
> Broadcom's driver. This is why the work is taking so long.

Ok, I've downloaded the driver, and tried building it as a module against a
2.4.17 kernel. It segfaulted when I tried loading it (since it says it's not
done, I wasn't expecting it to work :). However, my question is: how do you
guys develop network drivers, for instance? I mean, in order to test a new
version (after the first has segfaulted), I need to reboot.

I've got the Broadcom version dated 2nd January 2002, and wanted to try to
implement small parts of the missing code, mostly for educational purposes to
learn more about kernel programming.

--
Thomas

2002-02-26 20:09:55

by Jes Sorensen

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

Jason Lunz <[email protected]> writes:

> In mlist.linux-kernel, you wrote:
> > Cuz the driver is a piece of crap, and Broadcom isn't interested in
> > working with the open source community to fix up the issues.
>
> Can you elaborate? What are the issues? I've found the Broadcom driver
> to be more robust than the in-kernel one for acenic cards. With acenic,
> I've had a null-pointer deref on SMP and other lockups where I wasn't
> lucky enough to get an oops.

Ehm, it would be nice to get some more details on this. Few people
have reported problems with the acenic driver for a long time, except
for certain highmem configs and a problem with very old Tigon I cards.

Jes

2002-02-27 14:12:49

by Stephan von Krawczynski

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

Hello,

quick additional question concerning this topic:
If I were free to buy any Gigabit Adapter, what would be the known-to-work choice (including existence of a GPL driver, of course)?

Regards,
Stephan

2002-03-10 15:35:41

by Harald Welte

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

On Wed, Feb 27, 2002 at 03:12:18PM +0100, Stephan von Krawczynski wrote:
> Hello,
>
> quick additional question concerning this topic:
> If I were free to buy any Gigabit Adapter, what would be the known-to-work
> choice (including existence of a GPL driver, of course)?

From my point of view, there is no 'perfect' choice.

You can buy bcm57xx based boards, where the chipset is nice but the driver
not really nice yet.

You can buy syskonnect sk98 boards, which definitely have a good chipset -
but the driver doesn't support the tcp transmit zerocopy path yet. I've
tried to put some pressure on SysKonnect about this - but they seem a bit
'slow'.

You can buy natsemi boards, which have a more-or-less crappy chipset, but
there is a nice Linux driver.

Summary:

Old acenic boards are still the best solution - but they haven't been
available for quite some time :(

> Regards,
> Stephan

--
Live long and prosper
- Harald Welte / [email protected] http://www.gnumonks.org/
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M+
V-- PS++ PE-- Y++ PGP++ t+ 5-- !X !R tv-- b+++ !DI !D G+ e* h--- r++ y+(*)

2002-03-10 19:10:54

by Jeff Garzik

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

Harald Welte wrote:
>
> On Wed, Feb 27, 2002 at 03:12:18PM +0100, Stephan von Krawczynski wrote:
> > Hello,
> >
> > quick additional question concerning this topic:
> > If I were free to buy any Gigabit Adapter, what would be the known-to-work
> > choice (including existence of a GPL driver, of course)?
>
> From my point of view, there is no 'perfect' choice.
>
> You can buy bcm57xx based boards, where the chipset is nice but the driver
> not really nice yet.

What's not nice about "tg3"?

Have you tested it, and have bugs to report?

--
Jeff Garzik | Usenet Rule #2 (John Gilmore): "The Net interprets
Building 1024 | censorship as damage and routes around it."
MandrakeSoft |

2002-03-10 21:37:30

by Harald Welte

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

On Sun, Mar 10, 2002 at 02:10:38PM -0500, Jeff Garzik wrote:

> > You can buy bcm57xx based boards, where the chipset is nice but the driver
> > not really nice yet.
>
> What's not nice about "tg3"?

I was referring to the old bcm57xx drivers, not to the new 'tg3' driver
(which I only read about after sending that mail).

I definitely appreciate the work on the tg3 driver done by you, DaveM and
others - no offense.


--
Live long and prosper
- Harald Welte / [email protected] http://www.gnumonks.org/
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M+
V-- PS++ PE-- Y++ PGP++ t+ 5-- !X !R tv-- b+++ !DI !D G+ e* h--- r++ y+(*)

2002-03-11 00:44:48

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

   From: Harald Welte <[email protected]>
   Date: Sun, 10 Mar 2002 16:33:39 +0100

   You can buy bcm57xx based boards, where the chipset is nice but the driver
   not really nice yet.

My tg3 driver sucks then right? Could you send me a bug report?

   You can buy syskonnect sk98 boards, which definitely have a good chipset -
   but the driver doesn't support the tcp transmit zerocopy path yet. I've
   tried to put some pressure on SysKonnect about this - but they seem a bit
   'slow'.

The hardware is not capable of doing it, due to bugs in the hw
checksum implementation of the sk98 chipset. They aren't being
"slow" they just can't possibly implement it for you.

Franks a lot,
David S. Miller
[email protected]

2002-03-11 00:56:13

by Richard Gooch

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

David S. Miller writes:
> From: Harald Welte <[email protected]>
> Date: Sun, 10 Mar 2002 16:33:39 +0100
>
> You can buy bcm57xx based boards, where the chipset is nice but the driver
> not really nice yet.
>
> My tg3 driver sucks then right? Could you send me a bug report?
>
> You can buy syskonnect sk98 boards, which definitely have a good chipset -
> but the driver doesn't support the tcp transmit zerocopy path yet. I've
> tried to put some pressure on SysKonnect about this - but they seem a bit
> 'slow'.
>
> The hardware is not capable of doing it, due to bugs in the hw
> checksum implementation of the sk98 chipset. They aren't being
> "slow" they just can't possibly implement it for you.

So what is currently the best combination of gige card/driver/cost?
What do you recommend to the budget-conscious?

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

2002-03-11 01:07:15

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

   From: Richard Gooch <[email protected]>
   Date: Sun, 10 Mar 2002 17:55:44 -0700

   David S. Miller writes:
   > The hardware is not capable of doing it, due to bugs in the hw
   > checksum implementation of the sk98 chipset. They aren't being
   > "slow" they just can't possibly implement it for you.

   So what is currently the best combination of gige card/driver/cost?
   What do you recommend to the budget-conscious?

I can only tell you what I know performance wise about cards,
and currently it looks like:

1) Intel E1000
2) Tigon2, aka. Acenic
3) SysKonnect sk98, but has broken TX checksums. If it had
working TX checksums it would be in 2nd place instead of Acenic.
This hw bug is essentially why Acenics were used for all the
TUX benchmark runs instead of SysKonnect cards.
4) Tigon3, aka. bcm57xx

This may surprise some people, but frankly I think the Tigon3's PCI
dma engine is junk based upon my current knowledge of the card. It is
always possible I may find out something new which kills this
perception I have of the card, but we'll see...

All the cards listed above have good GPL'd drivers available.

2002-03-11 01:15:25

by Richard Gooch

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

David S. Miller writes:
> From: Richard Gooch <[email protected]>
> Date: Sun, 10 Mar 2002 17:55:44 -0700
>
> David S. Miller writes:
> > The hardware is not capable of doing it, due to bugs in the hw
> > checksum implementation of the sk98 chipset. They aren't being
> > "slow" they just can't possibly implement it for you.
>
> So what is currently the best combination of gige card/driver/cost?
> What do you recommend to the budget-conscious?
>
> I can only tell you what I know performance wise about cards,
> and currently it looks like:
>
> 1) Intel E1000
> 2) Tigon2, aka. Acenic
> 3) SysKonnect sk98, but has broken TX checksums. If it had
> working TX checksums it would be in 2nd place instead of Acenic.
> This hw bug is essentially why Acenics were used for all the
> TUX benchmark runs instead of SysKonnect cards.
> 4) Tigon3, aka. bcm57xx
>
> This may surprise some people, but frankly I think the Tigon3's PCI
> dma engine is junk based upon my current knowledge of the card. It is
> always possible I may find out something new which kills this
> perception I have of the card, but we'll see...

I note the Intel card is pretty expensive. What are these "Addtron"
cards I see listed on http://www.pricewatch.com for US$36? Is that a
supported card under another name?

> All the cards listed above have good GPL'd drivers available.

When is the E1000 driver going to be added to the kernel?

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

2002-03-11 01:34:58

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

   From: Richard Gooch <[email protected]>
   Date: Sun, 10 Mar 2002 18:14:56 -0700

   I note the Intel card is pretty expensive. What are these "Addtron"
   cards I see listed on http://www.pricewatch.com for US$36? Is that a
   supported card under another name?

Probably Natsemi chipset based; it's not a good performer at all.


   > All the cards listed above have good GPL'd drivers available.

   When is the E1000 driver going to be added to the kernel?

It's in 2.5.x already... It will hit 2.4.x as soon as Intel's
QA signs off on what Jeff Garzik currently has (which is in VGER
2.4.x CVS branch btw if you want to check it out).

2002-03-11 01:31:38

by Jeff Garzik

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

Richard Gooch wrote:
>
> David S. Miller writes:
> > From: Richard Gooch <[email protected]>
> > Date: Sun, 10 Mar 2002 17:55:44 -0700
> >
> > David S. Miller writes:
> > > The hardware is not capable of doing it, due to bugs in the hw
> > > checksum implementation of the sk98 chipset. They aren't being
> > > "slow" they just can't possibly implement it for you.
> >
> > So what is currently the best combination of gige card/driver/cost?
> > What do you recommend to the budget-conscious?
> >
> > I can only tell you what I know performance wise about cards,
> > and currently it looks like:
> >
> > 1) Intel E1000
> > 2) Tigon2, aka. Acenic
> > 3) SysKonnect sk98, but has broken TX checksums. If it had
> > working TX checksums it would be in 2nd place instead of Acenic.
> > This hw bug is essentially why Acenics were used for all the
> > TUX benchmark runs instead of SysKonnect cards.
> > 4) Tigon3, aka. bcm57xx
> >
> > This may surprise some people, but frankly I think the Tigon3's PCI
> > dma engine is junk based upon my current knowledge of the card. It is
> > always possible I may find out something new which kills this
> > perception I have of the card, but we'll see...
>
> I note the Intel card is pretty expensive. What are these "Addtron"
> cards I see listed on http://www.pricewatch.com for US$36? Is that a
> supported card under another name?
>
> > All the cards listed above have good GPL'd drivers available.
>
> When is the E1000 driver going to be added to the kernel?

It's already in 2.5.

It will be added to 2.4.x after further testing and review in 2.5.x.

Jeff



--
Jeff Garzik | Usenet Rule #2 (John Gilmore): "The Net interprets
Building 1024 | censorship as damage and routes around it."
MandrakeSoft |

2002-03-11 02:05:55

by Wayne Whitney

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

In mailing-lists.linux-kernel, Richard Gooch wrote:

> What are these "Addtron" cards I see listed on http://www.pricewatch.com
> for US$36? Is that a supported card under another name?

This card seems to be based on the National Semiconductor DP83820,
for which there is an in-kernel driver. Note that US$36 only gets you
the 32-bit/33MHz PCI version (AEG-320T); the 64-bit version (AEG-620T)
goes for about twice as much (according to google).

There is also the D-Link DGE-550T, a 64-bit/66MHz card starting at
US$80 (according to pricewatch). It apparently uses a different
in-kernel driver, dl2k.o.

So does anyone have any comments on the stability and performance of
these cards/drivers?

Cheers,
Wayne

2002-03-11 02:08:35

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

   From: Wayne Whitney <[email protected]>
   Date: Sun, 10 Mar 2002 18:05:10 -0800

   So does anyone have any comments on the stability and performance of
   these cards/drivers?

As I said in a previous email, the natsemi chips don't perform
too well.

2002-03-11 02:08:35

by Ben Collins

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

On Sun, Mar 10, 2002 at 05:03:38PM -0800, David S. Miller wrote:
> From: Richard Gooch <[email protected]>
> Date: Sun, 10 Mar 2002 17:55:44 -0700
>
> David S. Miller writes:
> > The hardware is not capable of doing it, due to bugs in the hw
> > checksum implementation of the sk98 chipset. They aren't being
> > "slow" they just can't possibly implement it for you.
>
> So what is currently the best combination of gige card/driver/cost?
> What do you recommend to the budget-conscious?
>
> I can only tell you what I know performance wise about cards,
> and currently it looks like:
>
> 1) Intel E1000
> 2) Tigon2, aka. Acenic
> 3) SysKonnect sk98, but has broken TX checksums. If it had
> working TX checksums it would be in 2nd place instead of Acenic.
> This hw bug is essentially why Acenics were used for all the
> TUX benchmark runs instead of SysKonnect cards.
> 4) Tigon3, aka. bcm57xx

How does SysKonnect 9Dxx compare? The driver's not in the kernel tree,
but it is available on their website. Cards seem fairly priced (I have
two, but didn't buy them, and haven't run any tests across them).

--
.----------=======-=-======-=========-----------=====------------=-=-----.
/ Ben Collins -- Debian GNU/Linux -- WatchGuard.com \
` [email protected] -- [email protected] '
`---=========------=======-------------=-=-----=-===-======-------=--=---'

2002-03-11 02:11:15

by Richard Gooch

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

David S. Miller writes:
> From: Wayne Whitney <[email protected]>
> Date: Sun, 10 Mar 2002 18:05:10 -0800
>
> So does anyone have any comments on the stability and performance of
> these cards/drivers?
>
> As I said in a previous email the natsemi chips don't perform
> too well.

As Wayne said:
> There is also the D-Link DGE-550T, a 64-bit/66MHz card starting at
> US$80 (according to pricewatch). It apparently uses a different
> in-kernel driver, dl2k.o.

So this is a different chip from the natsemi, right?

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

2002-03-11 02:16:45

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

   From: Ben Collins <[email protected]>
   Date: Sun, 10 Mar 2002 21:04:53 -0500

   How does SysKonnect 9Dxx compare?

SysKonnect 9D == Tigon3 and is fully supported by our tg3 driver
:-)))))))

2002-03-11 02:19:25

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

   From: Richard Gooch <[email protected]>
   Date: Sun, 10 Mar 2002 19:10:46 -0700

   As Wayne said:
   > There is also the D-Link DGE-550T, a 64-bit/66MHz card starting at
   > US$80 (according to pricewatch). It apparently uses a different
   > in-kernel driver, dl2k.o.

   So this is a different chip from the natsemi, right?

Yes. And from a cursory glance the dl2k.o driver seems to even
be quite portable. I haven't tested it out myself though.

I have no idea how this thing performs, but it does look like
it has a couple of hardware bugs.

2002-03-11 02:22:25

by Benjamin LaHaise

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

On Sun, Mar 10, 2002 at 06:04:56PM -0800, David S. Miller wrote:
> From: Wayne Whitney <[email protected]>
> Date: Sun, 10 Mar 2002 18:05:10 -0800
>
> So does anyone have any comments on the stability and performance of
> these cards/drivers?
>
> As I said in a previous email the natsemi chips don't perform
> too well.

That's my fault. The version of the driver in the kernel atm sucks in
performance; I'll try to spend the day needed on the driver this week
and it should get up to ~800mbit from the current mess. Getting NAPI
in the kernel would help... ;-)

-ben
--
"A man with a bass just walked in,
and he's putting it down
on the floor."

2002-03-11 02:25:55

by Ben Collins

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

On Sun, Mar 10, 2002 at 06:13:08PM -0800, David S. Miller wrote:
> From: Ben Collins <[email protected]>
> Date: Sun, 10 Mar 2002 21:04:53 -0500
>
> How does SysKonnect 9Dxx compare?
>
> SysKonnect 9D == Tigon3 and is fully supported by our tg3 driver
> :-)))))))

Ahh...so now I can ditch their driver :) You may want to update the
description so that's more clear.

Thanks

--
.----------=======-=-======-=========-----------=====------------=-=-----.
/ Ben Collins -- Debian GNU/Linux -- WatchGuard.com \
` [email protected] -- [email protected] '
`---=========------=======-------------=-=-----=-===-======-------=--=---'

2002-03-11 02:27:45

by Mike Fedyk

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

On Sun, Mar 10, 2002 at 09:22:10PM -0500, Benjamin LaHaise wrote:
> On Sun, Mar 10, 2002 at 06:04:56PM -0800, David S. Miller wrote:
> > From: Wayne Whitney <[email protected]>
> > Date: Sun, 10 Mar 2002 18:05:10 -0800
> >
> > So does anyone have any comments on the stability and performance of
> > these cards/drivers?
> >
> > As I said in a previous email the natsemi chips don't perform
> > too well.
>
> That's my fault. The version of the driver in the kernel atm sucks in
> performance; I'll try to spend the day needed on the driver this week
> and it should get up to ~800mbit from the current mess. Getting NAPI
> in the kernel would help... ;-)
>

What is happening with NAPI anyway?

2002-03-11 02:34:17

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

   From: Benjamin LaHaise <[email protected]>
   Date: Sun, 10 Mar 2002 21:22:10 -0500

   That's my fault. The version of the driver in the kernel atm sucks in
   performance; I'll try to spend the day needed on the driver this week
   and it should get up to ~800mbit from the current mess. Getting NAPI
   in the kernel would help... ;-)

Syskonnect sk98 with jumbo frames gets ~107MB/sec TCP bandwidth
without NAPI, there is no reason other cards cannot go full speed as
well.

NAPI is really only going to help with high packet rates not with
things like raw bandwidth tests.

2002-03-11 02:35:05

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

   From: Ben Collins <[email protected]>
   Date: Sun, 10 Mar 2002 21:22:12 -0500

   Ahh...so now I can ditch their driver :) You may want to update the
   description so that's more clear.

There are many boards based upon the Tigon3 chipset; I don't
mention any of them specifically in any of the documentation.
I don't even know what the full list is.

2002-03-11 02:36:05

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

   From: Mike Fedyk <[email protected]>
   Date: Sun, 10 Mar 2002 18:28:21 -0800

   What is happening with NAPI anyway?

Pending inclusion into 2.5.x once I get my existing networking patches
pushed to Linus first.

It may be backported to 2.4.x one day, but I personally don't think
that is such a great idea for the time being. Maybe in a month or
two, but not right now.

2002-03-11 03:15:54

by Benjamin LaHaise

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

On Sun, Mar 10, 2002 at 06:30:33PM -0800, David S. Miller wrote:
> Syskonnect sk98 with jumbo frames gets ~107MB/sec TCP bandwidth
> without NAPI, there is no reason other cards cannot go full speed as
> well.
>
> NAPI is really only going to help with high packet rates not with
> things like raw bandwidth tests.

Well, the thing that hurts the 83820 is that its interrupt
mitigation capabilities are rather limited. This is where napi
helps: by turning off the rx interrupt for the duration of packet
processing, cpu cycles aren't wasted on excess rx irqs.

As to the lack of bandwidth, it stems from far too much interrupt
overhead and the currently braindead attempt at irq mitigation.
Since the last time I worked on it, a number of potential techniques
have come up that should bring it into the 100MB realm (assuming it
doesn't get trampled on by ksoftirqd).

-ben
--
"A man with a bass just walked in,
and he's putting it down
on the floor."

2002-03-11 04:20:52

by Michael Clark

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

On Monday, March 11, 2002, at 11:15 AM, Benjamin LaHaise wrote:

> On Sun, Mar 10, 2002 at 06:30:33PM -0800, David S. Miller wrote:
>> Syskonnect sk98 with jumbo frames gets ~107MB/sec TCP bandwidth
>> without NAPI, there is no reason other cards cannot go full speed as
>> well.
>>
>> NAPI is really only going to help with high packet rates not with
>> things like raw bandwidth tests.
>
> Well, the thing that hurts the 83820 is that its interrupt
> mitigation capabilities are rather limited. This is where napi
> helps: by turning off the rx interrupt for the duration of packet
> processing, cpu cycles aren't wasted on excess rx irqs.
>
> As to the lack of bandwidth, it stems from far too much interrupt
> overhead and the currently braindead attempt at irq mitigation.
> Since the last time I worked on it, a number of potential techniques
> have come up that should bring it into the 100MB realm (assuming it
> doesn't get trampled on by ksoftirqd).

What about jumbo frames? I notice this comment in the driver "disable
jumbo frames to avoid tx hangs". I'm getting ~550Mb/sec from a single
TCP stream and ~700Mb/sec with 2 in parallel. Jumbo frames would
probably improve this quite a bit.

~mc

2002-03-11 04:28:53

by Benjamin LaHaise

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

On Mon, Mar 11, 2002 at 12:20:26PM +0800, Michael Clark wrote:
> What about jumbo frames? I notice this comment in the driver "disable
> jumbo frames to avoid tx hangs". I'm getting ~550Mb/sec from a single
> TCP stream and ~700Mb/sec with 2 in parallel. Jumbo frames would
> probably improve this quite a bit.

Jumbo frames work up to RX_BUF_SIZE. Hint: any MTU the driver lets you
set should work.

-ben
--
"A man with a bass just walked in,
and he's putting it down
on the floor."

2002-03-11 06:09:53

by Harald Welte

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

On Sun, Mar 10, 2002 at 04:41:13PM -0800, David Miller wrote:
> From: Harald Welte <[email protected]>
> Date: Sun, 10 Mar 2002 16:33:39 +0100
>
> You can buy bcm57xx based boards, where the chipset is nice but the driver
> not really nice yet.
>
> My tg3 driver sucks then right? Could you send me a bug report?

As stated in the other mail to Jeff Garzik, I was talking about the bcm57xx
driver, _NOT_ about the new tg3. Sorry for not making this clear in the
original mail.

> The hardware is not capable of doing it, due to bugs in the hw
> checksum implementation of the sk98 chipset. They aren't being
> "slow" they just can't possibly implement it for you.

Ouch. Thanks for dropping me this note.

> David S. Miller

--
Live long and prosper
- Harald Welte / [email protected] http://www.gnumonks.org/
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M+
V-- PS++ PE-- Y++ PGP++ t+ 5-- !X !R tv-- b+++ !DI !D G+ e* h--- r++ y+(*)

2002-03-11 19:49:38

by Richard Gooch

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

David S. Miller writes:
> From: Benjamin LaHaise <[email protected]>
> Date: Sun, 10 Mar 2002 21:22:10 -0500
>
> That's my fault. The version of the driver in the kernel atm sucks in
> performance; I'll try to spend the day needed on the driver this week
> and it should get up to ~800mbit from the current mess. Getting NAPI
> in the kernel would help... ;-)
>
> Syskonnect sk98 with jumbo frames gets ~107MB/sec TCP bandwidth
> without NAPI, there is no reason other cards cannot go full speed as
> well.
>
> NAPI is really only going to help with high packet rates not with
> things like raw bandwidth tests.

You're saying that people should just go and use jumbo frames? Isn't
that a problem for mixed 10/100/1000 LANs?

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

2002-03-12 05:40:59

by Benjamin LaHaise

Subject: [patch] ns83820 0.17 (Re: Broadcom 5700/5701 Gigabit Ethernet Adapters)

On Sun, Mar 10, 2002 at 06:30:33PM -0800, David S. Miller wrote:
> Syskonnect sk98 with jumbo frames gets ~107MB/sec TCP bandwidth
> without NAPI, there is no reason other cards cannot go full speed as
> well.
>
> NAPI is really only going to help with high packet rates not with
> things like raw bandwidth tests.

A day's tweaking later, and I'm getting 810mbit/s with netperf between
two Athlons with default settings (1500 byte packets). What I've found
is that increasing the size of the RX/TX rings or the max sizes of the
tcp r/wmem backlogs really slows things down, so I'm not doing that
anymore. The pair of P3s shows 262mbit/s (up from 67).

Interrupt mitigation is now pretty stupid, but it helped: the irq
handler disables the rx interrupt and then triggers a tasklet to run
through the rx ring. The tasklet later enables rx interrupts again.
More tweaking tomorrow...

Marcelo, please apply the patch below to the next 2.4 prepatch: it
also has a fix for a tx hang problem, and a few other nasties. Thanks!

-ben
--
"A man with a bass just walked in,
and he's putting it down
on the floor."


--- kernels/2.4/v2.4.19-pre2/drivers/net/ns83820.c Thu Mar 7 16:40:00 2002
+++ ns-2.4.19-pre2/drivers/net/ns83820.c Tue Mar 12 00:09:32 2002
@@ -1,7 +1,7 @@
-#define _VERSION "0.15"
+#define _VERSION "0.17"
/* ns83820.c by Benjamin LaHaise <[email protected]> with contributions.
*
- * $Revision: 1.34.2.12 $
+ * $Revision: 1.34.2.14 $
*
* Copyright 2001 Benjamin LaHaise.
* Copyright 2001 Red Hat.
@@ -51,6 +51,8 @@
* suppress duplicate link status messages
* 20011117 0.14 - ethtool GDRVINFO, GLINK support from jgarzik
* 20011204 0.15 get ppc (big endian) working
+ * 20011218 0.16 various cleanups
+ * 20020310 0.17 speedups
*
* Driver Overview
* ===============
@@ -93,8 +95,8 @@
#include <linux/in.h> /* for IPPROTO_... */
#include <linux/eeprom.h>
#include <linux/compiler.h>
+#include <linux/prefetch.h>
#include <linux/ethtool.h>
-//#include <linux/skbrefill.h>

#include <asm/io.h>
#include <asm/uaccess.h>
@@ -154,10 +156,16 @@
#endif

/* tunables */
-#define RX_BUF_SIZE 6144 /* 8192 */
-#define NR_RX_DESC 256
+#define RX_BUF_SIZE 1500 /* 8192 */

-#define NR_TX_DESC 256
+/* Must not exceed ~65000. */
+#define NR_RX_DESC 64
+#define NR_TX_DESC 64
+
+/* not tunable */
+#define REAL_RX_BUF_SIZE (RX_BUF_SIZE + 14) /* rx/tx mac addr + type */
+
+#define MIN_TX_DESC_FREE 8

/* register defines */
#define CFGCS 0x04
@@ -408,7 +416,8 @@

struct sk_buff *skbs[NR_RX_DESC];

- unsigned next_rx, next_empty;
+ u32 *next_rx_desc;
+ u16 next_rx, next_empty;

u32 *descs;
dma_addr_t phy_descs;
@@ -423,6 +432,7 @@
struct pci_dev *pci_dev;

struct rx_info rx_info;
+ struct tasklet_struct rx_tasklet;

unsigned ihr;
struct tq_struct tq_refill;
@@ -441,10 +451,11 @@
spinlock_t tx_lock;

long tx_idle;
- u32 tx_done_idx;
- u32 tx_idx;
- volatile u32 tx_free_idx; /* idx of free desc chain */
- u32 tx_intr_idx;
+
+ u16 tx_done_idx;
+ u16 tx_idx;
+ volatile u16 tx_free_idx; /* idx of free desc chain */
+ u16 tx_intr_idx;

struct sk_buff *tx_skbs[NR_TX_DESC];

@@ -455,7 +466,7 @@

//free = (tx_done_idx + NR_TX_DESC-2 - free_idx) % NR_TX_DESC
#define start_tx_okay(dev) \
- (((NR_TX_DESC-2 + dev->tx_done_idx - dev->tx_free_idx) % NR_TX_DESC) > NR_TX_DESC/2)
+ (((NR_TX_DESC-2 + dev->tx_done_idx - dev->tx_free_idx) % NR_TX_DESC) > MIN_TX_DESC_FREE)


/* Packet Receiver
@@ -509,7 +520,7 @@
next_empty = dev->rx_info.next_empty;

/* don't overrun last rx marker */
- if (nr_rx_empty(dev) <= 2) {
+ if (unlikely(nr_rx_empty(dev) <= 2)) {
kfree_skb(skb);
return 1;
}
@@ -523,34 +534,39 @@
#endif

sg = dev->rx_info.descs + (next_empty * DESC_SIZE);
- if (dev->rx_info.skbs[next_empty])
+ if (unlikely(NULL != dev->rx_info.skbs[next_empty]))
BUG();
dev->rx_info.skbs[next_empty] = skb;

dev->rx_info.next_empty = (next_empty + 1) % NR_RX_DESC;
- cmdsts = RX_BUF_SIZE | CMDSTS_INTR;
- buf = pci_map_single(dev->pci_dev, skb->tail, RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
+ cmdsts = REAL_RX_BUF_SIZE | CMDSTS_INTR;
+ buf = pci_map_single(dev->pci_dev, skb->tail,
+ REAL_RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
build_rx_desc(dev, sg, 0, buf, cmdsts, 0);
/* update link of previous rx */
- if (next_empty != dev->rx_info.next_rx)
+ if (likely(next_empty != dev->rx_info.next_rx))
dev->rx_info.descs[((NR_RX_DESC + next_empty - 1) % NR_RX_DESC) * DESC_SIZE] = cpu_to_le32(dev->rx_info.phy_descs + (next_empty * DESC_SIZE * 4));

return 0;
}

-static int rx_refill(struct ns83820 *dev, int gfp)
+static inline int rx_refill(struct ns83820 *dev, int gfp)
{
unsigned i;
long flags = 0;

+ if (unlikely(nr_rx_empty(dev) <= 2))
+ return 0;
+
dprintk("rx_refill(%p)\n", dev);
if (gfp == GFP_ATOMIC)
spin_lock_irqsave(&dev->rx_info.lock, flags);
for (i=0; i<NR_RX_DESC; i++) {
struct sk_buff *skb;
long res;
- skb = __dev_alloc_skb(RX_BUF_SIZE+16, gfp);
- if (!skb)
+ /* extra 16 bytes for alignment */
+ skb = __dev_alloc_skb(REAL_RX_BUF_SIZE+16, gfp);
+ if (unlikely(!skb))
break;

res = (long)skb->tail & 0xf;
@@ -575,6 +591,12 @@
return i ? 0 : -ENOMEM;
}

+static void FASTCALL(rx_refill_atomic(struct ns83820 *dev));
+static void rx_refill_atomic(struct ns83820 *dev)
+{
+ rx_refill(dev, GFP_ATOMIC);
+}
+
/* REFILL */
static inline void queue_refill(void *_dev)
{
@@ -590,6 +612,7 @@
build_rx_desc(dev, dev->rx_info.descs + (DESC_SIZE * i), 0, 0, CMDSTS_OWN, 0);
}

+static void FASTCALL(phy_intr(struct ns83820 *dev));
static void phy_intr(struct ns83820 *dev)
{
static char *speeds[] = { "10", "100", "1000", "1000(?)", "1000F" };
@@ -600,7 +623,6 @@
cfg = readl(dev->base + CFG) ^ SPDSTS_POLARITY;

if (dev->CFG_cache & CFG_TBI_EN) {
-
/* we have an optical transceiver */
tbisr = readl(dev->base + TBISR);
tanar = readl(dev->base + TANAR);
@@ -646,20 +668,24 @@
new_cfg = dev->CFG_cache & ~(CFG_SB | CFG_MODE_1000 | CFG_SPDSTS);

if (cfg & CFG_SPDSTS1)
- new_cfg |= CFG_MODE_1000 | CFG_SB;
+ new_cfg |= CFG_MODE_1000;
else
- new_cfg &= ~CFG_MODE_1000 | CFG_SB;
+ new_cfg &= ~CFG_MODE_1000;

- if ((cfg & CFG_LNKSTS) && ((new_cfg ^ dev->CFG_cache) & CFG_MODE_1000)) {
+ speed = ((cfg / CFG_SPDSTS0) & 3);
+ fullduplex = (cfg & CFG_DUPSTS);
+
+ if (fullduplex)
+ new_cfg |= CFG_SB;
+
+ if ((cfg & CFG_LNKSTS) &&
+ ((new_cfg ^ dev->CFG_cache) & CFG_MODE_1000)) {
writel(new_cfg, dev->base + CFG);
dev->CFG_cache = new_cfg;
}

dev->CFG_cache &= ~CFG_SPDSTS;
dev->CFG_cache |= cfg & CFG_SPDSTS;
-
- speed = ((cfg / CFG_SPDSTS0) & 3);
- fullduplex = (cfg & CFG_DUPSTS);
}

newlinkstate = (cfg & CFG_LNKSTS) ? LINK_UP : LINK_DOWN;
@@ -690,6 +716,7 @@

dev->rx_info.idle = 1;
dev->rx_info.next_rx = 0;
+ dev->rx_info.next_rx_desc = dev->rx_info.descs;
dev->rx_info.next_empty = 0;

for (i=0; i<NR_RX_DESC; i++)
@@ -724,7 +751,7 @@
dev->IMR_cache |= ISR_RXDESC;
dev->IMR_cache |= ISR_RXIDLE;
dev->IMR_cache |= ISR_TXDESC;
- //dev->IMR_cache |= ISR_TXIDLE;
+ dev->IMR_cache |= ISR_TXIDLE;

writel(dev->IMR_cache, dev->base + IMR);
writel(1, dev->base + IER);
@@ -770,6 +797,41 @@
}
}

+/* I hate the network stack sometimes */
+#ifdef __i386__
+#define skb_mangle_for_davem(skb,len) (skb)
+#else
+static inline struct sk_buff *skb_mangle_for_davem(struct sk_buff *skb, int len)
+{
+ tmp = __dev_alloc_skb(len+2, GFP_ATOMIC);
+ if (!tmp)
+ goto done;
+ tmp->dev = &dev->net_dev;
+ skb_reserve(tmp, 2);
+ memcpy(skb_put(tmp, len), skb->data, len);
+ kfree_skb(skb);
+ return tmp;
+}
+#endif
+
+static void FASTCALL(ns83820_rx_kick(struct ns83820 *dev));
+static void ns83820_rx_kick(struct ns83820 *dev)
+{
+ /*if (nr_rx_empty(dev) >= NR_RX_DESC/4)*/ {
+ if (dev->rx_info.up) {
+ rx_refill_atomic(dev);
+ kick_rx(dev);
+ }
+ }
+
+ if (dev->rx_info.up && nr_rx_empty(dev) > NR_RX_DESC*3/4)
+ schedule_task(&dev->tq_refill);
+ else
+ kick_rx(dev);
+ if (dev->rx_info.idle)
+ Dprintk("BAD\n");
+}
+
/* rx_irq
*
*/
@@ -785,10 +847,10 @@
dprintk("rx_irq(%p)\n", dev);
dprintk("rxdp: %08x, descs: %08lx next_rx[%d]: %p next_empty[%d]: %p\n",
readl(dev->base + RXDP),
- (dev->rx_info.phy_descs),
- dev->rx_info.next_rx,
+ (long)(dev->rx_info.phy_descs),
+ (int)dev->rx_info.next_rx,
(dev->rx_info.descs + (DESC_SIZE * dev->rx_info.next_rx)),
- dev->rx_info.next_empty,
+ (int)dev->rx_info.next_empty,
(dev->rx_info.descs + (DESC_SIZE * dev->rx_info.next_empty))
);

@@ -798,7 +860,7 @@

dprintk("walking descs\n");
next_rx = info->next_rx;
- desc = info->descs + (DESC_SIZE * next_rx);
+ desc = info->next_rx_desc;
while ((CMDSTS_OWN & (cmdsts = le32_to_cpu(desc[CMDSTS]))) &&
(cmdsts != CMDSTS_OWN)) {
struct sk_buff *skb;
@@ -813,29 +875,17 @@
info->skbs[next_rx] = NULL;
info->next_rx = (next_rx + 1) % NR_RX_DESC;

- barrier();
+ mb();
clear_rx_desc(dev, next_rx);

pci_unmap_single(dev->pci_dev, bufptr,
RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
- if (CMDSTS_OK & cmdsts) {
-#if 0 //ndef __i386__
- struct sk_buff *tmp;
-#endif
+ if (likely(CMDSTS_OK & cmdsts)) {
int len = cmdsts & 0xffff;
- if (!skb)
- BUG();
skb_put(skb, len);
-#if 0 //ndef __i386__ /* I hate the network stack sometimes */
- tmp = __dev_alloc_skb(RX_BUF_SIZE+16, GFP_ATOMIC);
- if (!tmp)
- goto done;
- tmp->dev = &dev->net_dev;
- skb_reserve(tmp, 2);
- memcpy(skb_put(tmp, len), skb->data, len);
- kfree_skb(skb);
- skb = tmp;
-#endif
+ skb = skb_mangle_for_davem(skb, len);
+ if (unlikely(!skb))
+ goto netdev_mangle_me_harder_failed;
if (cmdsts & CMDSTS_DEST_MULTI)
dev->stats.multicast ++;
dev->stats.rx_packets ++;
@@ -846,11 +896,10 @@
skb->ip_summed = CHECKSUM_NONE;
}
skb->protocol = eth_type_trans(skb, &dev->net_dev);
- if (NET_RX_DROP == netif_rx(skb))
+ if (NET_RX_DROP == netif_rx(skb)) {
+netdev_mangle_me_harder_failed:
dev->stats.rx_dropped ++;
-#if 0 //ndef __i386__
- done:;
-#endif
+ }
} else {
kfree_skb(skb);
}
@@ -860,6 +909,7 @@
desc = info->descs + (DESC_SIZE * next_rx);
}
info->next_rx = next_rx;
+ info->next_rx_desc = info->descs + (DESC_SIZE * next_rx);

out:
if (0 && !nr) {
@@ -869,6 +919,15 @@
spin_unlock_irqrestore(&info->lock, flags);
}

+static void rx_action(unsigned long _dev)
+{
+ struct ns83820 *dev = (void *)_dev;
+ rx_irq(dev);
+ writel(0x002, dev->base + IHR);
+ writel(dev->IMR_cache | ISR_RXDESC, dev->base + IMR);
+ rx_irq(dev);
+ ns83820_rx_kick(dev);
+}

/* Packet Transmit code
*/
@@ -879,7 +938,9 @@
writel(CR_TXE, dev->base + CR);
}

-/* no spinlock needed on the transmit irq path as the interrupt handler is serialized */
+/* No spinlock needed on the transmit irq path as the interrupt handler is
+ * serialized.
+ */
static void do_tx_done(struct ns83820 *dev)
{
u32 cmdsts, tx_done_idx, *desc;
@@ -917,7 +978,7 @@
tx_done_idx = (tx_done_idx + 1) % NR_TX_DESC;
dev->tx_done_idx = tx_done_idx;
desc[CMDSTS] = cpu_to_le32(0);
- barrier();
+ mb();
desc = dev->tx_descs + (tx_done_idx * DESC_SIZE);
}

@@ -952,7 +1013,6 @@
* while trying to track down a bug in either the zero copy code or
* the tx fifo (hence the MAX_FRAG_LEN).
*/
-#define MAX_FRAG_LEN 8192 /* disabled for now */
static int ns83820_hard_start_xmit(struct sk_buff *skb, struct net_device *_dev)
{
struct ns83820 *dev = (struct ns83820 *)_dev;
@@ -970,9 +1030,9 @@

nr_frags = skb_shinfo(skb)->nr_frags;
again:
- if (__builtin_expect(dev->CFG_cache & CFG_LNKSTS, 0)) {
+ if (unlikely(dev->CFG_cache & CFG_LNKSTS)) {
netif_stop_queue(&dev->net_dev);
- if (__builtin_expect(dev->CFG_cache & CFG_LNKSTS, 0))
+ if (unlikely(dev->CFG_cache & CFG_LNKSTS))
return 1;
netif_start_queue(&dev->net_dev);
}
@@ -981,7 +1041,7 @@
tx_done_idx = dev->tx_done_idx;
nr_free = (tx_done_idx + NR_TX_DESC-2 - free_idx) % NR_TX_DESC;
nr_free -= 1;
- if ((nr_free <= nr_frags) || (nr_free <= 8192 / MAX_FRAG_LEN)) {
+ if (nr_free <= nr_frags) {
dprintk("stop_queue - not enough(%p)\n", dev);
netif_stop_queue(&dev->net_dev);

@@ -996,11 +1056,11 @@

if (free_idx == dev->tx_intr_idx) {
do_intr = 1;
- dev->tx_intr_idx = (dev->tx_intr_idx + NR_TX_DESC/2) % NR_TX_DESC;
+ dev->tx_intr_idx = (dev->tx_intr_idx + NR_TX_DESC/4) % NR_TX_DESC;
}

nr_free -= nr_frags;
- if (nr_free < 1) {
+ if (nr_free < MIN_TX_DESC_FREE) {
dprintk("stop_queue - last entry(%p)\n", dev);
netif_stop_queue(&dev->net_dev);
stopped = 1;
@@ -1028,14 +1088,6 @@
for (;;) {
volatile u32 *desc = dev->tx_descs + (free_idx * DESC_SIZE);
u32 residue = 0;
-#if 0
- if (len > MAX_FRAG_LEN) {
- residue = len;
- /* align the start address of the next fragment */
- len = MAX_FRAG_LEN;
- residue -= len;
- }
-#endif

dprintk("frag[%3u]: %4u @ 0x%08Lx\n", free_idx, len,
(unsigned long long)buf);
@@ -1084,6 +1136,7 @@
{
u8 *base = dev->base;

+ /* the DP83820 will freeze counters, so we need to read all of them */
dev->stats.rx_errors += readl(base + 0x60) & 0xffff;
dev->stats.rx_crc_errors += readl(base + 0x64) & 0xffff;
dev->stats.rx_missed_errors += readl(base + 0x68) & 0xffff;
@@ -1162,54 +1215,54 @@
}
}

+static void ns83820_mib_isr(struct ns83820 *dev)
+{
+ spin_lock(&dev->misc_lock);
+ ns83820_update_stats(dev);
+ spin_unlock(&dev->misc_lock);
+}
+
static void ns83820_irq(int foo, void *data, struct pt_regs *regs)
{
struct ns83820 *dev = data;
- int count = 0;
u32 isr;
dprintk("ns83820_irq(%p)\n", dev);

dev->ihr = 0;

- while (count++ < 32 && (isr = readl(dev->base + ISR))) {
- dprintk("irq: %08x\n", isr);
-
- if (isr & ~(ISR_PHY | ISR_RXDESC | ISR_RXEARLY | ISR_RXOK | ISR_RXERR | ISR_TXIDLE | ISR_TXOK | ISR_TXDESC))
- Dprintk("odd isr? 0x%08x\n", isr);
-
- if ((ISR_RXEARLY | ISR_RXIDLE | ISR_RXORN | ISR_RXDESC | ISR_RXOK | ISR_RXERR) & isr) {
- if (ISR_RXIDLE & isr) {
- dev->rx_info.idle = 1;
- Dprintk("oh dear, we are idle\n");
- }
+ isr = readl(dev->base + ISR);
+ dprintk("irq: %08x\n", isr);

- if ((ISR_RXDESC) & isr) {
- rx_irq(dev);
- writel(4, dev->base + IHR);
- }
-
- if (nr_rx_empty(dev) >= NR_RX_DESC/4) {
- if (dev->rx_info.up) {
- rx_refill(dev, GFP_ATOMIC);
- kick_rx(dev);
- }
- }
+#ifdef DEBUG
+ if (isr & ~(ISR_PHY | ISR_RXDESC | ISR_RXEARLY | ISR_RXOK | ISR_RXERR | ISR_TXIDLE | ISR_TXOK | ISR_TXDESC))
+ Dprintk("odd isr? 0x%08x\n", isr);
+#endif

- if (dev->rx_info.up && nr_rx_empty(dev) > NR_RX_DESC*3/4)
- schedule_task(&dev->tq_refill);
- else
- kick_rx(dev);
- if (dev->rx_info.idle)
- Dprintk("BAD\n");
+ if (ISR_RXIDLE & isr) {
+ dev->rx_info.idle = 1;
+ Dprintk("oh dear, we are idle\n");
+ ns83820_rx_kick(dev);
+ }
+
+ if ((ISR_RXDESC | ISR_RXOK) & isr) {
+ prefetch(dev->rx_info.next_rx_desc);
+ writel(dev->IMR_cache & ~(ISR_RXDESC | ISR_RXOK), dev->base + IMR);
+ tasklet_schedule(&dev->rx_tasklet);
+ //rx_irq(dev);
+ //writel(4, dev->base + IHR);
}

+ if ((ISR_RXIDLE | ISR_RXORN | ISR_RXDESC | ISR_RXOK | ISR_RXERR) & isr)
+ ns83820_rx_kick(dev);
+
if (unlikely(ISR_RXSOVR & isr)) {
- Dprintk("overrun: rxsovr\n");
- dev->stats.rx_over_errors ++;
+ //printk("overrun: rxsovr\n");
+ dev->stats.rx_fifo_errors ++;
}
+
if (unlikely(ISR_RXORN & isr)) {
- Dprintk("overrun: rxorn\n");
- dev->stats.rx_over_errors ++;
+ //printk("overrun: rxorn\n");
+ dev->stats.rx_fifo_errors ++;
}

if ((ISR_RXRCMP & isr) && dev->rx_info.up)
@@ -1241,15 +1294,11 @@
if ((ISR_TXDESC | ISR_TXIDLE) & isr)
do_tx_done(dev);

- if (ISR_MIB & isr) {
- spin_lock(&dev->misc_lock);
- ns83820_update_stats(dev);
- spin_unlock(&dev->misc_lock);
- }
+ if (unlikely(ISR_MIB & isr))
+ ns83820_mib_isr(dev);

- if (ISR_PHY & isr)
+ if (unlikely(ISR_PHY & isr))
phy_intr(dev);
- }

#if 0 /* Still working on the interrupt mitigation strategy */
if (dev->ihr)
@@ -1412,6 +1461,7 @@
dev->net_dev.owner = THIS_MODULE;

PREPARE_TQUEUE(&dev->tq_refill, queue_refill, dev);
+ tasklet_init(&dev->rx_tasklet, rx_action, (unsigned long)dev);

err = pci_enable_device(pci_dev);
if (err) {
@@ -1430,8 +1480,9 @@
if (!dev->base || !dev->tx_descs || !dev->rx_info.descs)
goto out_disable;

- dprintk("%p: %08lx %p: %08lx\n", dev->tx_descs, dev->tx_phy_descs,
- dev->rx_info.descs, dev->rx_info.phy_descs);
+ dprintk("%p: %08lx %p: %08lx\n",
+ dev->tx_descs, (long)dev->tx_phy_descs,
+ dev->rx_info.descs, (long)dev->rx_info.phy_descs);
/* disable interrupts */
writel(0, dev->base + IMR);
writel(0, dev->base + IER);
@@ -1484,14 +1535,14 @@
dev->CFG_cache = readl(dev->base + CFG);

if ((dev->CFG_cache & CFG_PCI64_DET)) {
- printk("%s: enabling 64 bit PCI addressing.\n",
+ printk("%s: detected 64 bit PCI data bus.\n",
dev->net_dev.name);
- dev->CFG_cache |= CFG_T64ADDR | CFG_DATA64_EN;
-#if defined(USE_64BIT_ADDR)
- dev->net_dev.features |= NETIF_F_HIGHDMA;
-#endif
+ /*dev->CFG_cache |= CFG_DATA64_EN;*/
+ if (!(dev->CFG_cache & CFG_DATA64_EN))
+ printk("%s: EEPROM did not enable 64 bit bus. Disabled.\n",
+ dev->net_dev.name);
} else
- dev->CFG_cache &= ~(CFG_T64ADDR | CFG_DATA64_EN);
+ dev->CFG_cache &= ~(CFG_DATA64_EN);

dev->CFG_cache &= (CFG_TBI_EN | CFG_MRM_DIS | CFG_MWI_DIS |
CFG_T64ADDR | CFG_DATA64_EN | CFG_EXT_125 |
@@ -1528,8 +1579,12 @@
writel(dev->CFG_cache, dev->base + CFG);
dprintk("CFG: %08x\n", dev->CFG_cache);

+#if 1 /* Huh? This sets the PCI latency register. Should be done via
+ * the PCI layer. FIXME.
+ */
if (readl(dev->base + SRR))
writel(readl(dev->base+0x20c) | 0xfe00, dev->base + 0x20c);
+#endif

/* Note! The DMA burst size interacts with packet
* transmission, such that the largest packet that
@@ -1543,13 +1598,15 @@
/* Flush the interrupt holdoff timer */
writel(0x000, dev->base + IHR);
writel(0x100, dev->base + IHR);
+ writel(0x000, dev->base + IHR);

/* Set Rx to full duplex, don't accept runt, errored, long or length
- * range errored packets. Set MXDMA to 7 => 512 word burst
+ * range errored packets. Set MXDMA to 0 => 1024 word burst
*/
writel(RXCFG_AEP | RXCFG_ARP | RXCFG_AIRL | RXCFG_RX_FD
+ | RXCFG_STRIPCRC
| RXCFG_ALP
- | RXCFG_MXDMA | 0, dev->base + RXCFG);
+ | (RXCFG_MXDMA0 * 0) | 0, dev->base + RXCFG);

/* Disable priority queueing */
writel(0, dev->base + PQCR);
@@ -1576,7 +1633,11 @@
dev->net_dev.features |= NETIF_F_SG;
dev->net_dev.features |= NETIF_F_IP_CSUM;
#if defined(USE_64BIT_ADDR) || defined(CONFIG_HIGHMEM4G)
- dev->net_dev.features |= NETIF_F_HIGHDMA;
+ if ((dev->CFG_cache & CFG_T64ADDR)) {
+ printk(KERN_INFO "%s: using 64 bit addressing.\n",
+ dev->net_dev.name);
+ dev->net_dev.features |= NETIF_F_HIGHDMA;
+ }
#endif

printk(KERN_INFO "%s: ns83820 v" VERSION ": DP83820 v%u.%u: %02x:%02x:%02x:%02x:%02x:%02x io=0x%08lx irq=%d f=%s\n",
@@ -1587,7 +1648,7 @@
dev->net_dev.dev_addr[2], dev->net_dev.dev_addr[3],
dev->net_dev.dev_addr[4], dev->net_dev.dev_addr[5],
addr, pci_dev->irq,
- (dev->net_dev.features & NETIF_F_HIGHDMA) ? "sg" : "h,sg"
+ (dev->net_dev.features & NETIF_F_HIGHDMA) ? "h,sg" : "sg"
);

return 0;

2002-03-12 06:08:12

by David Miller

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

From: Richard Gooch <[email protected]>
Date: Mon, 11 Mar 2002 12:48:43 -0700

David S. Miller writes:
> NAPI is really only going to help with high packet rates not with
> things like raw bandwidth tests.

You're saying that people should just go and use jumbo frames? Isn't
that a problem for mixed 10/100/1000 LANs?

No, I'm saying that the current situation is fine with most cards
and most uses.

Ben pointed out that interrupt-mitigation challenged cards like the
NatSemi do gain, but that is the only case I can imagine at this
time.

Unless you have a card like the NatSemi (no interrupt mitigation) or
your interfaces are being hit with 120,000 packets per second EACH,
then NAPI is not going to be an explosive gain for you.

Look, we were able to get world records in web serving without NAPI,
right? :-)

2002-03-12 06:20:33

by Richard Gooch

Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

David S. Miller writes:
> From: Richard Gooch <[email protected]>
> Date: Mon, 11 Mar 2002 12:48:43 -0700
>
> David S. Miller writes:
> > NAPI is really only going to help with high packet rates not with
> > things like raw bandwidth tests.
>
> You're saying that people should just go and use jumbo frames? Isn't
> that a problem for mixed 10/100/1000 LANs?
>
> No, I'm saying that the current situation is fine with most cards
> and most uses.
>
> Ben pointed out that interrupt-mitigation challenged cards like the
> NatSemi do gain, but that is the only case I can imagine at this
> time.
>
> Unless you have a card like the NatSemi (no interrupt mitigation) or
> your interfaces are being hit with 120,000 packets per second EACH,
> then NAPI is not going to be an explosive gain for you.
>
> Look, we were able to get world records in web serving without NAPI,
> right? :-)

:-) I'd be happy to get near 1 Gb/s (800 Mb/s is acceptable) with a
cheap card and MTU=1500. Ben's message about his tweaks is
encouraging. Pity the P3 is so piss-poor.

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

2002-03-12 11:00:30

by Michael Clark

Subject: Re: [patch] ns83820 0.17 (Re: Broadcom 5700/5701 Gigabit Ethernet Adapters)

Works great for me too! I see about a 25% boost in performance
between 2 dual 1GHz PIIIs on a single TCP stream and 1500 byte
packets. Up from 550Mb/s -> 690Mb/s (82MB/s).

Dave, what performance do you get with the sk98 using normal size
frames? (To compare apples with apples.) BTW, I can't try jumbo
frames due to my crappy 3com gig switch.

~mc

On Tuesday, March 12, 2002, at 01:40 PM, Benjamin LaHaise wrote:

> On Sun, Mar 10, 2002 at 06:30:33PM -0800, David S. Miller wrote:
>> Syskonnect sk98 with jumbo frames gets ~107MB/sec TCP bandwidth
>> without NAPI, there is no reason other cards cannot go full speed as
>> well.
>>
>> NAPI is really only going to help with high packet rates not with
>> things like raw bandwidth tests.
>
> A day's tweaking later, and I'm getting 810mbit/s with netperf between
> two Athlons with default settings (1500 byte packets). What I've found
> is that increasing the size of the RX/TX rings or the max sizes of the
> tcp r/wmem backlogs really slows things down, so I'm not doing that
> anymore. The pair of P3s shows 262mbit/s (up from 67).
>
> Interrupt mitigation is now pretty stupid, but it helped: the irq
> handler disables the rx interrupt and then triggers a tasklet to run
> through the rx ring. The tasklet later enables rx interrupts again.
> More tweaking tomorrow...
>
> Marcelo, please apply the patch below to the next 2.4 prepatch: it
> also has a fix for a tx hang problem, and a few other nasties. Thanks!
>
> -ben
> --
> "A man with a bass just walked in,
> and he's putting it down
> on the floor."
>
>
> --- kernels/2.4/v2.4.19-pre2/drivers/net/ns83820.c Thu Mar 7 16:40:00
> 2002
> +++ ns-2.4.19-pre2/drivers/net/ns83820.c Tue Mar 12 00:09:32 2002
> @@ -1,7 +1,7 @@
> -#define _VERSION "0.15"
> +#define _VERSION "0.17"
> /* ns83820.c by Benjamin LaHaise <[email protected]> with contributions.
> *
> - * $Revision: 1.34.2.12 $
> + * $Revision: 1.34.2.14 $
> *
> * Copyright 2001 Benjamin LaHaise.
> * Copyright 2001 Red Hat.
> @@ -51,6 +51,8 @@
> * suppress duplicate link status messages
> * 20011117 0.14 - ethtool GDRVINFO, GLINK support from jgarzik
> * 20011204 0.15 get ppc (big endian) working
> + * 20011218 0.16 various cleanups
> + * 20020310 0.17 speedups
> *
> * Driver Overview
> * ===============
> @@ -93,8 +95,8 @@
> #include <linux/in.h> /* for IPPROTO_... */
> #include <linux/eeprom.h>
> #include <linux/compiler.h>
> +#include <linux/prefetch.h>
> #include <linux/ethtool.h>
> -//#include <linux/skbrefill.h>
>
> #include <asm/io.h>
> #include <asm/uaccess.h>
> @@ -154,10 +156,16 @@
> #endif
>
> /* tunables */
> -#define RX_BUF_SIZE 6144 /* 8192 */
> -#define NR_RX_DESC 256
> +#define RX_BUF_SIZE 1500 /* 8192 */
>
> -#define NR_TX_DESC 256
> +/* Must not exceed ~65000. */
> +#define NR_RX_DESC 64
> +#define NR_TX_DESC 64
> +
> +/* not tunable */
> +#define REAL_RX_BUF_SIZE (RX_BUF_SIZE + 14) /* rx/tx mac addr +
> type */
> +
> +#define MIN_TX_DESC_FREE 8
>
> /* register defines */
> #define CFGCS 0x04
> @@ -408,7 +416,8 @@
>
> struct sk_buff *skbs[NR_RX_DESC];
>
> - unsigned next_rx, next_empty;
> + u32 *next_rx_desc;
> + u16 next_rx, next_empty;
>
> u32 *descs;
> dma_addr_t phy_descs;
> @@ -423,6 +432,7 @@
> struct pci_dev *pci_dev;
>
> struct rx_info rx_info;
> + struct tasklet_struct rx_tasklet;
>
> unsigned ihr;
> struct tq_struct tq_refill;
> @@ -441,10 +451,11 @@
> spinlock_t tx_lock;
>
> long tx_idle;
> - u32 tx_done_idx;
> - u32 tx_idx;
> - volatile u32 tx_free_idx; /* idx of free desc chain */
> - u32 tx_intr_idx;
> +
> + u16 tx_done_idx;
> + u16 tx_idx;
> + volatile u16 tx_free_idx; /* idx of free desc chain */
> + u16 tx_intr_idx;
>
> struct sk_buff *tx_skbs[NR_TX_DESC];
>
> @@ -455,7 +466,7 @@
>
> //free = (tx_done_idx + NR_TX_DESC-2 - free_idx) % NR_TX_DESC
> #define start_tx_okay(dev) \
> - (((NR_TX_DESC-2 + dev->tx_done_idx - dev->tx_free_idx) %
> NR_TX_DESC) > NR_TX_DESC/2)
> + (((NR_TX_DESC-2 + dev->tx_done_idx - dev->tx_free_idx) %
> NR_TX_DESC) > MIN_TX_DESC_FREE)
>
>
> /* Packet Receiver
> @@ -509,7 +520,7 @@
> next_empty = dev->rx_info.next_empty;
>
> /* don't overrun last rx marker */
> - if (nr_rx_empty(dev) <= 2) {
> + if (unlikely(nr_rx_empty(dev) <= 2)) {
> kfree_skb(skb);
> return 1;
> }
> @@ -523,34 +534,39 @@
> #endif
>
> sg = dev->rx_info.descs + (next_empty * DESC_SIZE);
> - if (dev->rx_info.skbs[next_empty])
> + if (unlikely(NULL != dev->rx_info.skbs[next_empty]))
> BUG();
> dev->rx_info.skbs[next_empty] = skb;
>
> dev->rx_info.next_empty = (next_empty + 1) % NR_RX_DESC;
> - cmdsts = RX_BUF_SIZE | CMDSTS_INTR;
> - buf = pci_map_single(dev->pci_dev, skb->tail, RX_BUF_SIZE,
> PCI_DMA_FROMDEVICE);
> + cmdsts = REAL_RX_BUF_SIZE | CMDSTS_INTR;
> + buf = pci_map_single(dev->pci_dev, skb->tail,
> + REAL_RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
> build_rx_desc(dev, sg, 0, buf, cmdsts, 0);
> /* update link of previous rx */
> - if (next_empty != dev->rx_info.next_rx)
> + if (likely(next_empty != dev->rx_info.next_rx))
> dev->rx_info.descs[((NR_RX_DESC + next_empty - 1) %
> NR_RX_DESC) * DESC_SIZE] = cpu_to_le32(dev->rx_info.phy_descs +
> (next_empty * DESC_SIZE * 4));
>
> return 0;
> }
>
> -static int rx_refill(struct ns83820 *dev, int gfp)
> +static inline int rx_refill(struct ns83820 *dev, int gfp)
> {
> unsigned i;
> long flags = 0;
>
> + if (unlikely(nr_rx_empty(dev) <= 2))
> + return 0;
> +
> dprintk("rx_refill(%p)\n", dev);
> if (gfp == GFP_ATOMIC)
> spin_lock_irqsave(&dev->rx_info.lock, flags);
> for (i=0; i<NR_RX_DESC; i++) {
> struct sk_buff *skb;
> long res;
> - skb = __dev_alloc_skb(RX_BUF_SIZE+16, gfp);
> - if (!skb)
> + /* extra 16 bytes for alignment */
> + skb = __dev_alloc_skb(REAL_RX_BUF_SIZE+16, gfp);
> + if (unlikely(!skb))
> break;
>
> res = (long)skb->tail & 0xf;
> @@ -575,6 +591,12 @@
> return i ? 0 : -ENOMEM;
> }
>
> +static void FASTCALL(rx_refill_atomic(struct ns83820 *dev));
> +static void rx_refill_atomic(struct ns83820 *dev)
> +{
> + rx_refill(dev, GFP_ATOMIC);
> +}
> +
> /* REFILL */
> static inline void queue_refill(void *_dev)
> {
> @@ -590,6 +612,7 @@
> build_rx_desc(dev, dev->rx_info.descs + (DESC_SIZE * i), 0, 0,
> CMDSTS_OWN, 0);
> }
>
> +static void FASTCALL(phy_intr(struct ns83820 *dev));
> static void phy_intr(struct ns83820 *dev)
> {
> static char *speeds[] = { "10", "100", "1000", "1000(?)", "1000F" };
> @@ -600,7 +623,6 @@
> cfg = readl(dev->base + CFG) ^ SPDSTS_POLARITY;
>
> if (dev->CFG_cache & CFG_TBI_EN) {
> -
> /* we have an optical transceiver */
> tbisr = readl(dev->base + TBISR);
> tanar = readl(dev->base + TANAR);
> @@ -646,20 +668,24 @@
> new_cfg = dev->CFG_cache & ~(CFG_SB | CFG_MODE_1000 | CFG_SPDSTS);
>
> if (cfg & CFG_SPDSTS1)
> - new_cfg |= CFG_MODE_1000 | CFG_SB;
> + new_cfg |= CFG_MODE_1000;
> else
> - new_cfg &= ~CFG_MODE_1000 | CFG_SB;
> + new_cfg &= ~CFG_MODE_1000;
>
> - if ((cfg & CFG_LNKSTS) && ((new_cfg ^ dev->CFG_cache) &
> CFG_MODE_1000)) {
> + speed = ((cfg / CFG_SPDSTS0) & 3);
> + fullduplex = (cfg & CFG_DUPSTS);
> +
> + if (fullduplex)
> + new_cfg |= CFG_SB;
> +
> + if ((cfg & CFG_LNKSTS) &&
> + ((new_cfg ^ dev->CFG_cache) & CFG_MODE_1000)) {
> writel(new_cfg, dev->base + CFG);
> dev->CFG_cache = new_cfg;
> }
>
> dev->CFG_cache &= ~CFG_SPDSTS;
> dev->CFG_cache |= cfg & CFG_SPDSTS;
> -
> - speed = ((cfg / CFG_SPDSTS0) & 3);
> - fullduplex = (cfg & CFG_DUPSTS);
> }
>
> newlinkstate = (cfg & CFG_LNKSTS) ? LINK_UP : LINK_DOWN;
> @@ -690,6 +716,7 @@
>
> dev->rx_info.idle = 1;
> dev->rx_info.next_rx = 0;
> + dev->rx_info.next_rx_desc = dev->rx_info.descs;
> dev->rx_info.next_empty = 0;
>
> for (i=0; i<NR_RX_DESC; i++)
> @@ -724,7 +751,7 @@
> dev->IMR_cache |= ISR_RXDESC;
> dev->IMR_cache |= ISR_RXIDLE;
> dev->IMR_cache |= ISR_TXDESC;
> - //dev->IMR_cache |= ISR_TXIDLE;
> + dev->IMR_cache |= ISR_TXIDLE;
>
> writel(dev->IMR_cache, dev->base + IMR);
> writel(1, dev->base + IER);
> @@ -770,6 +797,41 @@
> }
> }
>
> +/* I hate the network stack sometimes */
> +#ifdef __i386__
> +#define skb_mangle_for_davem(skb,len) (skb)
> +#else
> +static inline struct sk_buff *skb_mangle_for_davem(struct sk_buff
> *skb, int len)
> +{
> + tmp = __dev_alloc_skb(len+2, GFP_ATOMIC);
> + if (!tmp)
> + goto done;
> + tmp->dev = &dev->net_dev;
> + skb_reserve(tmp, 2);
> + memcpy(skb_put(tmp, len), skb->data, len);
> + kfree_skb(skb);
> + return tmp;
> +}
> +#endif
> +
> +static void FASTCALL(ns83820_rx_kick(struct ns83820 *dev));
> +static void ns83820_rx_kick(struct ns83820 *dev)
> +{
> + /*if (nr_rx_empty(dev) >= NR_RX_DESC/4)*/ {
> + if (dev->rx_info.up) {
> + rx_refill_atomic(dev);
> + kick_rx(dev);
> + }
> + }
> +
> + if (dev->rx_info.up && nr_rx_empty(dev) > NR_RX_DESC*3/4)
> + schedule_task(&dev->tq_refill);
> + else
> + kick_rx(dev);
> + if (dev->rx_info.idle)
> + Dprintk("BAD\n");
> +}
> +
> /* rx_irq
> *
> */
> @@ -785,10 +847,10 @@
> dprintk("rx_irq(%p)\n", dev);
> dprintk("rxdp: %08x, descs: %08lx next_rx[%d]: %p next_empty[%d]:
> %p\n",
> readl(dev->base + RXDP),
> - (dev->rx_info.phy_descs),
> - dev->rx_info.next_rx,
> + (long)(dev->rx_info.phy_descs),
> + (int)dev->rx_info.next_rx,
> (dev->rx_info.descs + (DESC_SIZE * dev->rx_info.next_rx)),
> - dev->rx_info.next_empty,
> + (int)dev->rx_info.next_empty,
> (dev->rx_info.descs + (DESC_SIZE * dev->rx_info.next_empty))
> );
>
> @@ -798,7 +860,7 @@
>
> dprintk("walking descs\n");
> next_rx = info->next_rx;
> - desc = info->descs + (DESC_SIZE * next_rx);
> + desc = info->next_rx_desc;
> while ((CMDSTS_OWN & (cmdsts = le32_to_cpu(desc[CMDSTS]))) &&
> (cmdsts != CMDSTS_OWN)) {
> struct sk_buff *skb;
> @@ -813,29 +875,17 @@
> info->skbs[next_rx] = NULL;
> info->next_rx = (next_rx + 1) % NR_RX_DESC;
>
> - barrier();
> + mb();
> clear_rx_desc(dev, next_rx);
>
> pci_unmap_single(dev->pci_dev, bufptr,
> RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
> - if (CMDSTS_OK & cmdsts) {
> -#if 0 //ndef __i386__
> - struct sk_buff *tmp;
> -#endif
> + if (likely(CMDSTS_OK & cmdsts)) {
> int len = cmdsts & 0xffff;
> - if (!skb)
> - BUG();
> skb_put(skb, len);
> -#if 0 //ndef __i386__ /* I hate the network stack sometimes */
> - tmp = __dev_alloc_skb(RX_BUF_SIZE+16, GFP_ATOMIC);
> - if (!tmp)
> - goto done;
> - tmp->dev = &dev->net_dev;
> - skb_reserve(tmp, 2);
> - memcpy(skb_put(tmp, len), skb->data, len);
> - kfree_skb(skb);
> - skb = tmp;
> -#endif
> + skb = skb_mangle_for_davem(skb, len);
> + if (unlikely(!skb))
> + goto netdev_mangle_me_harder_failed;
> if (cmdsts & CMDSTS_DEST_MULTI)
> dev->stats.multicast ++;
> dev->stats.rx_packets ++;
> @@ -846,11 +896,10 @@
> skb->ip_summed = CHECKSUM_NONE;
> }
> skb->protocol = eth_type_trans(skb, &dev->net_dev);
> - if (NET_RX_DROP == netif_rx(skb))
> + if (NET_RX_DROP == netif_rx(skb)) {
> +netdev_mangle_me_harder_failed:
> dev->stats.rx_dropped ++;
> -#if 0 //ndef __i386__
> - done:;
> -#endif
> + }
> } else {
> kfree_skb(skb);
> }
> @@ -860,6 +909,7 @@
> desc = info->descs + (DESC_SIZE * next_rx);
> }
> info->next_rx = next_rx;
> + info->next_rx_desc = info->descs + (DESC_SIZE * next_rx);
>
> out:
> if (0 && !nr) {
> @@ -869,6 +919,15 @@
> spin_unlock_irqrestore(&info->lock, flags);
> }
>
> +static void rx_action(unsigned long _dev)
> +{
> + struct ns83820 *dev = (void *)_dev;
> + rx_irq(dev);
> + writel(0x002, dev->base + IHR);
> + writel(dev->IMR_cache | ISR_RXDESC, dev->base + IMR);
> + rx_irq(dev);
> + ns83820_rx_kick(dev);
> +}
>
> /* Packet Transmit code
> */
> @@ -879,7 +938,9 @@
> writel(CR_TXE, dev->base + CR);
> }
>
> -/* no spinlock needed on the transmit irq path as the interrupt handler is serialized */
> +/* No spinlock needed on the transmit irq path as the interrupt handler is
> + * serialized.
> + */
> static void do_tx_done(struct ns83820 *dev)
> {
> u32 cmdsts, tx_done_idx, *desc;
> @@ -917,7 +978,7 @@
> tx_done_idx = (tx_done_idx + 1) % NR_TX_DESC;
> dev->tx_done_idx = tx_done_idx;
> desc[CMDSTS] = cpu_to_le32(0);
> - barrier();
> + mb();
> desc = dev->tx_descs + (tx_done_idx * DESC_SIZE);
> }
>
> @@ -952,7 +1013,6 @@
> * while trying to track down a bug in either the zero copy code or
> * the tx fifo (hence the MAX_FRAG_LEN).
> */
> -#define MAX_FRAG_LEN 8192 /* disabled for now */
> static int ns83820_hard_start_xmit(struct sk_buff *skb, struct net_device *_dev)
> {
> struct ns83820 *dev = (struct ns83820 *)_dev;
> @@ -970,9 +1030,9 @@
>
> nr_frags = skb_shinfo(skb)->nr_frags;
> again:
> - if (__builtin_expect(dev->CFG_cache & CFG_LNKSTS, 0)) {
> + if (unlikely(dev->CFG_cache & CFG_LNKSTS)) {
> netif_stop_queue(&dev->net_dev);
> - if (__builtin_expect(dev->CFG_cache & CFG_LNKSTS, 0))
> + if (unlikely(dev->CFG_cache & CFG_LNKSTS))
> return 1;
> netif_start_queue(&dev->net_dev);
> }
> @@ -981,7 +1041,7 @@
> tx_done_idx = dev->tx_done_idx;
> nr_free = (tx_done_idx + NR_TX_DESC-2 - free_idx) % NR_TX_DESC;
> nr_free -= 1;
> - if ((nr_free <= nr_frags) || (nr_free <= 8192 / MAX_FRAG_LEN)) {
> + if (nr_free <= nr_frags) {
> dprintk("stop_queue - not enough(%p)\n", dev);
> netif_stop_queue(&dev->net_dev);
>
> @@ -996,11 +1056,11 @@
>
> if (free_idx == dev->tx_intr_idx) {
> do_intr = 1;
> - dev->tx_intr_idx = (dev->tx_intr_idx + NR_TX_DESC/2) % NR_TX_DESC;
> + dev->tx_intr_idx = (dev->tx_intr_idx + NR_TX_DESC/4) % NR_TX_DESC;
> }
>
> nr_free -= nr_frags;
> - if (nr_free < 1) {
> + if (nr_free < MIN_TX_DESC_FREE) {
> dprintk("stop_queue - last entry(%p)\n", dev);
> netif_stop_queue(&dev->net_dev);
> stopped = 1;
> @@ -1028,14 +1088,6 @@
> for (;;) {
> volatile u32 *desc = dev->tx_descs + (free_idx * DESC_SIZE);
> u32 residue = 0;
> -#if 0
> - if (len > MAX_FRAG_LEN) {
> - residue = len;
> - /* align the start address of the next fragment */
> - len = MAX_FRAG_LEN;
> - residue -= len;
> - }
> -#endif
>
> dprintk("frag[%3u]: %4u @ 0x%08Lx\n", free_idx, len,
> (unsigned long long)buf);
> @@ -1084,6 +1136,7 @@
> {
> u8 *base = dev->base;
>
> + /* the DP83820 will freeze counters, so we need to read all of them */
> dev->stats.rx_errors += readl(base + 0x60) & 0xffff;
> dev->stats.rx_crc_errors += readl(base + 0x64) & 0xffff;
> dev->stats.rx_missed_errors += readl(base + 0x68) & 0xffff;
> @@ -1162,54 +1215,54 @@
> }
> }
>
> +static void ns83820_mib_isr(struct ns83820 *dev)
> +{
> + spin_lock(&dev->misc_lock);
> + ns83820_update_stats(dev);
> + spin_unlock(&dev->misc_lock);
> +}
> +
> static void ns83820_irq(int foo, void *data, struct pt_regs *regs)
> {
> struct ns83820 *dev = data;
> - int count = 0;
> u32 isr;
> dprintk("ns83820_irq(%p)\n", dev);
>
> dev->ihr = 0;
>
> - while (count++ < 32 && (isr = readl(dev->base + ISR))) {
> - dprintk("irq: %08x\n", isr);
> -
> - if (isr & ~(ISR_PHY | ISR_RXDESC | ISR_RXEARLY | ISR_RXOK | ISR_RXERR | ISR_TXIDLE | ISR_TXOK | ISR_TXDESC))
> - Dprintk("odd isr? 0x%08x\n", isr);
> -
> - if ((ISR_RXEARLY | ISR_RXIDLE | ISR_RXORN | ISR_RXDESC | ISR_RXOK | ISR_RXERR) & isr) {
> - if (ISR_RXIDLE & isr) {
> - dev->rx_info.idle = 1;
> - Dprintk("oh dear, we are idle\n");
> - }
> + isr = readl(dev->base + ISR);
> + dprintk("irq: %08x\n", isr);
>
> - if ((ISR_RXDESC) & isr) {
> - rx_irq(dev);
> - writel(4, dev->base + IHR);
> - }
> -
> - if (nr_rx_empty(dev) >= NR_RX_DESC/4) {
> - if (dev->rx_info.up) {
> - rx_refill(dev, GFP_ATOMIC);
> - kick_rx(dev);
> - }
> - }
> +#ifdef DEBUG
> + if (isr & ~(ISR_PHY | ISR_RXDESC | ISR_RXEARLY | ISR_RXOK | ISR_RXERR | ISR_TXIDLE | ISR_TXOK | ISR_TXDESC))
> + Dprintk("odd isr? 0x%08x\n", isr);
> +#endif
>
> - if (dev->rx_info.up && nr_rx_empty(dev) > NR_RX_DESC*3/4)
> - schedule_task(&dev->tq_refill);
> - else
> - kick_rx(dev);
> - if (dev->rx_info.idle)
> - Dprintk("BAD\n");
> + if (ISR_RXIDLE & isr) {
> + dev->rx_info.idle = 1;
> + Dprintk("oh dear, we are idle\n");
> + ns83820_rx_kick(dev);
> + }
> +
> + if ((ISR_RXDESC | ISR_RXOK) & isr) {
> + prefetch(dev->rx_info.next_rx_desc);
> + writel(dev->IMR_cache & ~(ISR_RXDESC | ISR_RXOK), dev->base + IMR);
> + tasklet_schedule(&dev->rx_tasklet);
> + //rx_irq(dev);
> + //writel(4, dev->base + IHR);
> }
>
> + if ((ISR_RXIDLE | ISR_RXORN | ISR_RXDESC | ISR_RXOK | ISR_RXERR) & isr)
> + ns83820_rx_kick(dev);
> +
> if (unlikely(ISR_RXSOVR & isr)) {
> - Dprintk("overrun: rxsovr\n");
> - dev->stats.rx_over_errors ++;
> + //printk("overrun: rxsovr\n");
> + dev->stats.rx_fifo_errors ++;
> }
> +
> if (unlikely(ISR_RXORN & isr)) {
> - Dprintk("overrun: rxorn\n");
> - dev->stats.rx_over_errors ++;
> + //printk("overrun: rxorn\n");
> + dev->stats.rx_fifo_errors ++;
> }
>
> if ((ISR_RXRCMP & isr) && dev->rx_info.up)
> @@ -1241,15 +1294,11 @@
> if ((ISR_TXDESC | ISR_TXIDLE) & isr)
> do_tx_done(dev);
>
> - if (ISR_MIB & isr) {
> - spin_lock(&dev->misc_lock);
> - ns83820_update_stats(dev);
> - spin_unlock(&dev->misc_lock);
> - }
> + if (unlikely(ISR_MIB & isr))
> + ns83820_mib_isr(dev);
>
> - if (ISR_PHY & isr)
> + if (unlikely(ISR_PHY & isr))
> phy_intr(dev);
> - }
>
> #if 0 /* Still working on the interrupt mitigation strategy */
> if (dev->ihr)
> @@ -1412,6 +1461,7 @@
> dev->net_dev.owner = THIS_MODULE;
>
> PREPARE_TQUEUE(&dev->tq_refill, queue_refill, dev);
> + tasklet_init(&dev->rx_tasklet, rx_action, (unsigned long)dev);
>
> err = pci_enable_device(pci_dev);
> if (err) {
> @@ -1430,8 +1480,9 @@
> if (!dev->base || !dev->tx_descs || !dev->rx_info.descs)
> goto out_disable;
>
> - dprintk("%p: %08lx %p: %08lx\n", dev->tx_descs, dev->tx_phy_descs,
> - dev->rx_info.descs, dev->rx_info.phy_descs);
> + dprintk("%p: %08lx %p: %08lx\n",
> + dev->tx_descs, (long)dev->tx_phy_descs,
> + dev->rx_info.descs, (long)dev->rx_info.phy_descs);
> /* disable interrupts */
> writel(0, dev->base + IMR);
> writel(0, dev->base + IER);
> @@ -1484,14 +1535,14 @@
> dev->CFG_cache = readl(dev->base + CFG);
>
> if ((dev->CFG_cache & CFG_PCI64_DET)) {
> - printk("%s: enabling 64 bit PCI addressing.\n",
> + printk("%s: detected 64 bit PCI data bus.\n",
> dev->net_dev.name);
> - dev->CFG_cache |= CFG_T64ADDR | CFG_DATA64_EN;
> -#if defined(USE_64BIT_ADDR)
> - dev->net_dev.features |= NETIF_F_HIGHDMA;
> -#endif
> + /*dev->CFG_cache |= CFG_DATA64_EN;*/
> + if (!(dev->CFG_cache & CFG_DATA64_EN))
> + printk("%s: EEPROM did not enable 64 bit bus. Disabled.\n",
> + dev->net_dev.name);
> } else
> - dev->CFG_cache &= ~(CFG_T64ADDR | CFG_DATA64_EN);
> + dev->CFG_cache &= ~(CFG_DATA64_EN);
>
> dev->CFG_cache &= (CFG_TBI_EN | CFG_MRM_DIS | CFG_MWI_DIS |
> CFG_T64ADDR | CFG_DATA64_EN | CFG_EXT_125 |
> @@ -1528,8 +1579,12 @@
> writel(dev->CFG_cache, dev->base + CFG);
> dprintk("CFG: %08x\n", dev->CFG_cache);
>
> +#if 1 /* Huh? This sets the PCI latency register. Should be done via
> + * the PCI layer. FIXME.
> + */
> if (readl(dev->base + SRR))
> writel(readl(dev->base+0x20c) | 0xfe00, dev->base + 0x20c);
> +#endif
>
> /* Note! The DMA burst size interacts with packet
> * transmission, such that the largest packet that
> @@ -1543,13 +1598,15 @@
> /* Flush the interrupt holdoff timer */
> writel(0x000, dev->base + IHR);
> writel(0x100, dev->base + IHR);
> + writel(0x000, dev->base + IHR);
>
> /* Set Rx to full duplex, don't accept runt, errored, long or length
> - * range errored packets. Set MXDMA to 7 => 512 word burst
> + * range errored packets. Set MXDMA to 0 => 1024 word burst
> */
> writel(RXCFG_AEP | RXCFG_ARP | RXCFG_AIRL | RXCFG_RX_FD
> + | RXCFG_STRIPCRC
> | RXCFG_ALP
> - | RXCFG_MXDMA | 0, dev->base + RXCFG);
> + | (RXCFG_MXDMA0 * 0) | 0, dev->base + RXCFG);
>
> /* Disable priority queueing */
> writel(0, dev->base + PQCR);
> @@ -1576,7 +1633,11 @@
> dev->net_dev.features |= NETIF_F_SG;
> dev->net_dev.features |= NETIF_F_IP_CSUM;
> #if defined(USE_64BIT_ADDR) || defined(CONFIG_HIGHMEM4G)
> - dev->net_dev.features |= NETIF_F_HIGHDMA;
> + if ((dev->CFG_cache & CFG_T64ADDR)) {
> + printk(KERN_INFO "%s: using 64 bit addressing.\n",
> + dev->net_dev.name);
> + dev->net_dev.features |= NETIF_F_HIGHDMA;
> + }
> #endif
>
> printk(KERN_INFO "%s: ns83820 v" VERSION ": DP83820 v%u.%u: %02x:%02x:%02x:%02x:%02x:%02x io=0x%08lx irq=%d f=%s\n",
> @@ -1587,7 +1648,7 @@
> dev->net_dev.dev_addr[2], dev->net_dev.dev_addr[3],
> dev->net_dev.dev_addr[4], dev->net_dev.dev_addr[5],
> addr, pci_dev->irq,
> - (dev->net_dev.features & NETIF_F_HIGHDMA) ? "sg" : "h,sg"
> + (dev->net_dev.features & NETIF_F_HIGHDMA) ? "h,sg" : "sg"
> );
>
> return 0;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2002-03-12 11:19:10

by David Miller

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17

From: Michael Clark <[email protected]>
Date: Tue, 12 Mar 2002 19:00:09 +0800

Dave, what performance do you get with the sk98 using normal size
frames? (to compare apples with apples). BTW - i can't try jumbo
frames due to my crappy 3com gig switch.

Use a cross-over cable to play with Jumbo frames, that is
what I do :-)

Later this week I'll rerun tests on all the cards I have
(Acenic, Sk98, tigon3, Natsemi etc.) with current drivers
to see what it looks like with both jumbo and non-jumbo
mtus over gigabit.

2002-03-12 13:03:59

by dean gaudet

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17

On Tue, 12 Mar 2002, David S. Miller wrote:

> From: Michael Clark <[email protected]>
> Date: Tue, 12 Mar 2002 19:00:09 +0800
>
> Dave, what performance do you get with the sk98 using normal size
> frames? (to compare apples with apples). BTW - i can't try jumbo
> frames due to my crappy 3com gig switch.
>
> Use a cross-over cable to play with Jumbo frames, that is
> what I do :-)

you shouldn't even need a crossover cable :) 1000baseT NICs should figure
out what the wire pairings are and adjust the DSP accordingly. at least
the acenic-based cards seem to work host-to-host with a regular patch
cable or a cross-over (or a hacked up pairing i tried which crossed over
both the two 100baseT pairs and the other 2 pairs which aren't usually
crossed over).

-dean

2002-03-12 13:07:29

by David Miller

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17

From: dean gaudet <[email protected]>
Date: Tue, 12 Mar 2002 05:03:45 -0800 (PST)

On Tue, 12 Mar 2002, David S. Miller wrote:

> Use a cross-over cable to play with Jumbo frames, that is
> what I do :-)

you shouldn't even need a crossover cable :)

Intel e1000's can do this too...

come to think of it there is a link polarity bit in one
of the tigon3 registers, hmmm...

2002-03-12 18:12:31

by Trever L. Adams

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17

On Tue, 2002-03-12 at 06:15, David S. Miller wrote:

> Use a cross-over cable to play with Jumbo frames, that is
> what I do :-)
>
> Later this week I'll rerun tests on all the cards I have
> (Acenic, Sk98, tigon3, Natsemi etc.) with current drivers
> to see what it looks like with both jumbo and non-jumbo
> mtus over gigabit.

I no longer have the original, so I will just have to respond to this
one, since it is related. I know I know nearly nothing about
networking, but here has been my thinking.

David, you believe we don't need NAPI. You believe we perform fine
without it. Here is my question. A PCI bus, IIRC, has about 500
Megabytes/sec of bandwidth. A full-blown gigabit Ethernet stream should
be around 133 Megabytes/sec. Sounds to me like a PC could easily act
(as far as bandwidth is concerned) as a 4 to 5 port gigabit Ethernet
router.
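As a sanity check on the arithmetic above, a hedged sketch (theoretical peak figures only; note that a *routed* packet crosses the bus twice, once inbound and once outbound, which the naive port count ignores):

```c
#include <assert.h>

/* Theoretical peak PCI bandwidth in MB/s: bus width in bits / 8
 * bytes per transfer, times the clock in MHz. */
static int pci_peak_mbs(int width_bits, int clock_mhz)
{
	return (width_bits / 8) * clock_mhz;
}

/* A gigabit Ethernet stream carries at most 1000 Mbit/s = 125 MB/s.
 * When routing, each packet is DMA'd across the bus twice
 * (NIC -> memory, then memory -> NIC), so one wire-speed port
 * costs up to 250 MB/s of bus bandwidth. */
static int max_wire_speed_ports(int bus_mbs)
{
	return bus_mbs / 250;
}
```

So a 64-bit/66MHz bus (528 MB/s peak, before protocol overhead) supports roughly two wire-speed gigabit routing ports rather than four or five, and a 32-bit/33MHz bus (132 MB/s) cannot keep even one port busy in both directions.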

How well does this work with NAPI and how well does it work without? Is
NAPI a gain here?

Maybe there are other issues involved that I am unaware of, if there
are, I would still like to see how the theoretical answers pan out.

Thank you,
Trever Adams

2002-03-12 18:21:12

by David Miller

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17

From: "Trever L. Adams" <[email protected]>
Date: 12 Mar 2002 13:12:32 -0500

David, you believe we don't need NAPI.

I said we don't need NAPI for just bandwidth streams, you mention
routing which is specifically the case I mention that NAPI is good for
(high packet rates).

2002-03-12 18:31:08

by Trever L. Adams

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17

On Tue, 2002-03-12 at 13:17, David S. Miller wrote:
> From: "Trever L. Adams" <[email protected]>
> Date: 12 Mar 2002 13:12:32 -0500
>
> David, you believe we don't need NAPI.
>
> I said we don't need NAPI for just bandwidth streams, you mention
> routing which is specifically the case I mention that NAPI is good for
> (high packet rates).
>

My apologies. I have been trying to follow the conversation, but came
in, I believe, quite late. I only saw the comments about NAPI and
bandwidth last night.

Thank you for clearing that up.

Trever Adams

2002-03-12 19:06:38

by Charles Cazabon

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17

Trever L. Adams <[email protected]> wrote:
>
> Here is my question. A PCI bus, IIRC, has about 500 Megabytes/sec of
> bandwidth.

Depends. 32-bit 33MHz PCI is 133MB/s. 64-bit 66MHz PCI is 533MB/s -- those
are theoretical, of course. In real life you're not likely to see better than
about 90% of those figures, even in ideal cases.

> A full blown gigabit Ethernet stream should be around 133 Megabytes/sec.
> Sounds to me like a PC could act easily (As far as bandwidth is concerned)
> as a 4 to 5 port gigabit Ethernet router.

If you define PC as "cheap Athlon box with 32-bit, 33MHz PCI bus", then no.

Charles
--
-----------------------------------------------------------------------
Charles Cazabon <[email protected]>
GPL'ed software available at: http://www.qcc.sk.ca/~charlesc/software/
-----------------------------------------------------------------------

2002-03-12 19:42:47

by Pedro M. Rodrigues

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17


A PC with a 64-bit PCI bus is still cheap nowadays, even more so if
you compare it with a full-blown four-port gigabit router. The question
is not whether we have the i/o, but whether it would route at wire speed.



/Pedro

On 12 Mar 2002 at 13:06, Charles Cazabon wrote:

> If you define PC as "cheap Athlon box with 32-bit, 33MHz PCI bus",
> then no.
>

2002-03-12 19:53:23

by pjd

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17

David Miller wrote:
>
> I said we don't need NAPI for just bandwidth streams, you mention
> routing which is specifically the case I mention that NAPI is good for
> (high packet rates).

In particular, if you have a small number of high-speed streams the
TCP window mechanism will protect against receive livelock. (actually
a medium number of streams would still be protected - it's not until
the total offered window size in packets exceeds the input packet
queue size that you would become vulnerable to livelock)

Routing, on the other hand, can be driven into a state where you spend
all your CPU processing receive interrupts, and no CPU actually
forwarding the packets, for a net throughput approaching zero.
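Peter's threshold can be put in numbers with a hedged sketch (the 300-packet input queue is the stock 2.4 netdev_max_backlog default; the window and packet sizes are illustrative):

```c
#include <assert.h>

/* Receive-livelock protection via TCP flow control holds while the
 * total offered window, measured in packets, fits in the input
 * packet queue: each stream can have at most one window in flight. */
static int livelock_possible(int streams, int window_bytes,
			     int pkt_bytes, int input_queue_pkts)
{
	int window_pkts = window_bytes / pkt_bytes;

	return streams * window_pkts > input_queue_pkts;
}
```

With 64KB windows and 1500-byte packets (about 43 packets per window), four streams stay under a 300-packet queue but eight streams overflow it.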

Peter Desnoyers

2002-03-14 09:54:43

by Jeff Garzik

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17 (Re: Broadcom 5700/5701 Gigabit Ethernet Adapters)

Benjamin LaHaise wrote:

>A day's tweaking later, and I'm getting 810mbit/s with netperf between
>two Athlons with default settings (1500 byte packets). What I've found
>is that increasing the size of the RX/TX rings or the max sizes of the
>tcp r/wmem backlogs really slows things down, so I'm not doing that
>anymore. The pair of P3s shows 262mbit/s (up from 67).
>
>Interrupt mitigation is now pretty stupid, but it helped: the irq
>handler disables the rx interrupt and then triggers a tasklet to run
>through the rx ring. The tasklet later enables rx interrupts again.
>More tweaking tomorrow...
>
>Marcelo, please apply the patch below to the next 2.4 prepatch: it
>also has a fix for a tx hang problem, and a few other nasties. Thanks!
>
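The mitigation scheme Ben describes maps onto a generic mask-and-defer pattern. A hedged sketch of just the bookkeeping (the names and struct below are illustrative, not the ns83820's actual registers or fields):

```c
#include <assert.h>

#define IMR_RXDESC 0x01u

/* Mask-and-defer interrupt mitigation, reduced to its bookkeeping.
 * The hard irq handler masks further RX interrupts and schedules
 * deferred work; the deferred handler drains the ring, then unmasks
 * so the next packet raises a fresh interrupt. */
struct mit_state {
	unsigned int imr;	/* shadow of the interrupt mask register */
	int rx_pending;		/* packets sitting in the rx ring */
	int work_scheduled;	/* deferred handler pending? */
};

static void hard_irq(struct mit_state *s)
{
	s->imr &= ~IMR_RXDESC;	/* no more rx irqs until we catch up */
	s->work_scheduled = 1;
}

static void deferred_rx(struct mit_state *s)
{
	s->rx_pending = 0;	/* walk the whole ring in one pass */
	s->work_scheduled = 0;
	s->imr |= IMR_RXDESC;	/* re-arm rx interrupts */
}
```

Under load this batches many packets per interrupt; when traffic is light, the unmask happens almost immediately and latency stays close to the plain per-packet-interrupt case.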

Comments:

1) What were the test conditions leading to your decision to decrease
the RX/TX ring count? I'm not questioning the decision, but looking to
be better informed... other gigabit drivers normally have larger rings.
I also wonder if the slowdown you see could be related to a small-sized
FIFO on the natsemi chips, that would probably not be present on other
gigabit chips.

2) PCI latency timer is set with pci_set_master(), as in: if it's not
correctly set, fix it up. If it's correctly set, leave it alone and let
the user tune in BIOS Setup.

3) Seeing "volatile" in your code. Cruft? The change in volatile's
meaning in recent gcc versions implies to me that your code may perhaps
need some additional rmb/wmb calls, which are getting hidden via the
driver's dependency on a compiler-version-specific implementation of
"volatile."

4) Do you really mean to allocate memory for "REAL_RX_BUF_SIZE + 16"?
Why not plain old REAL_RX_BUF_SIZE?

5) Random question, do you call netif_carrier_{on,off,ok} for link
status manipulation? (if not, you should...)

6) skb_mangle_for_davem is pretty gross... curious: what sort of NIC
alignment restrictions are there on rx and tx buffers (not descriptors)?
None? 32-bit? Ignore CPU alignment for a moment here...

7) What are the criteria for netif_wake_queue? If you are waking when
the TX is still "mostly full" you probably generate excessive wakeups...

8) There is no cabal.


2002-03-14 20:38:05

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17 (Re: Broadcom 5700/5701 Gigabit Ethernet Adapters)

On Thu, Mar 14, 2002 at 04:54:03AM -0500, Jeff Garzik wrote:
> Comments:
>
> 1) What were the test conditions leading to your decision to decrease
> the RX/TX ring count? I'm not questioning the decision, but looking to
> be better informed... other gigabit drivers normally have larger rings.
> I also wonder if the slowdown you see could be related to a small-sized
> FIFO on the natsemi chips, that would probably not be present on other
> gigabit chips.

Smaller rings lead to better throughput, especially on the slower cpus. Note
that in part the slowness was caused by having slab debugging enabled.
Turning slab debugging off brought the p3s up to ~500mbit and the athlons
over 900.

> 2) PCI latency timer is set with pci_set_master(), as in: if it's not
> correctly set, fix it up. If it's correctly set, leave it alone and let
> the user tune in BIOS Setup.

Ah. That part is something I was thinking of deleting, and now will do.

> 3) Seeing "volatile" in your code. Cruft? The change in volatile's
> meaning in recent gcc versions implies to me that your code may perhaps
> need some additional rmb/wmb calls, which are getting hidden via the
> driver's dependency on a compiler-version-specific implementation of
> "volatile."

Paranoia during writing. I'll reaudit. That said, volatile behaviour
is not compiler version specific.

> 4) Do you really mean to allocate memory for "REAL_RX_BUF_SIZE + 16"?
> Why not plain old REAL_RX_BUF_SIZE?

The +16 is for alignment (just like the comment says). The hardware
requires that rx buffers be 64 bit aligned.

> 5) Random question, do you call netif_carrier_{on,off,ok} for link
> status manipulation? (if not, you should...)

Ah, api updates. Added to the todo.

> 6) skb_mangle_for_davem is pretty gross... curious: what sort of NIC
> alignment restrictions are there on rx and tx buffers (not descriptors)?
> None? 32-bit? Ignore CPU alignment for a moment here...

tx descriptors have no alignment restriction, rx descriptors must be
64 bit aligned. Someone chose not to include the transistors for a
barrel shifter in the rx engine.

> 7) What are the criteria for netif_wake_queue? If you are waking when
> the TX is still "mostly full" you probably generate excessive wakeups...

Hrm? Currently it will do a wakeup when at least one packet (8 sg
descriptors) can be sent. Given that the tx done code is only called
when a tx desc (every 1/4 or so of the tx queue) or txidle interrupt
occurs, it shouldn't be that often.

-ben
--
"A man with a bass just walked in,
and he's putting it down
on the floor."

2002-03-15 01:02:56

by Jeff Garzik

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17 (Re: Broadcom 5700/5701 Gigabit Ethernet Adapters)

Benjamin LaHaise wrote:

>>3) Seeing "volatile" in your code. Cruft? The change in volatile's
>>meaning in recent gcc versions implies to me that your code may perhaps
>>need some additional rmb/wmb calls, which are getting hidden via the
>>driver's dependency on a compiler-version-specific implementation of
>>"volatile."
>>
>
>Paranoia during writing. I'll reaudit. That said, volatile behaviour
>is not compiler version specific.
>
gcc 3.1 volatile behavior changes, so, yes, it is...

>>4) Do you really mean to allocate memory for "REAL_RX_BUF_SIZE + 16"?
>> Why not plain old REAL_RX_BUF_SIZE?
>>
>
>The +16 is for alignment (just like the comment says). The hardware
>requires that rx buffers be 64 bit aligned.
>
Cool... just checking. Both RX_BUF_SIZE and REAL_RX_BUF_SIZE are defined as
foo + magic_number

so I wasn't sure if the alignment space was -already- accounted for, in
the definition of RX_BUF_SIZE, thus making the addition op next to
allocations of REAL_RX_BUF_SIZE superfluous. But, I stand corrected,
thanks.

>5) Random question, do you call netif_carrier_{on,off,ok} for link
>> status manipulation? (if not, you should...)
>
>
>Ah, api updates. Added to the todo.
>
More than just api updates... You have a bunch of hack-y logic for when
the link goes down and up, messing around with netif_stop_queue and
netif_wake_queue. That stuff will be simplified or simply go away. The
basic idea is, if netif_carrier_ok(dev) is not true, then the net stack
will not be sending you any packets. So those extra
netif_{stop,wake}_queue calls are superfluous.

We're also about to start sending link up/down messages async-ly via
netlink, so that's even more added value as well.

>>6) skb_mangle_for_davem is pretty gross... curious: what sort of NIC
>>alignment restrictions are there on rx and tx buffers (not descriptors)?
>> None? 32-bit? Ignore CPU alignment for a moment here...
>>
>
>tx descriptors have no alignment restriction, rx descriptors must be
>64 bit aligned. Someone chose not to include the transistors for a
>barrel shifter in the rx engine.
>

Sigh :)

>>7) What are the criteria for netif_wake_queue? If you are waking when
>>the TX is still "mostly full" you probably generate excessive wakeups...
>>
>
>Hrm? Currently it will do a wakeup when at least one packet (8 sg
>descriptors) can be sent. Given that the tx done code is only called
>when a tx desc (every 1/4 or so of the tx queue) or txidle interrupt
>occurs, it shouldn't be that often.
>

Cool. As FYI (_not_ advice on your driver), here's the logic I was
referring to:

dev->hard_start_xmit()
if (free slots < MAX_SKB_FRAGS)
BUG()
queue packet
if (free slots < MAX_SKB_FRAGS)
netif_stop_queue(dev)

foo_interrupt()
if (some tx interrupt)
complete as many TX's as possible
if (netif_queue_stopped && (free slots > (TX_RING_SIZE / 4)))
netif_wake_queue(dev)

But as long as your TX interrupts are well mitigated (and it sounds like
they are), you can get by with your current scheme just fine.
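Fleshed out as a hedged C sketch (the ring size, MAX_SKB_FRAGS value, and struct below are illustrative; a real driver calls netif_stop_queue/netif_wake_queue instead of flipping a flag):

```c
#include <assert.h>

#define TX_RING_SIZE	128
#define MAX_SKB_FRAGS	18	/* worst-case descriptors per packet */

struct fake_tx {
	int free_slots;
	int queue_stopped;
};

/* Called from hard_start_xmit after queueing one packet of `frags`
 * descriptors: stop the queue while a worst-case packet no longer
 * fits, so the stack stops handing us skbs we'd have to drop. */
static void tx_maybe_stop(struct fake_tx *tx, int frags)
{
	tx->free_slots -= frags;
	if (tx->free_slots < MAX_SKB_FRAGS)
		tx->queue_stopped = 1;
}

/* Called from the TX-done interrupt path after reclaiming `done`
 * descriptors: wake only once a healthy chunk of the ring is free,
 * to avoid excessive stop/wake churn near the full mark. */
static void tx_maybe_wake(struct fake_tx *tx, int done)
{
	tx->free_slots += done;
	if (tx->queue_stopped && tx->free_slots > TX_RING_SIZE / 4)
		tx->queue_stopped = 0;
}
```

The hysteresis is the point: the stop threshold (one worst-case packet) and the wake threshold (a quarter of the ring) are deliberately far apart.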

Jeff




2002-03-15 09:01:32

by Daniel Phillips

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17 (Re: Broadcom 5700/5701 Gigabit Ethernet Adapters)

On March 14, 2002 09:37 pm, Benjamin LaHaise wrote:
> > 4) Do you really mean to allocate memory for "REAL_RX_BUF_SIZE + 16"?
> > Why not plain old REAL_RX_BUF_SIZE?
>
> The +16 is for alignment (just like the comment says). The hardware
> requires that rx buffers be 64 bit aligned.

Nit: that would be REAL_RX_BUF_SIZE + 15.
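The arithmetic behind the nit, as a hedged sketch: guaranteeing that an N-byte-aligned start address exists inside a buffer requires at most N-1 bytes of padding, so 16-byte alignment needs +15 and the chip's 64-bit (8-byte) requirement only +7:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round an address up to the next multiple of `align`
 * (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Worst-case allocation size so that `size` usable bytes can always
 * start on an `align`-byte boundary: the start address is at most
 * align - 1 bytes past the allocation's beginning. */
static size_t padded_size(size_t size, size_t align)
{
	return size + align - 1;
}
```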

--
Daniel

2002-03-21 20:39:52

by Thomas Langås

[permalink] [raw]
Subject: Re: Broadcom 5700/5701 Gigabit Ethernet Adapters

David S. Miller:
> This may surprise some people, but frankly I think the Tigon3's PCI
> dma engine is junk based upon my current knowledge of the card. It is
> always possible I may find out something new which kills this
> perception I have of the card, but we'll see...

Is it possible that they've changed this in newer revisions of their
chips? Why is it junk? What could've been done better? What effects
does the "junky" DMA engine have on the NIC (can there be packet losses
when there are packet bursts, for instance)?

(I'm asking this because we're considering servers that have Broadcom
NICs as the only on-board NICs.)

--
Thomas

2002-04-08 05:15:10

by Richard Gooch

[permalink] [raw]
Subject: Re: [patch] ns83820 0.17 (Re: Broadcom 5700/5701 Gigabit Ethernet Adapters)

Benjamin LaHaise writes:
> On Sun, Mar 10, 2002 at 06:30:33PM -0800, David S. Miller wrote:
> > Syskonnect sk98 with jumbo frames gets ~107MB/sec TCP bandwidth
> > without NAPI, there is no reason other cards cannot go full speed as
> > well.
> >
> > NAPI is really only going to help with high packet rates not with
> > thinks like raw bandwidth tests.
>
> A day's tweaking later, and I'm getting 810mbit/s with netperf
> between two Athlons with default settings (1500 byte packets). What
> I've found is that increasing the size of the RX/TX rings or the max
> sizes of the tcp r/wmem backlogs really slows things down, so I'm
> not doing that anymore. The pair of P3s shows 262mbit/s (up from
> 67).

Just a public word of thanks to Ben for improving the driver. I've got
7 shiny new Addtron cards (aka "El Cheapo":-) and with 2.4.19-pre6 I'm
getting TCP bandwidths of 69 MB/s (PIII 450 SMP to Athlon 500 UP) and
52 MB/s (Athlon 500 UP to PIII 450 SMP). CONFIG_DEBUG_SLAB=n. This is
with two 16 m runs of Category 5 cable (still waiting for the Cat 6).
MTU=1500.

Only half of the theoretical bandwidth, but, hey, it's heaps better
than 100 Mb/s. Hopefully the remaining 50% will be reclaimed after
more hackery (and/or NAPI). Being on a mixed 10/100/1000 Mb/s network
means that I'm stuck with MTU=1500.

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]