2003-07-14 19:29:09

by David griego

Subject: Re: Alan Shih: "TCP IP Offloading Interface"

>Jeff Garzik wrote:
>Anything beyond basic host-only TOE adds massive complexity for very little
>gain: interfacing netfilter and routing code with a black box we _hope_
>will act properly sounds like suicide.
Keep most of this on the host and offload only the performance path, as the
Alacritech TOE does.

>All this is vague handwaving without supporting evidence. So far we get
>stuff like Internet2 speed records _without_ TOE. And Linux currently
>supports 10gige... and hosts are just going to keep getting faster and
>faster.

Intel Clusters and Network Storage Volume Platforms Lab reported that it
takes about 1MHz to process 1Mbps on a PIII. Using this rule of thumb (they
showed it scaling from 400MHz to 800MHz) it would take 10GHz to process
10Mbps. Well, you might say "what about multi-processors?" This would be
good for people that have multi-processors, but there is a large segment of
embedded processors that are not going to have SMP, or be at 10GHz anytime
soon. Besides that, processing interrupts does not scale across MPs
linearly. The truth is that communication speeds are outpacing processor
speeds at this time.
David
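
As a rough check of the rule of thumb quoted above, the sketch below extrapolates 1MHz per 1Mbps linearly across link speeds. Linear scaling beyond the reported 400MHz to 800MHz range is an assumption, and `required_mhz` and its `mhz_per_mbps` parameter are illustrative names, not anything taken from the Intel report.

```python
def required_mhz(link_mbps, mhz_per_mbps=1.0):
    """CPU clock (MHz) needed to process a link of link_mbps, assuming
    the cost scales linearly at mhz_per_mbps (an assumed extrapolation)."""
    return link_mbps * mhz_per_mbps


if __name__ == "__main__":
    # Link speeds in Mbps; the labels are only for display.
    links = [("10 Mbps", 10), ("100 Mbps", 100),
             ("1 Gbps", 1_000), ("10 Gbps", 10_000)]
    for label, mbps in links:
        mhz = required_mhz(mbps)
        print(f"{label:>8}: about {mhz:,.0f} MHz ({mhz / 1000:g} GHz)")
```

Taken at face value, the rule puts the 10GHz figure at a 10Gbps link (10,000Mbps); a 10Mbps link would need only about 10MHz.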




2003-07-14 19:49:11

by Jeff Garzik

Subject: Re: Alan Shih: "TCP IP Offloading Interface"

David griego wrote:
> Intel Clusters and Network Storage Volume Platforms Lab reported that it
> takes about 1MHz to process 1Mbps on a PIII. Using this rule of thumb
> (they showed it scaling from 400MHz to 800MHz) it would take 10GHz to
> process 10Mbps. Well, you might say "what about multi-processors?" This

Um. It doesn't take nearly 10GHz to handle 10Mbps, or even 100Mbps.


> would be good for people that have multi-processors, but there is a
> large segment of embedded processors that are not going to have SMP, or be
> at 10GHz anytime soon. Besides that, processing interrupts does not
> scale across MPs linearly. The truth is that communication speeds are
> outpacing processor speeds at this time.

If the host CPU is a bottleneck after large-send and checksums have been
offloaded, then logically you aren't getting any work done _anyway_.
You have to interface with the net stack at some point, in which case
you incur a fixed cost, for socket handling, TCP exception handling, etc.

Maybe somebody needs to be looking into AMP (asymmetric
multiprocessing), too.

Jeff



2003-07-14 19:56:43

by Alan

Subject: Re: Alan Shih: "TCP IP Offloading Interface"

On Mon, 2003-07-14 at 20:43, David griego wrote:
> Intel Clusters and Network Storage Volume Platforms Lab reported that it
> takes about 1MHz to process 1Mbps on a PIII. Using this rule of thumb (they

1MHz to process 1Mbit doing what - file I/O to and from disk, web serving?
Because, ToE or otherwise, I still have to process the data I receive
and do something useful with it unless I'm just a router, firewall or
load balancer. If you want to argue about using gate arrays and hardware
to accelerate IP routing, balancing and firewall filter CAMs then you
might get somewhere - but they don't need to talk TCP.

Also, if it's 1MHz per 1Mbit worst case and your ToE engine isn't entirely
hardware paths capable of sustaining 10Gbit/sec, what happens when I hit
you with 10Gbit of carefully chosen non-optimal frames?
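
To put a number on that worst case, the sketch below computes the frame rate of a 10Gbit/sec link saturated with minimum-size Ethernet frames, and the per-frame cycle budget that leaves an offload engine. The 1GHz engine clock is a hypothetical figure chosen only for illustration; the frame sizes are the standard Ethernet values (64-byte minimum frame, 8-byte preamble, 12-byte inter-frame gap).

```python
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # mandatory idle time between frames
MIN_FRAME_BYTES = 64        # smallest legal Ethernet frame, including FCS


def frames_per_second(link_bps, frame_bytes):
    """Maximum frame rate on a link when every frame is frame_bytes long."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def cycles_per_frame(clock_hz, link_bps, frame_bytes):
    """Clock cycles available per frame for an engine clocked at clock_hz."""
    return clock_hz / frames_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    link_bps = 10e9
    engine_hz = 1e9  # hypothetical offload-engine clock, for illustration
    fps = frames_per_second(link_bps, MIN_FRAME_BYTES)
    budget = cycles_per_frame(engine_hz, link_bps, MIN_FRAME_BYTES)
    print(f"minimum-size frames: {fps / 1e6:.2f} million frames/sec")
    print(f"cycle budget at 1 GHz: {budget:.1f} cycles per frame")
```

Any part of the engine that falls back to firmware or a general-purpose core has to fit inside that per-frame budget, or the link cannot be sustained under such traffic.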

2003-07-14 20:17:50

by Shawn

Subject: Re: Alan Shih: "TCP IP Offloading Interface"

Don't push him... he'll do it! We're talking "floor sweepings" frames!
"Too rank for sausage" frames!

On Mon, 2003-07-14 at 15:05, Alan Cox wrote:
> Also, if it's 1MHz per 1Mbit worst case and your ToE engine isn't entirely
> hardware paths capable of sustaining 10Gbit/sec, what happens when I hit
> you with 10Gbit of carefully chosen non-optimal frames?

2003-07-14 20:23:23

by Alan

Subject: Re: Alan Shih: "TCP IP Offloading Interface"

On Mon, 2003-07-14 at 21:03, Jeff Garzik wrote:
> If the host CPU is a bottleneck after large-send and checksums have been
> offloaded, then logically you aren't getting any work done _anyway_.
> You have to interface with the net stack at some point, in which case
> you incur a fixed cost, for socket handling, TCP exception handling, etc.
>
> Maybe somebody needs to be looking into AMP (asymmetric
> multiprocessing), too.

There isn't currently any evidence it buys you anything, although HT may
change that equation a bit. It's still the same RAM bandwidth and you've
not really gotten rid of most of the socket handling/event/wakeup
overhead either.

2003-07-15 05:43:44

by Werner Almesberger

Subject: Re: Alan Shih: "TCP IP Offloading Interface"

Alan Cox wrote:
> load balancer. If you want to argue about using gate arrays and hardware
> to accelerate IP routing, balancing and firewall filter cams then you
> might get somewhere - but they don't need to talk TCP.

One thing that sounds right about TOE is that per-packet overhead
is becoming an issue, too. At 10 Gbps, the critters come flying in
at almost 1 MHz if you're using standard MTU sizes.
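
The packet rate mentioned above can be checked with a short calculation. This is a sketch assuming a 1500-byte MTU on Ethernet with the usual 38 bytes of per-frame overhead (14-byte header, 4-byte FCS, 8-byte preamble, 12-byte inter-frame gap); the 9000-byte jumbo MTU is included only for comparison and is not part of the original argument.

```python
# Per-frame Ethernet overhead: header + FCS + preamble + inter-frame gap.
ETH_OVERHEAD_BYTES = 14 + 4 + 8 + 12


def packets_per_second(link_bps, mtu_bytes):
    """Packet rate on a saturated link when every packet fills the MTU."""
    return link_bps / ((mtu_bytes + ETH_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):  # standard MTU, then jumbo for comparison
        pps = packets_per_second(10e9, mtu)
        ns_per_packet = 1e9 / pps
        print(f"MTU {mtu}: {pps:,.0f} packets/sec, "
              f"{ns_per_packet:,.0f} ns per packet")
```

With a 1500-byte MTU this comes to roughly 0.8 million packets per second, which matches the "almost 1 MHz" figure.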

On the other hand, replicating the entire infrastructure on some
non-Linux hardware has several problems, even if we don't consider
performance:

- where is the configuration interface ? In the kernel or in
  user space ? What about existing interfaces ?
- you'll never get exactly the same semantics. Just identifying
  the differences is a very painful process. And again, what
  about existing interfaces ?
- testing has just become a lot harder

What I think would be more promising is to investigate in the
direction of NUMA-style architectures, where some CPUs are closer
to NICs and whatever data source/sink those TCP streams go to.

Licensing issues, the classical reason for using independent
stacks, can be elegantly avoided on Linux.

Another area is network processors. They could help with fancy
things like Dave's flow cache, but also with fine-grained timing
needed for traffic control. One problem there is that they're
locked away behind walls of NDAs and proprietary development
environments, so one couldn't even begin to properly support them
in Linux. (What can be done is to treat NP+software as a black
box, but I wouldn't consider this a satisfying choice.)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/