2001-02-04 19:56:48

by jamal

Subject: Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN))



On Tue, 30 Jan 2001, Rick Jones wrote:

> > > How does ZC/SG change the nature of the packets presented to the NIC?
> >
> > What do you mean? I am _sure_ you know how SG/ZC work. So I am suspecting
> > more than a Socratic view on life here. Could be influence from Aristotle ;->
>
> Well, I don't know the specifics of Linux, but I gather from what I've
> read on the list thus far that, prior to implementing SG support, Linux
> NIC drivers would copy packets into single contiguous buffers that were
> then sent to the NIC, yes?
>

yes.

> If so, the implication is with SG going, that copy no longer takes
> place, and so a chain of buffers is given to the NIC.
>

yes.

> Also, if one is fully ZC :) pesky things like protocol headers can
> naturally end-up in separate buffers.
>

yes.

> So, now you have to ask how well any given NIC follows chains of
> buffers. At what number of buffers is the overhead in the NIC of
> following the chains enough to keep it from achieving link-rate?
>

Hmmm... not sure how you would enforce this today, or why you would
want to. Alexey, Dave?
The kernel should be able to break it into two buffers (with netperf,
for example: header + data).
Ok, probably three with tux-http (header, data, trailer).
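
To illustrate (just a sketch -- this is not any real NIC's descriptor
format, and the addresses and lengths are made up):

/* Purely illustrative scatter-gather descriptors: with SG the driver
 * hands the NIC one entry per fragment instead of memcpy()ing the
 * whole frame into one flat buffer first. */
#include <stddef.h>

struct tx_frag {
        void   *addr;           /* bus/DMA address of the fragment */
        size_t  len;            /* bytes in this fragment */
};

/* netperf over sendfile: two fragments per packet. */
struct tx_frag two_frag_pkt[2] = {
        { NULL, 54 },           /* Ethernet + IP + TCP headers */
        { NULL, 1460 },         /* payload straight from the page cache */
};

/* tux-http response: three fragments. */
struct tx_frag three_frag_pkt[3] = {
        { NULL, 200 },          /* HTTP response header */
        { NULL, 4096 },         /* file data page */
        { NULL, 2 },            /* trailer */
};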

> One way to try and deduce that would be to meld some of the SG and pre-SG
> behaviours: copy packets into varying numbers of buffers per packet
> and measure the resulting impact on throughput through the NIC.
>

If only time were on my hands I'd love to do this. Alas.
Note also that the effect would also depend on the specific NIC.
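
For what it's worth, the shape of the experiment would be roughly this
(a hypothetical driver-level sketch: tx_buf[] is an assumed
pre-allocated buffer pool, and actually posting the descriptors is
chip-specific, so it is elided):

/* Copy one outgoing frame into `nbufs` pieces instead of one flat
 * buffer (the pre-SG behaviour), then hand the NIC a chain of that
 * length.  Sweep nbufs from 1 upward and watch the throughput. */
#include <linux/skbuff.h>
#include <linux/string.h>

#define MAX_TX_BUFS 16

static void copy_into_n_buffers(struct sk_buff *skb,
                                void *tx_buf[MAX_TX_BUFS], int nbufs)
{
        unsigned int chunk = skb->len / nbufs;
        unsigned int off = 0;
        int i;

        for (i = 0; i < nbufs; i++) {
                /* The last buffer takes the remainder. */
                unsigned int len = (i == nbufs - 1) ? skb->len - off
                                                    : chunk;
                memcpy(tx_buf[i], skb->data + off, len);
                off += len;
                /* ...post tx_buf[i], len as descriptor i of the chain... */
        }
}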

> rick jones
>
> As time marches on, the orders of magnitude of the constants may change,
> but the basic concepts still remain, and the "lessons" learned in the past
> by one generation tend to get relearned in the next :) for example -
> there is no such thing as a free lunch... :)

;->
BTW, I am reading one of your papers (circa 1993 ;->, "we go fast with a
little help from your apps") in which you make an interesting
observation: that (figure 2) there is "a considerable increase in
efficiency but not a considerable increase in throughput" .... I "scanned"
to the end of the paper and don't see an explanation.
I've made a somewhat similar observation with the current ZC patches, and
in fact observed that throughput goes down with the Linux ZC patches.
[This is being contested, but no one else is testing at GigE, so my word is
the only truth].
Of course your paper doesn't talk about sendfile, rather the page pinning +
COW tricks (which are considered taboo in Linux), but I do sense a
relationship.

cheers,
jamal

PS: I don't have "my" machines yet and I have a feeling it will be a while
before I re-run the tests; however, I have created a patch for
Linux sendfile support in netperf. Please take a look at it at:
http://www.cyberus.ca/~hadi/patch-nperf-sfile-linux.gz
Tell me if it is missing anything and, if it is OK, could you please merge
it into your tree?
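
The core of the test is a sendfile() loop along these lines (a minimal
sketch of the idea, not the patch itself; socket setup, option parsing
and the timing netperf does are all omitted):

/* Send a whole file over an already-connected TCP socket using
 * sendfile(), so the data can go page cache -> NIC without a
 * user-space copy (given the ZC patches). */
#include <sys/sendfile.h>
#include <sys/types.h>

static int send_file_loop(int sock, int file_fd, off_t file_len)
{
        off_t offset = 0;

        while (offset < file_len) {
                /* sendfile() advances `offset` by the bytes it sent. */
                ssize_t sent = sendfile(sock, file_fd, &offset,
                                        file_len - offset);
                if (sent < 0)
                        return -1;      /* caller checks errno */
        }
        return 0;
}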




2001-02-05 05:16:04

by David Miller

Subject: Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN))


jamal writes:
> > So, now you have to ask how well any given NIC follows chains of
> > buffers. At what number of buffers is the overhead in the NIC of
> > following the chains enough to keep it from achieving link-rate?
> >
>
> Hmmm... not sure how you would enforce this today, or why you would
> want to. Alexey, Dave?
> The kernel should be able to break it into two buffers (with netperf,
> for example: header + data).
> Ok, probably three with tux-http (header, data, trailer).

First, just to make sure Jamal understands what Rick Jones is
trying to make note of: he is saying that the cost of
dealing with extra TX descriptor ring entries can begin to
nullify the gains of zerocopy, depending upon the HW implementation
(both at the NIC and the PCI controller).

Back to today: it is possible that this is an issue if your machine
is near PCI bandwidth saturation before zerocopy for these tests.
I think this may be one of the factors causing Jamal to see results
Alexey cannot reproduce. Get two people with identical PCI host
bridges and an Acenic in the identical PCI slot, and I bet the
numbers begin to jibe.

Currently, you get "1 + ((MTU + PAGE_SIZE - 1) / PAGE_SIZE)" buffers
per packet when going over a zerocopy device using TCP.
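
A quick worked instance of that formula (standalone arithmetic, not
kernel code; a PAGE_SIZE of 4096 is assumed, as on i386):

/* One buffer for the protocol headers, plus however many pages an
 * MTU-sized payload can span. */
#include <stdio.h>

#define PAGE_SIZE 4096

static unsigned int bufs_per_packet(unsigned int mtu)
{
        return 1 + ((mtu + PAGE_SIZE - 1) / PAGE_SIZE);
}

int main(void)
{
        printf("MTU 1500: %u buffers\n", bufs_per_packet(1500)); /* 2 */
        printf("MTU 9000: %u buffers\n", bufs_per_packet(9000)); /* 4 */
        return 0;
}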

Later,
David S. Miller
[email protected]

2001-02-05 18:51:49

by Rick Jones

Subject: Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN))

> > As time marches on, the orders of magnitude of the constants may change,
> > but the basic concepts still remain, and the "lessons" learned in the past
> > by one generation tend to get relearned in the next :) for example -
> > there is no such thing as a free lunch... :)
>
> ;->
> BTW, I am reading one of your papers (circa 1993 ;->, "we go fast with a
> little help from your apps") in which you make an interesting
> observation: that (figure 2) there is "a considerable increase in
> efficiency but not a considerable increase in throughput" .... I "scanned"
> to the end of the paper and don't see an explanation.

That would be the copy-avoidance paper using the very old G30 with the
HP-PB (sometimes called PeanutButter) bus :)
(http://ftp.cup.hp.com/dist/networking/briefs/)

No, back then we were not going to describe the dirty laundry of the G30
hardware :) The limiter appears to have been the bus converter from the
SGC (?) main bus of the Novas (8x7, F, G, H, I) to the HP-PB bus. The chip
was (appropriately enough) codenamed "BOA", and it was a constrictor :)

I never had a chance to carry out the tests on an older 852 system -
those have slower CPUs, but HP-PB was _the_ bus in the system.
Prototypes leading to the HP-PB FDDI card achieved 10 MB/s on an 832
system using UDP - this was back in the 1988-1989 timeframe, IIRC.

> I've made a somewhat similar observation with the current ZC patches, and
> in fact observed that throughput goes down with the Linux ZC patches.
> [This is being contested, but no one else is testing at GigE, so my word is
> the only truth].
> Of course your paper doesn't talk about sendfile, rather the page pinning +
> COW tricks (which are considered taboo in Linux), but I do sense a
> relationship.

Well, the HP-PB FDDI card did follow buffer chains rather well, and
there was no mapping overhead on a Nova - it was a non-coherent I/O
subsystem and DMA was done exclusively with physical addresses (with the
requisite pre-DMA flushes on outbound, and purges on inbound - another
reason why copy-avoidance was such a win overhead-wise).
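
In Linux 2.4 terms, that flush/purge is what the PCI DMA mapping calls
do on non-coherent hardware - a sketch (assumed context only, not real
driver code):

/* Outbound ("flush"): mapping with PCI_DMA_TODEVICE makes sure the
 * CPU's dirty cache lines reach memory before the NIC reads them. */
#include <linux/pci.h>
#include <linux/skbuff.h>

static dma_addr_t map_for_tx(struct pci_dev *pdev, struct sk_buff *skb)
{
        return pci_map_single(pdev, skb->data, skb->len,
                              PCI_DMA_TODEVICE);
}

/* Inbound ("purge"): the CPU must not read the buffer until it is
 * unmapped; the mapping calls invalidate stale cache lines so the
 * CPU sees what the NIC wrote. */
static void unmap_after_rx(struct pci_dev *pdev, dma_addr_t mapping,
                           size_t len)
{
        pci_unmap_single(pdev, mapping, len, PCI_DMA_FROMDEVICE);
}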

Also, there was no throughput drop when going to copy-avoidance in that
setup. So, I'd say that while some things might "feel" similar, it does
not go much deeper than that.


rick

> PS: I don't have "my" machines yet and I have a feeling it will be a while
> before I re-run the tests; however, I have created a patch for
> Linux sendfile support in netperf. Please take a look at it at:
> http://www.cyberus.ca/~hadi/patch-nperf-sfile-linux.gz
> Tell me if it is missing anything and, if it is OK, could you please merge
> it into your tree?

I will take a look.

--
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...