2002-02-22 01:43:30

by Dan Kegel

Subject: is CONFIG_PACKET_MMAP always a win?

Newbie gig-ethernet question here.
I'm helping somebody implement a program that needs to process
raw packets as close to wire rate as is possible. We're
using kernel 2.4.16 or so.
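
For a sense of scale, here is a rough sketch of the per-packet time budget at wire rate. It assumes standard Ethernet framing (7 bytes of preamble, 1 byte of start delimiter and a 12-byte inter-frame gap around every frame); the function names are made up for illustration.

```c
#include <stdint.h>

/* On-wire overhead per Ethernet frame beyond the frame itself:
 * 7 bytes preamble + 1 byte start delimiter + 12 bytes inter-frame gap. */
enum { WIRE_OVERHEAD_BYTES = 20 };

/* Bits that a frame of frame_bytes (headers + payload + FCS) occupies on the wire. */
uint64_t wire_bits(uint64_t frame_bytes)
{
    return (frame_bytes + WIRE_OVERHEAD_BYTES) * 8;
}

/* Maximum frames per second at link_bps, rounded down. */
uint64_t max_frames_per_sec(uint64_t link_bps, uint64_t frame_bytes)
{
    return link_bps / wire_bits(frame_bytes);
}

/* Time available per frame in nanoseconds at link_bps, rounded down. */
uint64_t ns_per_frame(uint64_t link_bps, uint64_t frame_bytes)
{
    return wire_bits(frame_bytes) * 1000000000ULL / link_bps;
}
```

With minimum-size 64-byte frames on a 1 Gbit/s link this gives about 1.49 million frames per second, or 672 ns per frame, which is why a system call per packet is worth worrying about.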

What's the best way to retrieve raw packets from the kernel?

a) use libpcap
Overhead: a little bit worse than the best of any of the other options?

b) use af_packet, and call recvfrom or recvmsg myself for each packet
Overhead: one full memcpy of the packet body and one
system call per packet.

c) enable CONFIG_PACKET_MMAP, use PACKET_RX_RING, and read packets from an mmap'd ring buffer
Overhead: kernel does a full memcpy of the packet body to get it
into the ring buffer, and my program does another to get it out.

If I understand it right, b costs one memcpy and one recv, and c costs
two memcpys. Which one wins?
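
To make option (b) concrete, a minimal sketch of the receive loop follows. `drain_socket` and `struct recv_stats` are hypothetical names, and the 2048-byte buffer assumes frames at a standard MTU. For real capture the descriptor would come from `socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))`, which needs CAP_NET_RAW; the loop itself works on any datagram-style socket.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

struct recv_stats {
    unsigned long packets;
    unsigned long bytes;
};

/* Option (b): one recvfrom() system call, and one kernel-to-user copy,
 * per packet. Reads until nothing more is queued on fd, then returns 0;
 * returns -1 on any other error. */
int drain_socket(int fd, struct recv_stats *st)
{
    unsigned char buf[2048];   /* assumption: one frame fits in 2048 bytes */

    for (;;) {
        ssize_t n = recvfrom(fd, buf, sizeof buf, MSG_DONTWAIT, NULL, NULL);
        if (n < 0) {
            if (errno == EINTR)
                continue;                       /* interrupted, retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;                       /* queue is empty */
            return -1;
        }
        st->packets++;
        st->bytes += (unsigned long)n;
        /* process buf[0..n) here */
    }
}
```

A real program would block in poll() or select() between calls to `drain_socket` rather than spin on an empty queue.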

I guess I should benchmark these alternatives myself, but before I do,
does anyone know of a good place to look for this info? Maybe
I'm reinventing the wheel here.

Thanks,
Dan


2002-02-22 02:19:28

by Alan

Subject: Re: is CONFIG_PACKET_MMAP always a win?

> c) enable CONFIG_PACKET_MMAP, use PACKET_RX_RING, and read packets from an mmap'd ring buffer
> Overhead: kernel does a full memcpy of the packet body to get it
> into the ring buffer, and my program does another to get it out.

Why are you copying it out of the ring instead of processing it in place?

2002-02-22 06:01:47

by David Miller

Subject: Re: is CONFIG_PACKET_MMAP always a win?

> From: Dan Kegel <[email protected]>
> Date: Thu, 21 Feb 2002 17:51:20 -0800
>
> What's the best way to retrieve raw packets from the kernel?
>
> a) use libpcap
> ...
> b) use af_packet
> ...
> c) enable CONFIG_PACKET_MMAP, use PACKET_RX_RING
>
> If I understand it right, b costs one memcpy and one recv, and c costs
> two memcpys. Which one wins?

"a" should be doing "c" when it is available in the kernel.
If not, get a newer copy of the libpcap sources, preferably
from Alexey's site:

ftp.inr.ac.ru:/ip-routing/

2002-02-22 06:45:55

by Ben Greear

Subject: Re: is CONFIG_PACKET_MMAP always a win?



David S. Miller wrote:

> From: Dan Kegel <[email protected]>
> Date: Thu, 21 Feb 2002 17:51:20 -0800
>
> What's the best way to retrieve raw packets from the kernel?
>
> a) use libpcap
> ...
> b) use af_packet
> ...
> c) enable CONFIG_PACKET_MMAP, use PACKET_RX_RING
>
> If I understand it right, b costs one memcpy and one recv, and c costs
> two memcpys. Which one wins?
>
> "a" should be doing "c" when it is available in the kernel.
> If not, get a newer copy of the libpcap sources, preferably
> from Alexey's site:
>
> ftp.inr.ac.ru:/ip-routing/


And if you can figure out how to do c, and feel like
sharing, please do let me know! Documentation is a
bit sparse... at least wherever I've been looking.

Enjoy,
Ben



--
Ben Greear <[email protected]> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear


2002-02-22 06:56:19

by Dan Kegel

Subject: Re: is CONFIG_PACKET_MMAP always a win?

"David S. Miller" wrote:
> ...
> If not, get a newer copy of the libpcap sources, preferably
> from Alexey's site:
>
> ftp.inr.ac.ru:/ip-routing/

The important files are a bit buried. The important ones seem to be

ftp://ftp.inr.ac.ru/ip-routing/lbl-tools/README
ftp://ftp.inr.ac.ru/ip-routing/lbl-tools/libpcap-0.4-ss991029.dif.gz
ftp://ftp.inr.ac.ru/ip-routing/lbl-tools/libpcap-0.4.tar.gz

The .dif file contains the first example I've seen of
how to use socket option PACKET_RX_RING.
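
For readers without the patch at hand, the setup can be sketched roughly as below. This is not taken from the .dif; it is a minimal sketch assuming the `struct tpacket_req` interface exposed by `<linux/if_packet.h>`. `ring_geometry` and `open_rx_ring` are hypothetical helpers, and the geometry checks reflect my understanding of what the kernel accepts, with one extra restriction (frames must divide a block evenly) added to keep frame addressing simple.

```c
#include <arpa/inet.h>        /* htons */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct tpacket_req, PACKET_RX_RING */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in *req for block_nr blocks of block_size bytes, each split into
 * frames of frame_size bytes. Returns 0 on success, -1 if the geometry
 * is unusable. */
int ring_geometry(struct tpacket_req *req, unsigned int block_size,
                  unsigned int block_nr, unsigned int frame_size)
{
    unsigned int page = (unsigned int)sysconf(_SC_PAGESIZE);

    if (block_nr == 0 || block_size == 0 || block_size % page != 0)
        return -1;                       /* blocks are whole pages */
    if (frame_size < TPACKET_HDRLEN || frame_size % TPACKET_ALIGNMENT != 0)
        return -1;                       /* room for the header, aligned */
    if (block_size % frame_size != 0)
        return -1;                       /* sketch-only simplification */

    req->tp_block_size = block_size;
    req->tp_block_nr   = block_nr;
    req->tp_frame_size = frame_size;
    req->tp_frame_nr   = (block_size / frame_size) * block_nr;
    return 0;
}

/* Open a raw packet socket, ask for an RX ring and map it.
 * Needs CAP_NET_RAW. Returns the mapping (and the socket in *fd_out),
 * or NULL on failure. */
void *open_rx_ring(const struct tpacket_req *req, int *fd_out)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return NULL;

    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof *req) < 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)req->tp_block_size * req->tp_block_nr;
    void *ring = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ring == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return ring;
}
```

Something like `ring_geometry(&req, 4096, 64, 2048)` followed by `open_rx_ring(&req, &fd)` would ask for a 256 KB ring of 128 frames on a machine with 4 KB pages.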

- Dan

2002-02-22 08:37:42

by Gianni Tedesco

Subject: Re: is CONFIG_PACKET_MMAP always a win?

On Fri, 2002-02-22 at 06:45, Ben Greear wrote:
>
>
> David S. Miller wrote:
>
> > From: Dan Kegel <[email protected]>
> > Date: Thu, 21 Feb 2002 17:51:20 -0800
> >
> > What's the best way to retrieve raw packets from the kernel?
> >
> > a) use libpcap
> > ...
> > b) use af_packet
> > ...
> > c) enable CONFIG_PACKET_MMAP, use PACKET_RX_RING
> >
> > If I understand it right, b costs one memcpy and one recv, and c costs
> > two memcpys. Which one wins?
> >
> > "a" should be doing "c" when it is available in the kernel.
> > If not, get a newer copy of the libpcap sources, preferably
> > from Alexey's site:
> >
> > ftp.inr.ac.ru:/ip-routing/
>
>
> And if you can figure out how to do c, and feel like
> sharing, please do let me know! Documentation is a
> bit sparse... at least wherever I've been looking.

Yeah I found it a bit lacking too, I got there in the end though. Check
out: http://www.scaramanga.co.uk/code-fu/lincap.c

--
// Gianni Tedesco <[email protected]>
80% of all email is a figment of procmails imagination.

2002-02-22 14:06:13

by Maciej W. Rozycki

Subject: Re: is CONFIG_PACKET_MMAP always a win?

On Thu, 21 Feb 2002, Dan Kegel wrote:

> The important files are a bit buried. The important ones seem to be
>
> ftp://ftp.inr.ac.ru/ip-routing/lbl-tools/README
> ftp://ftp.inr.ac.ru/ip-routing/lbl-tools/libpcap-0.4-ss991029.dif.gz
> ftp://ftp.inr.ac.ru/ip-routing/lbl-tools/libpcap-0.4.tar.gz
>
> The .dif file contains the first example I've seen of
> how to use socket option PACKET_RX_RING.

Too bad the changes did not get integrated -- libpcap 0.7.1 doesn't know
anything about PACKET_RX_RING...

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2002-02-22 18:11:10

by Jamie Lokier

Subject: Re: is CONFIG_PACKET_MMAP always a win?

Dan Kegel wrote:
> c) enable CONFIG_PACKET_MMAP, use PACKET_RX_RING, and read packets from an mmap'd ring buffer
> Overhead: kernel does a full memcpy of the packet body to get it
> into the ring buffer, and my program does another to get it out.

I had a look at this about a year ago, and it seems there is no method
provided to read the packets without copying them, if you need them in
user space.

Probably the fastest way to process packets in user space is to use a
special protocol handler of your own that mmaps the area where packets
are already DMAd from the driver. I have been known to suggest simply
mapping all 1GB of low kernel memory into user space for this :-) I
haven't tried this, or writing the protocol handler, though.

cheers,
-- Jamie


2002-02-22 18:26:29

by Alan

Subject: Re: is CONFIG_PACKET_MMAP always a win?

> > Overhead: kernel does a full memcpy of the packet body to get it
> > into the ring buffer, and my program does another to get it out.
>
> I had a look at this about a year ago, and it seems there is no method
> provided to read the packets without copying them, if you need them in
> user space.

You can process them in the ring buffer. If you can't keep up then you
are screwed any way you look at it 8)
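
Processing in the ring might look roughly like the sketch below. It assumes the original `struct tpacket_hdr` frame layout from `<linux/if_packet.h>` and frames laid out back to back at multiples of `frame_size`; `ring_consume`, `frame_fn` and `count_bytes` are illustrative names, not part of any kernel or libpcap API.

```c
#include <linux/if_packet.h>  /* struct tpacket_hdr, TP_STATUS_* */
#include <stddef.h>

/* Callback invoked on each packet while it still sits in the ring. */
typedef void (*frame_fn)(const unsigned char *pkt, unsigned int len, void *arg);

/* Example callback: add the captured length to an unsigned long total. */
void count_bytes(const unsigned char *pkt, unsigned int len, void *arg)
{
    (void)pkt;
    *(unsigned long *)arg += len;
}

/* Handle every frame the kernel has handed to user space, starting at
 * index *next, without copying packet data out of the ring. Each frame
 * is given back to the kernel as soon as fn returns. Returns the number
 * of frames handled and leaves *next at the first frame not yet ready. */
unsigned int ring_consume(void *ring, unsigned int frame_size,
                          unsigned int frame_nr, unsigned int *next,
                          frame_fn fn, void *arg)
{
    unsigned int handled = 0;

    while (handled < frame_nr) {
        unsigned char *frame = (unsigned char *)ring + (size_t)*next * frame_size;
        volatile struct tpacket_hdr *hdr = (volatile struct tpacket_hdr *)frame;

        if (!(hdr->tp_status & TP_STATUS_USER))
            break;                        /* still owned by the kernel */

        /* tp_mac is the offset of the link-level header within the frame,
         * tp_snaplen the number of bytes actually captured. */
        fn(frame + hdr->tp_mac, hdr->tp_snaplen, arg);

        __sync_synchronize();             /* finish reading before release */
        hdr->tp_status = TP_STATUS_KERNEL;
        *next = (*next + 1) % frame_nr;
        handled++;
    }
    return handled;
}
```

The caller would sleep in poll() on the packet socket whenever `ring_consume` returns 0, so the only copy of the packet body is the one the kernel makes into the ring.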

2002-02-22 19:06:10

by Jamie Lokier

Subject: Re: is CONFIG_PACKET_MMAP always a win?

Alan Cox wrote:
> > > Overhead: kernel does a full memcpy of the packet body to get it
> > > into the ring buffer, and my program does another to get it out.
> >
> > I had a look at this about a year ago, and it seems there is no method
> > provided to read the packets without copying them, if you need them in
> > user space.
>
> You can process them in the ring buffer. If you can't keep up then you
> are screwed any way you look at it 8)

That still doesn't avoid copying: af_packet copies the whole packet (if
you want the whole packet) from the original skbuff to the ring buffer.

-- Jamie

2002-02-22 19:43:56

by Alan

Subject: Re: is CONFIG_PACKET_MMAP always a win?

> > You can process them in the ring buffer. If you can't keep up then you
> > are screwed any way you look at it 8)
>
> That still doesn't avoid copying: af_packet copies the whole packet (if
> you want the whole packet) from the original skbuff to the ring buffer.

I'd make a handwaved claim that the first copy of the packet from a DMA
receiving source is free. It's certainly pretty close to free because the
overhead of sucking it into L1 cache will dominate and you need to do that
anyway.

Zero copy is sometimes a false friend.
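
One way to put a number on that claim is a small user-space comparison of touching packet data in place versus copying it first and then touching the copy. The sketch below is only an approximation: user space cannot reproduce freshly DMA'd, cache-cold data, so it assumes the caller supplies a pool much larger than the CPU caches. All names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Stand-in for real packet processing: read every byte once. */
uint32_t sum_bytes(const unsigned char *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Variant 1: process the packet where it lies. */
uint32_t touch_in_place(const unsigned char *src, size_t n)
{
    return sum_bytes(src, n);
}

/* Variant 2: copy the packet out first, then process the copy. */
uint32_t copy_then_touch(const unsigned char *src, unsigned char *dst, size_t n)
{
    memcpy(dst, src, n);
    return sum_bytes(dst, n);
}

/* Walk pool in pkt_len strides using one variant and return the CPU
 * seconds spent. Results are accumulated into *sink so the compiler
 * cannot discard the work. scratch must hold at least pkt_len bytes. */
double time_pass(int copy, const unsigned char *pool, size_t pool_len,
                 size_t pkt_len, unsigned char *scratch, uint32_t *sink)
{
    clock_t t0 = clock();

    for (size_t off = 0; off + pkt_len <= pool_len; off += pkt_len)
        *sink += copy ? copy_then_touch(pool + off, scratch, pkt_len)
                      : touch_in_place(pool + off, pkt_len);

    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

If the two passes take nearly the same time over a large pool, the cost is dominated by pulling the data into cache rather than by the copy itself; a large gap would suggest otherwise on that machine.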

2002-02-22 21:44:37

by Mike Fedyk

Subject: Re: is CONFIG_PACKET_MMAP always a win?

On Fri, Feb 22, 2002 at 07:57:33PM +0000, Alan Cox wrote:
> > > You can process them in the ring buffer. If you can't keep up then you
> > > are screwed any way you look at it 8)
> >
> > That still doesn't avoid copying: af_packet copies the whole packet (if
> > you want the whole packet) from the original skbuff to the ring buffer.
>
> I'd make a handwaved claim that the first copy of the packet from a DMA
> receiving source is free. It's certainly pretty close to free because the
> overhead of sucking it into L1 cache will dominate and you need to do that
> anyway.
>

Doesn't DMA access system memory directly and leave processor caches alone?
If so, then the fewer copies that have to pollute the L1/2 caches the better.

Even if it does for UP, I'd imagine that it doesn't for SMP...

Mike

2002-02-23 00:11:42

by Alan

Subject: Re: is CONFIG_PACKET_MMAP always a win?

> > receiving source is free. It's certainly pretty close to free because the
> > overhead of sucking it into L1 cache will dominate and you need to do that
> > anyway.
> >
> Doesn't DMA access system memory directly and leave processor caches alone?

It accesses system memory. That means the copy you have in cache is stale
so you need to get rid of the copy in the cache - be that software or
hardware.