2010-12-01 00:28:46

by Thomas Fjellstrom

Subject: low overhead packet capturing on linux

I'm working on a little tool to monitor and measure bandwidth use on a VM
host, down to keeping track of all guest and host bandwidth, including,
eventually, per layer-7 protocol use.

Right now I have a pretty simple setup: I set up an AF_PACKET socket, select on
it, and read data as it comes in. Obviously, this has a fatal flaw: it takes up
a rather large amount of CPU time just to capture the packets. On a GbE
interface, it easily uses 60-80% CPU (on a 2.6 GHz AMD Phenom II core) just to
capture the packets; trying to do anything fancy with them will likely cause
the kernel to drop some packets.
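
For reference, the setup described above might look roughly like the
following sketch. It is written in Python for brevity and is not taken from
the tool itself; `capture_loop`, `account_frame`, and the per-source-MAC
counter are hypothetical names chosen for illustration. It shows where the
cost comes from: every packet needs a select() wakeup plus a recv() that
copies the frame into user space.

```python
import select
import socket
from collections import Counter

ETH_P_ALL = 0x0003   # from <linux/if_ether.h>: receive every protocol
ETH_HDR_LEN = 14     # destination MAC (6) + source MAC (6) + ethertype (2)


def account_frame(frame, bytes_by_src):
    """Add one Ethernet frame's length to the counter for its source MAC."""
    if len(frame) < ETH_HDR_LEN:
        return                               # truncated frame, nothing to account
    src_mac = ":".join(f"{b:02x}" for b in frame[6:12])
    bytes_by_src[src_mac] += len(frame)


def capture_loop(ifname, max_frames, timeout=1.0):
    """select() on an AF_PACKET socket and read one frame per wakeup.

    Linux only; needs root or CAP_NET_RAW. Returns after max_frames frames.
    """
    bytes_by_src = Counter()
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_ALL))
    try:
        sock.bind((ifname, 0))
        seen = 0
        while seen < max_frames:
            readable, _, _ = select.select([sock], [], [], timeout)
            if not readable:
                continue                     # timeout with no traffic, wait again
            frame = sock.recv(65535)         # one syscall and one copy per packet
            account_frame(frame, bytes_by_src)
            seen += 1
    finally:
        sock.close()
    return bytes_by_src
```

Calling `capture_loop("eth0", 1000)` (the interface name is an assumption)
returns a counter of bytes per source MAC once 1000 frames have been read.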

So what I'm looking for is a very low overhead way to capture packets. I've
come up with a few ideas, some of which I have no idea if they'd even work.

One idea that came to mind (though it doesn't entirely look possible) is using
splice or vmsplice to copy as little as necessary on the way from the net
device to my own chunk of memory. Even better if that memory can be a circular
queue of sorts. I'd probably use one thread to just sit on the socket and
manage the packets, and a second thread to actually do the accounting on the
incoming packets.
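
The two-thread split could be sketched as below, with a bounded queue
standing in for the circular buffer. This only illustrates the hand-off
between threads and the drop accounting when the accounting thread falls
behind; it does not use splice or vmsplice and avoids no copies.
`run_pipeline` and its parameters are hypothetical, and `frames` is a
stand-in for reads from the capture socket.

```python
import queue
import threading

_STOP = object()  # sentinel telling the accounting thread to finish


def run_pipeline(frames, ring_slots=1024):
    """Pass frames from a capture thread to an accounting thread.

    `frames` is any iterable of bytes objects (assumption: in a real tool
    these would come from the capture socket).
    Returns (total_bytes_accounted, frames_dropped).
    """
    ring = queue.Queue(maxsize=ring_slots)
    stats = {"bytes": 0, "dropped": 0}

    def capture():
        for frame in frames:
            try:
                ring.put_nowait(frame)       # never block the capture side
            except queue.Full:
                stats["dropped"] += 1        # consumer fell behind, drop the frame
        ring.put(_STOP)                      # blocking put so the sentinel is not lost

    def account():
        while True:
            frame = ring.get()
            if frame is _STOP:
                break
            stats["bytes"] += len(frame)     # placeholder for the real accounting

    threads = [threading.Thread(target=capture),
               threading.Thread(target=account)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["bytes"], stats["dropped"]
```

A small `ring_slots` value makes drops visible quickly, which is a cheap way
to see how much headroom the accounting side has.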

Anyone have any pointers or tips for me?

--
Thomas Fjellstrom


2010-12-01 10:08:28

by Alexander Clouter

Subject: Re: low overhead packet capturing on linux

Thomas Fjellstrom wrote:
>
> I'm working on a little tool to monitor and measure bandwidth use on a VM
> host, down to keeping track of all guest and host bandwidth, including,
> eventually, per layer-7 protocol use.
>
...iptables? You get packet and byte counters there for free and you
can have a 'web, smtp, $service[0], $service[1], ... , other' easily
enough.

Five to eight years ago we (an ISP) used this at a previous workplace of
mine to do xDSL traffic accounting for our users.

Cheers

--
Alexander Clouter
.sigmonster says: problem drinker, n.:
A man who never buys.

2010-12-01 10:18:43

by Thomas Fjellstrom

Subject: Re: low overhead packet capturing on linux

On December 1, 2010, you wrote:
> Thomas Fjellstrom wrote:
> > I'm working on a little tool to monitor and measure bandwidth use on a VM
> > host, down to keeping track of all guest and host bandwidth, including,
> > eventually, per layer-7 protocol use.
>
> ...iptables? You get packet and byte counters there for free and you
> can have a 'web, smtp, $service[0], $service[1], ... , other' easily
> enough.

Not with full layer-7 support these days. None of the old things like ipp2p or
l7filter will even apply to anything remotely resembling a recent kernel.

Also, I'm not sure iptables will dynamically keep track of hosts. My solution
will track all hosts it sees, whereas iptables would be somewhat manual.
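
As a rough illustration of what tracking every host automatically could
mean, here is a minimal sketch that assumes untagged Ethernet frames
carrying IPv4. `new_host_table` and `track_hosts` are made-up names; an
entry is created the first time an address shows up, with no per-host
configuration.

```python
import socket
import struct
from collections import defaultdict

ETH_HDR_LEN = 14          # destination MAC + source MAC + ethertype
IPV4_MIN_HDR_LEN = 20     # IPv4 header without options
ETHERTYPE_IPV4 = 0x0800


def new_host_table():
    """Table whose entries appear the first time an address is seen."""
    return defaultdict(lambda: {"tx_bytes": 0, "rx_bytes": 0})


def track_hosts(frame, hosts):
    """Credit one Ethernet/IPv4 frame to its source and destination hosts."""
    if len(frame) < ETH_HDR_LEN + IPV4_MIN_HDR_LEN:
        return                                   # too short to hold both headers
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return                                   # VLAN tags, IPv6, ARP etc. are skipped here
    # Source and destination addresses sit at bytes 12-15 and 16-19 of the IPv4 header.
    src = socket.inet_ntoa(frame[ETH_HDR_LEN + 12:ETH_HDR_LEN + 16])
    dst = socket.inet_ntoa(frame[ETH_HDR_LEN + 16:ETH_HDR_LEN + 20])
    hosts[src]["tx_bytes"] += len(frame)
    hosts[dst]["rx_bytes"] += len(frame)
```

Per-protocol (layer-7) classification would need a further lookup on the
payload and is deliberately left out of this sketch.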

> Five to eight years ago we (an ISP) used this at a previous workplace of
> mine to do xDSL traffic accounting for our users.
>
> Cheers


--
Thomas Fjellstrom

2010-12-01 13:04:09

by Pekka Pietikäinen

Subject: Re: low overhead packet capturing on linux

On Tue, Nov 30, 2010 at 05:28:05PM -0700, Thomas Fjellstrom wrote:
> I'm working on a little tool to monitor and measure bandwidth use on a VM
> host, down to keeping track of all guest and host bandwidth, including,
> eventually, per layer-7 protocol use.
>
> Right now I have a pretty simple setup: I set up an AF_PACKET socket, select
> on it, and read data as it comes in. Obviously, this has a fatal flaw: it
> takes up a rather large amount of CPU time just to capture the packets. On a
> GbE interface, it easily uses 60-80% CPU (on a 2.6 GHz AMD Phenom II core)
> just to capture the packets; trying to do anything fancy with them will
> likely cause the kernel to drop some packets.
>
> So what I'm looking for is a very low overhead way to capture packets. I've
> come up with a few ideas, some of which I have no idea if they'd even work.

Have you checked out

http://public.lanl.gov/cpw/ (IIRC it's actually a part of recent libpcap,
but could be wrong) and http://www.ntop.org/PF_RING.html ?

2010-12-01 20:28:48

by Thomas Fjellstrom

Subject: Re: low overhead packet capturing on linux

On December 1, 2010, Pekka Pietikainen wrote:
> On Tue, Nov 30, 2010 at 05:28:05PM -0700, Thomas Fjellstrom wrote:
> > I'm working on a little tool to monitor and measure bandwidth use on a VM
> > host, down to keeping track of all guest and host bandwidth, including,
> > eventually, per layer-7 protocol use.
> >
> > Right now I have a pretty simple setup: I set up an AF_PACKET socket,
> > select on it, and read data as it comes in. Obviously, this has a fatal
> > flaw: it takes up a rather large amount of CPU time just to capture the
> > packets. On a GbE interface, it easily uses 60-80% CPU (on a 2.6 GHz AMD
> > Phenom II core) just to capture the packets; trying to do anything
> > fancy with them will likely cause the kernel to drop some packets.
> >
> > So what I'm looking for is a very low overhead way to capture packets.
> > I've come up with a few ideas, some of which I have no idea if they'd
> > even work.
>
> Have you checked out
>
> http://public.lanl.gov/cpw/ (IIRC it's actually a part of recent libpcap,
> but could be wrong) and http://www.ntop.org/PF_RING.html ?

Hi,

Thanks. Yes, I've at least seen the cpw page, and I probably looked briefly at
the PF_RING stuff before. But I'll take a closer look this time, thanks :)

When I was looking before, I was unduly rejecting things that required
patching the kernel, or adding special drivers. But if it really can help I
might as well take a look.

--
Thomas Fjellstrom

by Henrique Holschuh

Subject: Re: low overhead packet capturing on linux

On Tue, 30 Nov 2010, Thomas Fjellstrom wrote:
> So what I'm looking for is a very low overhead way to capture packets. I've
> come up with a few ideas, some of which I have no idea if they'd even work.

Out-of-tree PF_RING :-(

I really wish someone would tackle this problem in a way suitable for
inclusion in mainline, now that we have very good generic backend
infrastructure for such stuff (such as high-speed ring buffers).

AFAIK, what we have right now simply can't cope well with wire-speed taps
(or implement sFlow-style taps with low overhead) on very fast links.
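
For illustration, the arithmetic behind an sFlow-style tap is simple:
inspect about one packet in N and scale the totals by N. The sketch below
(hypothetical function name, Python for brevity) shows only that estimate.
To actually lower overhead, the sampling decision would presumably have to
be made before packets are copied to user space, which this sketch does not
attempt.

```python
import random


def sampled_byte_estimate(frame_lengths, sample_rate, rng=None):
    """Estimate total bytes by inspecting roughly 1 in `sample_rate` frames.

    `frame_lengths` is an iterable of frame sizes in bytes; `rng` may be a
    seeded random.Random instance for repeatable runs.
    """
    rng = rng or random.Random()
    sampled_bytes = 0
    for length in frame_lengths:
        if rng.randrange(sample_rate) == 0:   # pick each frame with probability 1/sample_rate
            sampled_bytes += length
    return sampled_bytes * sample_rate        # scale the sample back up to a total
```

With `sample_rate=1` every frame is inspected and the result is exact;
larger rates trade accuracy for less per-packet work.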

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh