Hi,
nf-hipac aims to become a drop-in replacement for the iptables packet
filtering module. It implements a novel framework for packet classification
which uses an advanced algorithm to reduce the number of memory lookups per
packet. The module is ideal for environments where large rulesets and/or
high bandwidth networks are involved.
The algorithm code is designed in such a way that it can be verified in
userspace, so the algorithm itself can be considered correct. We are not able
to verify the remaining files, nfhp_mod.[ch] and the userspace tool
(nf-hipac.[ch]), in the same way, but they are tested in depth and shouldn't
contain any critical bugs.
We have the results of some basic performance tests available on our web page.
The test compares the performance of the iptables filter table to the
performance of nf-hipac. Results are pretty impressive :-)
You can find the performance test results on our web page, http://www.hipac.org
The releases can be downloaded from http://sourceforge.net/projects/nf-hipac/
Features:
- optimized for high performance packet classification
with moderate memory usage
- completely dynamic:
data structure isn't rebuilt from scratch when inserting or
deleting rules, so fast updates are possible
- userspace tool syntax is very similar to the iptables syntax
- kernel does not need to be patched
- compatible with iptables: you can use iptables and nf-hipac at
the same time:
for example you could use the connection tracking module from
iptables and match the states with nf-hipac
- match support for:
+ source/destination ip
+ in/out interface
+ protocol (udp, tcp, icmp)
+ source/destination ports (udp, tcp)
+ icmp type
+ tcp flags
+ ttl
+ state match (conntrack module must be loaded)
- /proc/net/nf-hipac:
+ algorithm statistics available via
# cat /proc/net/nf-hipac
+ lets you dynamically limit the maximum memory usage
# echo > /proc/net/nf-hipac
Enjoy,
+-----------------------+----------------------+
| Michael Bellion | Thomas Heinz |
| <[email protected]> | <[email protected]> |
+-----------------------+----------------------+
From: [email protected]
Date: Thu, 26 Sep 2002 00:41:56 +0200
We have the results of some basic performance tests available on our web page.
The test compares the performance of the iptables filter table to the
performance of nf-hipac. Results are pretty impressive :-)
You missed the real trick, extending the routing tables to
do packet filter rule lookup. That's where the real
performance gains can be found, and how we intend to eventually
make all firewalling/encapsulation/et al. work.
Such a scheme can even obviate socket lookup if implemented properly.
It'd basically be a flow cache, much like route lookups but with an
expanded key set and the capability to stack routes. Such a flow
cache could even be two level, with the top level being 100% cpu local
on SMP (ie. no shared cache lines).
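
A rough userspace sketch of that two-level idea, purely illustrative (the
names flow_key, flow_lookup and level2_classify are made up, and this is not
existing kernel code): the top level is an exact-match hash kept per CPU, and
a miss falls through to a slower, shared level-2 classifier.

#include <stdint.h>
#include <stddef.h>

struct flow_key {                        /* expanded key taken from the packet */
        uint32_t saddr, daddr;
        uint16_t sport, dport;
        uint8_t  proto;
        uint8_t  iif;
};

struct flow_entry {
        struct flow_key    key;
        struct flow_entry *next;
        void              *result;       /* socket, filter verdict, encap info, ... */
};

#define FLOW_HASH_SIZE 1024
#define NR_CPUS        4

/* one hash table per CPU, so lookups touch only CPU-local cache lines */
static struct flow_entry *flow_hash[NR_CPUS][FLOW_HASH_SIZE];

static unsigned int flow_hash_fn(const struct flow_key *k)
{
        uint32_t h = k->saddr ^ k->daddr ^ ((uint32_t)k->sport << 16) ^
                     k->dport ^ k->proto;
        return (h ^ (h >> 16)) & (FLOW_HASH_SIZE - 1);
}

static int flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
        return a->saddr == b->saddr && a->daddr == b->daddr &&
               a->sport == b->sport && a->dport == b->dport &&
               a->proto == b->proto && a->iif == b->iif;
}

/* stand-in for the slower masked/prefix lookup tables at level 2 */
static void *level2_classify(const struct flow_key *k)
{
        (void)k;
        return NULL;
}

void *flow_lookup(int cpu, const struct flow_key *k)
{
        struct flow_entry *e = flow_hash[cpu][flow_hash_fn(k)];

        for (; e; e = e->next)           /* level 1: exact match, CPU local */
                if (flow_key_equal(&e->key, k))
                        return e->result;

        return level2_classify(k);       /* miss: fall back to level 2 */
}

Entries in the top level would be created on demand from whatever the level-2
lookup returned, which is what makes it a cache.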
We have to do the route lookup anyway. If it got you to the packet
filtering tables (or ipsec encap information, or TCP socket, etc.)
as a side effect, lots of netfilter becomes superfluous, because all it
is doing is maintaining these lookup tables, which is what you are
optimizing.
Everyone looks to optimize these things on such a local level, and
that's not where the real improvements live. Think globally, and the
more powerful solutions are much more evident.
Everything, from packet forwarding, to firewalling, to TCP socket
packet receive, can be described with routes. It doesn't make sense
for forwarding, TCP, netfilter, and encapsulation schemes to duplicate
all of this table lookup logic and in fact it's entirely superfluous.
This stackable routes idea is being worked on; watch this space over the
next couple of weeks :-)
On Wed, 25 Sep 2002, David S. Miller wrote:
> Everything, from packet forwarding, to firewalling, to TCP socket
> packet receive, can be described with routes.
QoS stuff too ?
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
Spamtraps of the month: [email protected] [email protected]
From: Rik van Riel <[email protected]>
Date: Wed, 25 Sep 2002 21:10:02 -0300 (BRT)
On Wed, 25 Sep 2002, David S. Miller wrote:
> Everything, from packet forwarding, to firewalling, to TCP socket
> packet receive, can be described with routes.
QoS stuff too ?
It is definitely possible.
Hi,
> You missed the real trick, extending the routing tables to
> do packet filter rule lookup. That's where the real
> performance gains can be found, ...
Yes, that's certainly true. But we have to take it step by step.
We started our project in August 2001, and up to now almost all of our work
has gone into developing and optimizing the algorithm rather than into an
optimal integration into the Linux kernel. We chose the netfilter integration
as a first step because it was easy and fast to do. It doesn't break anything
in the kernel, no kernel patch is needed, it can be used together with other
existing netfilter/iptables modules, and switching from the iptables filter
module to nf-hipac is really easy.
We have now basically finished the work on the algorithm itself, so we can
concentrate on porting the algorithm to other fields and on a better
integration into the kernel. We designed the algorithm code in a way that
allows it to be ported to fields other than packet filtering without much
work. We were aware from the beginning that combining several fields (e.g.
routing and filtering) is THE way to go, and it is no problem to support this
with our algorithm.
Our algorithm is already fast with a small number of rules, but what makes it
really interesting is that it is possible to use huge rulesets/routing
tables/qos ... without slowing down performance much. In practical use people
won't notice much of a difference between using 25 or 25000 rules.
The nf-hipac team
Michael Bellion, Thomas Heinz
From: [email protected]
Date: Thu, 26 Sep 2002 02:38:06 +0200
> You missed the real trick, extending the routing tables to
> do packet filter rule lookup. That's where the real
> performance gains can be found, ...
Yes, that's certainly true. But we have to take it step by step.
All the work you will spend validating all the converted modules
will be equal to, if not greater than, the work of doing the algorithm correctly.
So why not do it correctly from the start? :-)
Seriously, just sit tight with your work, and once the stackable
route stuff is done, you can look into applying your algorithms
to the new flow cache.
> Seriously, just sit tight with your work, and once the stackable
> route stuff is done, you can look into applying your algorithms
> to the new flow cache.
Sorry, we are a bit confused by the formulation "applying the algorithms to the
new flow cache".
Why to the flow cache? What exactly is the job of this flow cache?
Does the job go beyond caching recent "lookup results"?
What happens if the flow cache doesn't have a certain lookup result in the
cache yet?
We mean, how is packet classification handled then?
Is it right that the code will then use a linear search algorithm and compare
the packet with each rule sequentially until a rule is found that matches all
relevant fields?
Our algorithm does not implement any kind of cache. It is actually a
replacement for that linear search algorithm: it implements an advanced
approach to the packet classification problem itself.
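
For reference, this is roughly what that sequential walk looks like; a minimal
sketch with made-up structures, not the actual iptables source:

#include <stdint.h>
#include <stddef.h>

struct rule {
        uint32_t saddr, smask;           /* source address, stored already masked      */
        uint32_t daddr, dmask;           /* destination address, stored already masked */
        uint8_t  proto;                  /* 0 means "any protocol"                     */
        int      verdict;                /* ACCEPT, DROP, ...                          */
};

struct pkt {
        uint32_t saddr, daddr;
        uint8_t  proto;
};

/* compare the packet with each rule in order until one matches all fields */
int classify_linear(const struct rule *rules, size_t n,
                    const struct pkt *p, int default_verdict)
{
        size_t i;

        for (i = 0; i < n; i++) {        /* cost grows linearly with the ruleset */
                const struct rule *r = &rules[i];

                if ((p->saddr & r->smask) != r->saddr)
                        continue;
                if ((p->daddr & r->dmask) != r->daddr)
                        continue;
                if (r->proto && r->proto != p->proto)
                        continue;
                return r->verdict;       /* first matching rule decides */
        }
        return default_verdict;          /* chain policy if nothing matched */
}

nf-hipac replaces this whole loop, which is why the authors say the difference
between 25 and 25000 rules is hardly noticeable.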
the nf-hipac team
Michael Bellion, Thomas Heinz
From: [email protected]
Date: Thu, 26 Sep 2002 03:44:26 +0200
Sorry, we are a bit confused by the formulation "applying the
algorithms to the new flow cache".
Why to the flow cache? What exactly is the job of this flow cache?
Does the job go beyond caching recent "lookup results"?
It is just a lookup function, keyed on whatever we'd like to take
from the incoming packet.
I mean that if you find a stronger hash/lookup architecture that
is faster for this purpose, we can integrate it into _whatever_
scheme we end up using.
What happens if the flow cache doesn't have a certain lookup result
in the cache yet?
It goes to the level 2 lookup tables, which are slightly more complex
yet are still reasonably fast for lookups.
Is it right that the code will then use a linear search algorithm and compare
the packet with each rule sequentially until a rule is found that matches all
relevant fields?
No linear search, but because we'll be doing masked/prefix lookups
on the various keys, the lookup table will be different from the one
at the top level, which uses exact-match comparisons.
Just look at how the routing code works now, the final result will
be architected not much differently.
Franks a lot,
David S. Miller
[email protected]
On Wed, 25 Sep 2002 15:52:46 -0700 (PDT)
"David S. Miller" <[email protected]> wrote:
> We have to do the route lookup anyway. If it got you to the packet
> filtering tables (or ipsec encap information, or TCP socket, etc.)
> as a side effect, lots of netfilter becomes superfluous, because all it
> is doing is maintaining these lookup tables, which is what you are
> optimizing.
Indeed. Note that the netfilter infrastructure has had this from the beginning,
but it sits unused (skb->nf_cache), awaiting someone to get enthusiastic.
There are three real issues:
1) You need a way to say "too hard, don't cache this". We have
NFC_UNKNOWN (I looked at some packet field you don't have a bit for)
and NFC_UNCACHABLE (give me every packet dammit!).
2) You need a sane "selective flush" mechanism for timeouts and rule changes
(eg. connection tracking and nat).
3) If you want to keep counters in the subsystems (eg. iptables currently
   keeps packet and byte counters for every rule because it's easy), you
   need to keep this info in your route cache, then call the subsystems when
   you evict something from the cache.
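
As a sketch of point 3 (hypothetical names, not an existing kernel interface):
keep the counters in the cached entry's private data and hand them back to the
owning subsystem when the entry is evicted.

#include <stdint.h>
#include <stdlib.h>

struct cached_entry;

struct cache_ops {
        void (*evict)(struct cached_entry *e);   /* subsystem callback on eviction */
};

struct cached_entry {
        const struct cache_ops *ops;
        void     *owner_rule;            /* e.g. the iptables rule this entry serves */
        uint64_t  packets;               /* counters live in the cache ...           */
        uint64_t  bytes;
};

/* fast path: account the packet against the cached entry only */
void entry_account(struct cached_entry *e, unsigned int len)
{
        e->packets++;
        e->bytes += len;
}

/* eviction: ... and are folded back into the subsystem's rule here */
void entry_evict(struct cached_entry *e)
{
        if (e->ops && e->ops->evict)
                e->ops->evict(e);
        free(e);
}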
Cheers!
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy
From: Rusty Russell <[email protected]>
Date: Thu, 26 Sep 2002 15:19:33 +1000
3) If you want to keep counters in the subsystems (eg. iptables currently
   keeps packet and byte counters for every rule because it's easy), you
   need to keep this info in your route cache, then call the subsystems when
   you evict something from the cache.
The flow cache, routing table, whatever you want to call it, doesn't
care what your info looks like, that will be totally private.
Actually the idea is that skb tagging would not be needed, the
skb->dst would _BE_ the tag. From it you could get at all the
info you needed.
Draw a picture in your head as you read this.
The private information could be a socket, a fast forwarding path, a
stack of IPSEC decapsulation operations, netfilter connection tracking
info, anything.
So skb->dst could point to something like:
struct socket_dst {
        struct dst_entry        dst;
        enum dst_type           dst_type;
        struct sock             *sk;
};
Here dst->input() would be tcp_ipv4_rcv().
It could just as easily be:
struct nf_conntrack_dst {
        struct dst_entry        dst;
        enum dst_type           dst_type;
        struct nf_conntrack_info *ct;
};
And dst->input() would be nf_conntrack_input(), or actually something
more specific.
See?
And when you have the "drop packet" firewall rule, the skb->dst then
points to nf_blackhole() (which perhaps logs the event if you need to
do that).
If you have things that must happen in a sequence to flow through
your path properly, that's where the "stackable" bit comes in. You
do that one bit, skb->dst = dst_pop(skb->dst), then your caller
will pass the packet on to skb->dst->{output,input}().
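
In code, the pop-and-pass step might look roughly like this (a simplified
model, not the real struct dst_entry; refcounting and error handling omitted):

#include <stddef.h>

struct sk_buff;

struct dst_entry {
        struct dst_entry *child;                 /* next dst on the stack */
        int (*input)(struct sk_buff *skb);
        void *private_data;                      /* socket, conntrack info, encap state, ... */
};

struct sk_buff {
        struct dst_entry *dst;
        /* ... headers and data ... */
};

static struct dst_entry *dst_pop(struct dst_entry *d)
{
        return d->child;
}

/* e.g. one decapsulation step stacked on top of whatever comes next */
int example_decap_input(struct sk_buff *skb)
{
        /* ... do this layer's one bit of work here ... */

        skb->dst = dst_pop(skb->dst);            /* done: expose the next step */
        return skb->dst->input(skb);             /* hand the packet on */
}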
Is it clearer now the kind of things you'll be able to do?
Stackable entries also allow encapsulation situations to propagate
precisely calculated PMTU information back to the transports.
Cached entries are created on demand, the level 2 lookup area is
where long term information sits. The top level lookup engine
carries cached entries of whatever is active at a given moment,
that is why I refer to it as a flow cache sometimes.
All of this was concocted to do IPSEC in a reasonable way, but as a
nice side effect it ends up solving a ton of other problems too. I
think some guy named Linus once told me that's the indication of a
clean design. :-)
...
Let me give another example of how netfilter could use this. Consider
the FTP NAT stuff: you'd start with the "from TCP, any address/any port, to
any address/FTP port" DST. This watches for new FTP connections, so
you can track them properly and do NAT or whatever else. This
destination would drop anything other than SYN packets. Its
dst->input() would point to something like ftp_new_connection_input().
When you get a new connection, you create a destination which handles
the translation of specifically THAT ftp connection (ie. specific
instead of "any" addr/port pairs are used for the key). Here,
dst->input() would point to ftp_existing_connection_input().
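
A hypothetical sketch of that FTP example (the helpers pkt_flow_key,
pkt_is_tcp_syn and dst_install are made up for illustration): the wildcard
destination only accepts SYNs and installs a per-connection destination keyed
on the specific addresses and ports.

#include <stdint.h>

struct sk_buff;

struct flow_key {
        uint32_t saddr, daddr;                   /* 0 plays the role of "any" here */
        uint16_t sport, dport;
        uint8_t  proto;
};

/* assumed helpers, not real kernel functions */
struct flow_key pkt_flow_key(const struct sk_buff *skb);
int pkt_is_tcp_syn(const struct sk_buff *skb);
void dst_install(const struct flow_key *key, int (*input)(struct sk_buff *));
int ftp_existing_connection_input(struct sk_buff *skb);

/* input() of the wildcard "TCP, any address/any port -> any address:21" dst */
int ftp_new_connection_input(struct sk_buff *skb)
{
        struct flow_key k;

        if (!pkt_is_tcp_syn(skb))
                return -1;                       /* drop anything but SYNs */

        /* build a key with the specific addr/port pairs of this connection */
        k = pkt_flow_key(skb);

        /* install a per-connection dst that does the NAT/tracking from now on */
        dst_install(&k, ftp_existing_connection_input);

        return ftp_existing_connection_input(skb);   /* handle the SYN itself too */
}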
On Wed, 25 Sep 2002, David S. Miller wrote:
> If you have things that must happen in a sequence to flow through
> your path properly, that's where the "stackable" bit comes in. You
> do that one bit, skb->dst = dst_pop(skb->dst), then your caller
> will pass the packet on to skb->dst->{output,input}().
>
> Is it clearer now the kind of things you'll be able to do?
>
So, this could be used for generic network layer encapsulation, and be
used for GRE tunnels, SIT etc. without the kinds of kludges currently in
use? Sounds nice.
- James
--
James Morris
<[email protected]>
From: James Morris <[email protected]>
Date: Fri, 27 Sep 2002 01:27:41 +1000 (EST)
So, this could be used for generic network layer encapsulation, and be
used for GRE tunnels, SIT etc. without the kinds of kludges currently in
use? Sounds nice.
Such IPIP tunnels have very real problems though, since only 64-bits
of packet quoting are required in ICMP errors, it is often impossible
to deal with PMTU requests properly, see "#ifndef
I_WISH_WORLD_WERE_PERFECT" in net/ipv4/ip_gre.c
>>>>> "David" == David S Miller <[email protected]> writes:
David> From: James Morris <[email protected]>
David> Date: Fri, 27 Sep 2002 01:27:41 +1000 (EST)
David> So, this could be used for generic network layer encapsulation, and be
David> used for GRE tunnels, SIT etc. without the kinds of kludges currently in
David> use? Sounds nice.
David> Such IPIP tunnels have very real problems though, since only 64-bits
David> of packet quoting are required in ICMP errors, it is often impossible
David> to deal with PMTU requests properly, see "#ifndef
David> I_WISH_WORLD_WERE_PERFECT" in net/ipv4/ip_gre.c
IPsec tunnels are even worse, because not only is there not enough
info returned, but, being paranoid, one should really not even trust it.
An ICMP Port Unreachable for UDP port 500 is even nastier, because
sometimes it indicates a REAL problem :-)
Eons ago, I proposed a way to deal with this problem, see:
http://www.sandelman.ottawa.on.ca/SSW/ietf/draft-richardson-ipsec-pmtu-discovery-00.txt
I think that now that Linux doesn't linearize skbuffs prior to passing
them to protocol handlers, I actually could get the fragment info from
the skb chain.
Excerpt from document:
Gateway G2, upon receiving an ESP or AH packet that needs to be
reassembled, MUST take note of the size of the largest fragment received. This
value is compared to the previous largest fragment size. If this size
has changed by more than 10%, or more than 2*MSL time (i.e. 2 minutes)
has passed since the previous ICMP message, then an ICMP Datagram Too
Big message is generated. The largest fragment size is initialized to
576 bytes.
The ICMP datagram is addressed from gateway G2 to the originating node
C, and gives a size that is based on the maximum fragment size (above),
minus the IPsec overhead. The ICMP datagram is sent via the tunnel on
which the IPsec packet was a member. I.e. the ICMP is encapsulated.
A packet arriving at G1 with the DF bit set, does not cause the DF bit
to be set on the encapsulating datagram.
(proposal two changes the destination IP of the ICMP message)
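
A rough sketch of the decision logic in the excerpt, under one reading of it
(all names are hypothetical; the 10%, 2*MSL and 576-byte figures are taken
from the draft text):

#include <stdint.h>
#include <time.h>

#define MSL_SECONDS       60             /* 2*MSL == 2 minutes, as in the draft */
#define INITIAL_FRAG_SIZE 576            /* "initialized to 576 bytes"          */

struct tunnel_pmtu_state {
        uint32_t largest_frag;           /* largest fragment size seen so far  */
        time_t   last_icmp;              /* when we last sent Datagram Too Big */
};

/* assumed helper: send the encapsulated ICMP Datagram Too Big back through
 * the tunnel toward the originating node, advertising the given size */
void send_datagram_too_big(uint32_t mtu_hint);

void tunnel_pmtu_init(struct tunnel_pmtu_state *st)
{
        st->largest_frag = INITIAL_FRAG_SIZE;
        st->last_icmp = 0;
}

/* called by G2 for each fragment of an ESP/AH packet being reassembled */
void note_fragment(struct tunnel_pmtu_state *st, uint32_t frag_size,
                   uint32_t ipsec_overhead)
{
        time_t   now   = time(NULL);
        uint32_t prev  = st->largest_frag;
        uint32_t delta = frag_size > prev ? frag_size - prev : prev - frag_size;

        if (frag_size > st->largest_frag)
                st->largest_frag = frag_size;

        /* notify when the size changed by more than 10%, or when more than
         * 2*MSL has passed since the previous ICMP message */
        if (delta * 10 > prev || now - st->last_icmp > 2 * MSL_SECONDS) {
                send_datagram_too_big(st->largest_frag - ipsec_overhead);
                st->last_icmp = now;
        }
}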
] ON HUMILITY: to err is human. To moo, bovine. | firewalls [
] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[
] [email protected] http://www.sandelman.ottawa.on.ca/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [
Dave,
now that I have followed the thread on lk (slrn is great; thanks Jason),
I am actually interested to find out how you are going to pull off what
you propose ;-> There are not that many things that will work well
with a dst-cache-like idea. I actually considered the stacking idea
when I first was trying to prototype the code that I posted. It is much
harder to make use of in practice. At least this is my experience.
If you look at the scheme I posted, you'll see that the policy
could be to direct the packets to an IPv4 forwarding block or
totally bypass it, etc. (I just didn't want to jump into that step yet
since it is quite involved architecturally).
In any case we need to encourage people like the hipac authors to be
putting out things (I only wish they'd incorporate it into the tc
framework!); whatever changes are made should consider that there is
more than one way to do things and people will always come up with better
ways to do certain portions of the packet path.
cheers,
jamal
On Thu, 26 Sep 2002, David S. Miller wrote:
> From: James Morris <[email protected]>
> Date: Fri, 27 Sep 2002 01:27:41 +1000 (EST)
>
> So, this could be used for generic network layer encapsulation, and be
> used for GRE tunnels, SIT etc. without the kinds of kludges currently in
> use? Sounds nice.
>
> Such IPIP tunnels have very real problems though, since only 64-bits
> of packet quoting are required in ICMP errors, it is often impossible
> to deal with PMTU requests properly, see "#ifndef
> I_WISH_WORLD_WERE_PERFECT" in net/ipv4/ip_gre.c
From: jamal <[email protected]>
Date: Fri, 27 Sep 2002 10:12:26 -0400 (EDT)
whatever changes are made should consider that there is more than one
way to do things and people will always come up with better ways to do
certain portions of the packet path.
Of course.