2003-05-16 22:11:51

by Simon Kirby

Subject: Re: Route cache performance under stress

On Sat, Apr 05, 2003 at 06:37:43PM +0200, Florian Weimer wrote:

> Please read the following paper:
>
> <http://www.cs.rice.edu/~scrosby/tr/HashAttack.pdf>
>
> Then look at the 2.4 route cache implementation.
>
> Short summary: It is possible to freeze machines with 1 GB of RAM and
> more with a stream of 400 packets per second with carefully chosen
> source addresses. Not good.
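The mechanism behind that figure is easy to reproduce in miniature. The following toy sketch (a deliberately weak 256-bucket hash standing in for the real route-cache hash, with invented addresses) shows how attacker-chosen source addresses degrade chained hashing into quadratic work:

```python
NBUCKETS = 256

def toy_hash(saddr, daddr):
    # Deliberately weak mixing: only the low byte of each address matters.
    return (saddr ^ daddr) & (NBUCKETS - 1)

buckets = [[] for _ in range(NBUCKETS)]
comparisons = 0

def cache_insert(saddr, daddr):
    global comparisons
    chain = buckets[toy_hash(saddr, daddr)]
    for entry in chain:              # walk the collision chain
        comparisons += 1
        if entry == (saddr, daddr):
            return                   # already cached
    chain.append((saddr, daddr))

DADDR = 0x0A000001                   # one fixed victim destination
for i in range(1000):
    # Vary only address bits the hash ignores: every spoofed
    # source lands in the same bucket.
    cache_insert(0xC0000000 | (i << 8), DADDR)

print(len(buckets[toy_hash(0xC0000000, DADDR)]))   # 1000 entries in one chain
print(comparisons)                                  # 499500 = 1000*999/2
```

A thousand spoofed sources cost roughly 500,000 chain comparisons instead of a thousand; sustained at a few hundred packets per second against a much larger cache, the same effect pins the CPU.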

Finally, somebody else has noticed this problem! Unfortunately, I didn't
see this message until now.

I have been seeing this problem for over a year, and have had the same
problems you have with DoS attacks saturating the CPUs on our routers.

We have two Athlon 1800MP boxes doing routing on our network, and the CPU
saturates under embarrassingly low traffic rates with random source IPs.
We've noticed this a few times with DoS attacks generated internally and
with remote DoS attacks. I too have had to abuse the PREROUTING chain
(in the mangle table, to avoid loading the nat table, which would bring in
connection tracking -- grr... I hate the way this works in iptables),
particularly with the MSSQL worm that burst out onto the 'net that one
Friday night several months ago. Adding a single match-udp-port DROP
rule to the PREROUTING chain made the load go back down to normal
levels. The same rule in the INPUT/FORWARD chain had no effect on the
CPU utilization (still saturated).

> The route cache is a DoS bottleneck in general (that's why I started
> looking at it). You have to apply rate-limits in the PREROUTING
> chain, otherwise a modest packet flood will push the machine off the
> network (even with truly random source addresses, not triggering hash
> collisions). The route cache partially defeats the purpose of SYN
> cookies, too, because the kernel keeps (transient) state for spoofed
> connection attempts in the route cache.

The idea, I believe, was that the route cache was supposed to stay as a
mostly O(1) overhead and not fall over in any specific cases. As you
say, however, we also have problems with truly random IPs killing large
boxes. This same box is capable of routing more than one gigabit of tiny
(64 byte) packets when the source IP is not random (using Tigon3 cards).

Under normal operation, it looks like most of the load we are seeing is in
fact normal route lookups. We run BGP peering, and so there are a lot of
routes in the table. Alexey suggested adding another level of hashing to
the fn_hash_lookup function, but that didn't seem to help very much. The
last time I was looking at this, I enabled profiling on one router to see
what functions were using the most CPU. Here are the results (71 days
uptime):

http://blue.netnation.com/sim/ref/readprofile.r1
http://blue.netnation.com/sim/ref/readprofile.r1.call_sorted
http://blue.netnation.com/sim/ref/readprofile.r1.time_sorted

The results of this profile are from mostly normal operation.

I will try playing more with this code and look at your patch and paper.

Thanks,

Simon-


2003-05-16 23:03:20

by Florian Weimer

Subject: Re: Route cache performance under stress

Simon Kirby <[email protected]> writes:

> I hate the way this works in iptables), particularly with the MSSQL
> worm that burst out onto the 'net that one Friday night several
> months ago. Adding a single match-udp-port DROP rule to the
> PREROUTING chain made the load go back down to normal levels. The
> same rule in the INPUT/FORWARD chain had no effect on the CPU
> utilization (still saturated).

Yes, that's exactly the phenomenon, but operators traditionally
attributed it to other things running on the router (such as
accounting).

> Under normal operation, it looks like most of the load we are seeing is
> in fact normal route lookups. We run BGP peering, and so there are a
> lot of routes in the table.

You should aggregate the routes before you load them into the kernel.
Hardly anybody seems to do this, but usually, you have far fewer
interfaces than prefixes 8-), so this could result in a huge win.
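The aggregation step can be sketched with Python's `ipaddress` module. The prefixes and gateway names below are made up for illustration, and a real implementation would also have to check that a merged supernet does not shadow a more specific route via a different next hop:

```python
import ipaddress
from collections import defaultdict

# Sketch: collapse routes per next hop before installing them in the
# kernel.  Prefixes and next-hop names are invented for illustration.
routes = [
    ("192.0.2.0/25",    "gw1"),
    ("192.0.2.128/25",  "gw1"),   # adjacent to the above -> merges to /24
    ("198.51.100.0/24", "gw1"),
    ("203.0.113.0/24",  "gw2"),
]

by_nexthop = defaultdict(list)
for prefix, nh in routes:
    by_nexthop[nh].append(ipaddress.ip_network(prefix))

aggregated = {
    nh: list(ipaddress.collapse_addresses(nets))
    for nh, nets in by_nexthop.items()
}

for nh in sorted(aggregated):
    print(nh, [str(n) for n in aggregated[nh]])
# gw1 ['192.0.2.0/24', '198.51.100.0/24']
# gw2 ['203.0.113.0/24']
```

Four routes become three; on a real BGP feed, where many adjacent prefixes share an exit, the reduction can be much larger.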

Anyway, using data structures tailored to the current Internet routing
table, it's certainly possible to do destination-only routing using
half a dozen memory lookups or so (or a few indirect calls, I'm not
sure which option is cheaper).

> I will try playing more with this code and look at your patch and paper.

Well, I didn't write the paper, I found it after discovering the
problem in the kernel. This complexity attack has been resolved, but
this won't help people like you who have to run Linux on an
uncooperative network.

The patch I posted won't help you as it increases the load
considerably unless most of your flows consist of one packet. (And
there's no need for patching, you can go ahead and just change the
value via /proc.)

2003-05-17 05:52:46

by David Miller

Subject: Re: Route cache performance under stress

On Fri, 2003-05-16 at 15:24, Simon Kirby wrote:
> I have been seeing this problem for over a year, and have had the same
> problems you have with DoS attacks saturating the CPUs on our routers.

Have a look at current kernels and see if they solve your problem.
They undoubtedly should, and I consider this issue resolved.

--
David S. Miller <[email protected]>

2003-05-17 07:18:23

by Florian Weimer

Subject: Re: Route cache performance under stress

"David S. Miller" <[email protected]> writes:

> On Fri, 2003-05-16 at 15:24, Simon Kirby wrote:
>> I have been seeing this problem for over a year, and have had the same
>> problems you have with DoS attacks saturating the CPUs on our routers.
>
> Have a look at current kernels and see if they solve your problem.
> They undoubtedly should, and I consider this issue resolved.

The hash collision problem appears to be resolved, but not the more
general performance issues. Or are there any kernels without a
routing cache?

2003-05-17 21:57:33

by David Miller

Subject: Re: Route cache performance under stress

From: Florian Weimer <[email protected]>
Date: Sat, 17 May 2003 09:31:04 +0200

The hash collision problem appears to be resolved, but not the more
general performance issues. Or are there any kernels without a
routing cache?

I think your criticism of the routing cache is not well
founded.

2003-05-18 09:08:31

by Florian Weimer

Subject: Re: Route cache performance under stress

"David S. Miller" <[email protected]> writes:

> From: Florian Weimer <[email protected]>
> Date: Sat, 17 May 2003 09:31:04 +0200
>
> The hash collision problem appears to be resolved, but not the more
> general performance issues. Or are there any kernels without a
> routing cache?
>
> I think your criticism of the routing cache is not well
> founded.

Well, what would change your mind?

I don't really care, as I maintain just one Linux router which routes
substantial bandwidth and which could easily be protected by upstream
ACLs in the case of an emergency. Others rely on Linux routers for
their ISP business, and they see similar problems as well, and these
people *do* care about this problem. Some of them contacted me and
told me that they were ignored when they described it. They certainly
want this problem to be fixed, as using FreeBSD is not always an
option. 8-(

2003-05-18 09:19:55

by David Miller

Subject: Re: Route cache performance under stress

From: Florian Weimer <[email protected]>
Date: Sun, 18 May 2003 11:21:14 +0200

[ Please don't CC: [email protected] any more, his address
bounces at least for me (maybe his site rejects ECN, it is
the most likely problem if it works for other people) ]

"David S. Miller" <[email protected]> writes:

> I think your criticism of the routing cache is not well
> founded.

Well, what would change your mind?

I'll start to listen when you start to demonstrate that you understand
why the input routing cache is there and what problems it solves.

More people will also start to listen when you actually discuss this
matter on the proper list(s) (which isn't linux-kernel, since
linux-net and [email protected] are the proper places). Most of the
net hackers have zero time to follow the enormous amount of traffic
on linux-kernel and pick out the networking bits.
Frankly, I /dev/null linux-kernel from time to time as well.

The fact is, our routing cache slow path is _FAST_. And we garbage
collect routing cache entries, so the attacker's entries are deleted
quickly while the entries for legitimate flows stick around. And
especially during an attack you want your legitimate traffic using the
routing cache.
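The garbage-collection argument can be illustrated with a toy LRU cache. The kernel's gc is more elaborate than plain LRU, and the capacity and traffic pattern here are invented, but the effect is the same: hot entries survive, one-shot attacker entries are collected first.

```python
from collections import OrderedDict

class ToyRouteCache:
    """LRU sketch: entries for flows that keep receiving packets stay
    cached, while single-packet spoofed entries age out quickly."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)     # hit: mark recently used
            return "hit"
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[key] = "nexthop"         # slow path: full fib lookup
        return "miss"

cache = ToyRouteCache(capacity=1000)
cache.lookup("legit-flow")
for i in range(10_000):
    cache.lookup(f"spoofed-{i}")      # random-source flood, one packet each
    if i % 100 == 0:
        cache.lookup("legit-flow")    # the real flow keeps getting traffic

print("legit-flow" in cache.entries)  # True: survived the whole flood
print("spoofed-0" in cache.entries)   # False: collected long ago
```

Because the legitimate flow is touched far more often than the cache turns over, it never reaches the cold end of the list, while each spoofed entry is evicted after roughly `capacity` further misses.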

I've never seen you mention this attribute of how the routing cache
works, nor have I seen you say anything which even suggests that you
are aware of this. You could even make this apparent by proposing a
replacement for the input routing cache. But remember, it has to
provide all of the functionality that is there today.

Nobody has demonstrated that there is a performance problem due to the
input routing cache once the hashing DoS is eliminated, which it is
in current kernels. Take this as my challenge to you. :-)

using FreeBSD is not always an option

Yeah, that dinosaur :-)

2003-05-19 17:23:26

by Jamal Hadi

Subject: Re: Route cache performance under stress


Florian,
I actually asked you to run some tests last time you showed up
on netdev but never heard back. Maybe we can get some results now
that the complaining is still continuing. Note, we can't just invent
things because "CISCO is doing it like that". That doesn't cut it.
What we need is data to substantiate things and then we move from there.

And oh, i am pretty sure we can beat any of the BSDs forwarding rate.
Anyone wants a duel, let's meet at the water fountain by the town
hall at sunrise.

cheers,
jamal

2003-05-19 18:58:00

by Simon Kirby

Subject: Re: Route cache performance under stress

[ Apologies all -- I had my address incorrectly set to [email protected]
for some reason. ]

On Sat, May 17, 2003 at 01:16:08AM +0200, Florian Weimer wrote:

> > Under normal operation, it looks like most of the load we are seeing is
> > in fact normal route lookups. We run BGP peering, and so there are a
> > lot of routes in the table.
>
> You should aggregate the routes before you load them into the kernel.
> Hardly anybody seems to do this, but usually, you have far fewer
> interfaces than prefixes 8-), so this could result in a huge win.

Hmm... Looking around, I wasn't able to find an option in Zebra to do
this. Do you know the command?

> Anyway, using data structures tailored to the current Internet routing
> table, it's certainly possible to do destination-only routing using
> half a dozen memory lookups or so (or a few indirect calls, I'm not
> sure which option is cheaper).

Would this still route packets to destinations which would otherwise be
unreachable, then? While this isn't a big issue, it would be nice to
stop unroutable traffic before it leaves our networks (mostly in the case
where a customer machine is generating bad traffic).

I did experiment with trying to increase the routing (normal, not cache)
hash table another level, but it didn't seem to have much effect. I
believe I would have to change the algorithm somewhat so that it expands
the hash table sooner than it does at the moment; I seem to recall that
it lets the hash buckets get rather large before expanding them. I
haven't had a chance to look at this very deeply, but
the profile I linked to before does show that fn_hash_lookup() does
indeed use more CPU than any other function, so it may be worth looking
at more. (Aggregating routes would definitely improve the situation in
any case.)
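The policy change described above, growing the table before chains get long, could be prototyped along these lines. This is a generic chained-hash sketch with invented sizes, not the actual fn_hash code:

```python
class GrowingHashTable:
    """Chained hash that rehashes early, before chains grow long:
    the table doubles whenever the average chain length (load factor)
    exceeds a small threshold."""
    def __init__(self, initial_size=16, max_load=2.0):
        self.size = initial_size
        self.max_load = max_load
        self.buckets = [[] for _ in range(self.size)]
        self.count = 0

    def _rehash(self):
        old = [k for chain in self.buckets for k in chain]
        self.size *= 2
        self.buckets = [[] for _ in range(self.size)]
        for k in old:
            self.buckets[hash(k) % self.size].append(k)

    def insert(self, key):
        self.buckets[hash(key) % self.size].append(key)
        self.count += 1
        if self.count > self.max_load * self.size:
            self._rehash()

table = GrowingHashTable()
for i in range(10_000):
    table.insert(i)

print(table.size)                          # grew from 16 to 8192
print(max(len(c) for c in table.buckets))  # chains never get long
```

With the early-growth policy, lookup cost stays bounded by the load factor instead of degrading as the route count climbs, which is the behaviour a large BGP table needs.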

> The patch I posted won't help you as it increases the load
> considerably unless most of your flows consist of one packet. (And
> there's no need for patching, you can go ahead and just change the
> value via /proc.)

Yes. I have fiddled with this before, and making the changes you
suggested actually doubled the load in normal operation. I would assume
this is putting even more pressure on fn_hash_lookup().

Simon-

2003-05-19 19:04:55

by Ralph Doncaster

Subject: Re: Route cache performance under stress

When I looked at the route-cache code, efficient wasn't the word that came
to mind. Whether the problem is in the route-cache or not, getting
>100kpps out of a Linux router with <= 1 GHz of CPU is not at all an easy
task. I've tried 2.2 and 2.4 (up to 2.4.20) with 3c905CX cards, with and
without NAPI, on a 750 MHz AMD. I've never reached 100kpps without
userland (zebra) getting starved. I've even tried the e1000 with 2.4.20,
and it still doesn't cut it (about 50% better performance than the 3Com).
This is always with a full routing table (~110K routes).

If I actually had the time to do the code, I'd try dumping the route-cache
altogether and keep the forwarding table as an r-tree (probably 2 levels
of 2048 entries since the average prefix size is /22). Frequently-used routes
would look up faster due to CPU cache hits. I'd have all the crap for
source-based routing ifdef'd out when firewalling is not compiled in.
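The two-level layout proposed above can be sketched with controlled prefix expansion: an 11-bit first level (2048 entries) whose slots hold either a next hop or a second 2048-entry array, so any lookup costs at most two memory reads. The routes and gateway names below are toys, and prefixes longer than /22 are not handled:

```python
import ipaddress

L1, L2 = 11, 11   # 2048-entry tables at each level, covering /22

def build(routes):
    table = [None] * (1 << L1)
    # Less specific prefixes first, so more specific ones overwrite them.
    for cidr, nh in sorted(routes,
                           key=lambda r: ipaddress.ip_network(r[0]).prefixlen):
        net = ipaddress.ip_network(cidr)
        addr, plen = int(net.network_address), net.prefixlen
        top = addr >> (32 - L1)
        if plen <= L1:
            # Expand the prefix across every level-1 slot it covers.
            for i in range(top, top + (1 << (L1 - plen))):
                table[i] = nh
        else:
            if not isinstance(table[top], list):
                # Split the leaf into a level-2 array seeded with its default.
                table[top] = [table[top]] * (1 << L2)
            mid = (addr >> (32 - L1 - L2)) & ((1 << L2) - 1)
            for i in range(mid, mid + (1 << (L1 + L2 - plen))):
                table[top][i] = nh
    return table

def lookup(table, ip):
    a = int(ipaddress.ip_address(ip))
    entry = table[a >> (32 - L1)]                            # first read
    if isinstance(entry, list):
        entry = entry[(a >> (32 - L1 - L2)) & ((1 << L2) - 1)]  # second read
    return entry

fib = build([("10.0.0.0/8",   "gw-a"),
             ("10.32.0.0/11", "gw-b"),
             ("10.32.0.0/22", "gw-c")])
print(lookup(fib, "10.1.2.3"))    # gw-a
print(lookup(fib, "10.32.0.1"))   # gw-c (most specific wins)
print(lookup(fib, "10.32.5.1"))   # gw-b
```

The expansion trades memory for a fixed, cache-friendly lookup cost; the worst case is two array indexes regardless of how many routes are installed.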

My next try will be with FreeBSD, using device polling and the e1000 cards
(since it seems there are no polling patches for the 3c905CX under
FreeBSD). From the description of how polling under FreeBSD works
http://info.iet.unipi.it/~luigi/polling/
vs NAPI under linux, polling sounds better due to the ability to configure
the polling cycle and CPU load triggers. From the testing and reading
I've done so far, NAPI doesn't seem to kick in until after 75-80% CPU
load. With less than 25kpps coming into the box, zebra seems to take
almost 10x longer to bring up a session with full routes than it does with
no packet load. Since the CPU load before zebra becomes active is 70-75%, it
would seem a lot of cycles are being wasted on context switching when zebra

If there is a way to get the routing performance I'm looking for in Linux,
I'd really like to know. I've been searching and asking for over a year
now. When I initially talked to Jamal about it, he told me NAPI was the
answer. It does help, but from my experience it's not the answer. I get
the impression nobody involved in the code has tested under real-world
conditions. If that is, in fact, the problem, then I can provide an ebgp
multihop full feed and a synflood utility for stress testing. If the
linux routing and ethernet driver code is improved so I can handle 50kpps
of inbound regular traffic, a 50kpps random-source DOS, and still have 50%
CPU left for Zebra then Cisco might have something to worry about...

Ralph Doncaster, president
6042147 Canada Inc. o/a IStop.com

On Mon, 19 May 2003, Jamal Hadi wrote:

>
> Florian,
> I actually asked you to run some tests last time you showed up
> on netdev but never heard back. Maybe we can get some results now
> that the complaining is still continuing. Note, we can't just invent
> things because "CISCO is doing it like that". That doesn't cut it.
> What we need is data to substantiate things and then we move from there.
>
> And oh, i am pretty sure we can beat any of the BSDs forwarding rate.
> Anyone wants a duel, let's meet at the water fountain by the town
> hall at sunrise.
>
> cheers,
> jamal