2006-05-31 04:29:10

by Raghuram

[permalink] [raw]
Subject: Question about tcp hash function tcp_hashfn()


Hi,

I needed a hash function (in my TCP related work) for
a project and happened to look at the function used by
TCP implementation in Linux. I searched for some
information about this function but couldn't find much
info. I would appreciate it if someone can provide
details or some pointers in this regard. Specifically,


1) Are there some design considerations/assumptions
behind the algorithm? In general, how was the
algorithm arrived at?

2) What happens if there are collisions? I am assuming
that each entry in the array will point to a linked
list of structures. Is there any limit on the length
of this list?

I hope it is ok to post questions like these on this
list. Please also CC me as I am not subscribed (at
this point).

Thanks,
Raghu.


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com


2006-05-31 05:55:28

by Brian F. G. Bidulock

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

Raghuram,

On Tue, 30 May 2006, Raghuram wrote:

>
> Hi,
>
> I needed a hash function (in my TCP related work) for
> a project and happened to look at the function used by
> TCP implementation in Linux. I searched for some
> information about this function but couldn't find much
> info. I would appreciate it if someone can provide
> details or some pointers in this regard. Specifically,
>
>
> 1) Are there some design considerations/assumptions
> behind the algorithm? In general, how was the
> algorithm arrived at?

If you mean tcp_hashfn() from 2.4 kernel, I don't believe so.
In fact, if you analyze it, it is not even a very good nor
efficient hash function. For example, it goes to great pains
to permute upper order bits in the local address, which for
most connections will be a constant value.

> 2) What happens if there are collisions? I am assuming
> that each entry in the array will point to a linked
> list of structures. Is there any limit on the length
> of this list?

The hash value is used to index a hash bucket that has
established TCP sockets attached in a linked (collision)
list. Because the number of input values is limited, the
number of collisions is bounded. What the actual bound is
can be determined by analysis or using numerical methods.

Because local address has an extremely limited range, local
port number has a limited range (for dynamic assigment) and
the maximum number of connections will be limited by other
factors (e.g. system maximum number of open file descriptors)
the practical bound is much smaller than the theoretical bound
on the length of the collision list. Because it is not a very
good hash function, the bound on collisions for a given range
of input values might be greater than if a better function were
used. Also, TCP allocates rather large connection hash tables
at startup further reducing the size of collision lists.

There are two typcial uses of TCP: well known port numbers
upon which listening servers exist and outgoing client connections.
Outgoing client connections will almost always use a unique port
number. The hash function does not really exploit this fact.

--brian

--
Brian F. G. Bidulock ? The reasonable man adapts himself to the ?
[email protected] ? world; the unreasonable one persists in ?
http://www.openss7.org/ ? trying to adapt the world to himself. ?
? Therefore all progress depends on the ?
? unreasonable man. -- George Bernard Shaw ?

2006-05-31 07:10:08

by David Miller

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

From: "Brian F. G. Bidulock" <[email protected]>
Date: Tue, 30 May 2006 23:55:26 -0600

> For example, it goes to great pains to permute upper order bits in
> the local address, which for most connections will be a constant
> value.

Consider an apache server hosting thousands of virtual
hosts. The local address will be different for every
such host.

Please drop linux-kernel in any future replies, this
discussion belongs on netdev not linux-kernel. Thanks.

2006-05-31 07:45:43

by Brian F. G. Bidulock

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

David,

On Wed, 31 May 2006, David Miller wrote:

> From: "Brian F. G. Bidulock" <[email protected]>
> Date: Tue, 30 May 2006 23:55:26 -0600
>
> > For example, it goes to great pains to permute upper order bits in
> > the local address, which for most connections will be a constant
> > value.
>
> Consider an apache server hosting thousands of virtual
> hosts. The local address will be different for every
> such host.
>

If you mean named virtual hosts, no. They have the same
addresses.

If you mean actual hosts (with an IP address), perhaps in
the low order bits (host number), but unlikely in the high
order bits of the local address (network mask bits).

Also, in such a case the local port number will be rather
constant (80, etc); a condition also not exploited by the
function.

Also consider that the function simply folds the values
rather than permuting bits across the key field by shifting
by some other value than a multiple of 8 between XOR
operations. This will result in a longer collision list
because the entropy of the key value has not been
sufficiently reduced.

It might sound like I'm complaining, but I'm not. The
function works for me. But from a purist point of view,
the hash function is not as efficient as it could be and
there is room for improvement.

2006-05-31 07:49:34

by David Miller

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

From: "Brian F. G. Bidulock" <[email protected]>
Date: Wed, 31 May 2006 01:45:40 -0600

> It might sound like I'm complaining, but I'm not. The
> function works for me. But from a purist point of view,
> the hash function is not as efficient as it could be and
> there is room for improvement.

For sure and there are plans afoot to move over to
dynamic table sizing and the Jenkins hash function.

2006-05-31 08:00:13

by Brian F. G. Bidulock

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

David,

On Wed, 31 May 2006, David Miller wrote:
>
> For sure and there are plans afoot to move over to
> dynamic table sizing and the Jenkins hash function.

Yes, that could be far more efficient.

2006-05-31 08:49:57

by Brian F. G. Bidulock

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

David,

On Wed, 31 May 2006, David Miller wrote:
>
> For sure and there are plans afoot to move over to
> dynamic table sizing and the Jenkins hash function.

Just a suggestion, but I have an approach that stands to be
faster than Jenkins deriving from the verification tag approach
that I took for SCTP (OpenSS7 SCTP not lksctp).

TCP uses a cryptographic hash function to select its initial
sequence number (SCTP does the same for vtag). Last I checked
it was taken from an MD4 swirled entropy pool and further
combined with the local and remote IP addresses and port
numbers.

Each received segment contains a sequence number that is offset
from the initial sequence number but is expected to appear
within the current window. Most of the high order bits of an
in-window sequence number are predicatable at any point in time
and, due to cryptographic strength, are more efficient than
Jenkins, ... and they are right there in the received packet.

The approach would take the high order bits of the received
sequence number and use them to index a separate sequence number
keyed established hash (which could be dynamic). As the window
changes, the socket would have to be removed and reinserted into
this hash, but the repositioning would be infrequent. Out of
window segments would fail to find a socket, but could fall back
to the current established hash, or even the bind hash. A layer
of caching could increase the hash lookup speed further for
noisy senders.

This would be faster than a Jenkins hash approach because it
would not be necessary to calculate the hash function at all for
in-window segments. Per packet overheads would decrease and
better small packet performance would be experienced (i.e. your
http server). It has better hash coverage because MD4 and other
cryptographic algorithms used for initial sequence number
selection have far better properties than Jenkins.

What do you think?

2006-05-31 09:02:12

by David Miller

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

From: "Brian F. G. Bidulock" <[email protected]>
Date: Wed, 31 May 2006 02:49:54 -0600

> This would be faster than a Jenkins hash approach because it
> would not be necessary to calculate the hash function at all for
> in-window segments. Per packet overheads would decrease and
> better small packet performance would be experienced (i.e. your
> http server). It has better hash coverage because MD4 and other
> cryptographic algorithms used for initial sequence number
> selection have far better properties than Jenkins.
>
> What do you think?

I don't know how practical this is. The 4GB sequence space
wraps very fast on 10 gigabit, so we'd be rehashing a bit
and 100 gigabit would make things worse whenever that shows
up.

It is, however, definitely an interesting idea.

We also need the pure traditional hashes for net channels. I don't
see how we could use your scheme for net channels, because we are just
hashing in the interrupt handler of the network device driver in order
to get a queue to tack the packet onto, we're not interpreting the
sequence numbers and thus would not able to maintain the sequence
space based hashing state.

On a 3ghz cpu, the jenkins hash is essentially free. Even on slower
cpus, jhash_2words for example is just 20 cycles on a sparc64 chip.
It's ~40 integer instructions and they all pair up perfectly to
dual issue. We'd probably use jhash_3words() for TCP ipv4 which
would get us into the 30 cycle range.

A few years ago when I introduced jenkins into the tree, I thought
it's execution cost might be an issue. I really don't anymore.

2006-05-31 09:03:13

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

On Wed, May 31, 2006 at 02:00:09AM -0600, Brian F. G. Bidulock ([email protected]) wrote:
> > For sure and there are plans afoot to move over to
> > dynamic table sizing and the Jenkins hash function.
>
> Yes, that could be far more efficient.

In theory practice and theory are the same, but in practice they are
different.

Current simple XOR hash used in socket selection code is just bloody good!
Jenkins hash unfortunately has _significant_ artefacts which were found
in netchannel [1] hash selection analysis [2].
And Jenkins hash is far too slower.

1. Netchannel.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=netchannel

2. Compared Jenkins hash with XOR hash used in TCP socket selection
code.
http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14

--
Evgeniy Polyakov

2006-05-31 09:12:17

by David Miller

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

From: Evgeniy Polyakov <[email protected]>
Date: Wed, 31 May 2006 13:03:02 +0400

> Current simple XOR hash used in socket selection code is just bloody
> good! Jenkins hash unfortunately has _significant_ artefacts which
> were found in netchannel [1] hash selection analysis [2]. And
> Jenkins hash is far too slower.

Yes, it wins from a simplicity standpoint and it performs quite well.
It was tuned from real socket data sets, but from systems running many
many years ago :)

FreeBSD even adopted this hash into their PCB hashing code at one
point.

I think it will need to be changed nevertheless. Even though this
hash works on established sockets, it is attackable just like the
routing hash used to be. If an attacker has sufficient resources, he
can make hash chains in the TCP established hash table very long. As
the years pass, it becomes easier and easier for one to have enough
computing power at their disposal to carry out this kind of attack.

So something like Jenkins with a random hash input become more and
more critical. And this requires some kind of way to rehash, RCU
table locking opens the door for that. Current locking scheme is
too tightly bound for that kind of flexibility.

I wish Ben L. would resubmit the TCP hash locking stuff he said he'd
work on. :)

2006-05-31 09:39:49

by Brian F. G. Bidulock

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

David,

On Wed, 31 May 2006, David Miller wrote:
>
> I don't know how practical this is. The 4GB sequence space
> wraps very fast on 10 gigabit, so we'd be rehashing a bit
> and 100 gigabit would make things worse whenever that shows
> up.

It works better for SCTP, because the vtags are constant. No
rehashing is required there.

But, also consider that rehashing is only required for senders
sending at a high data rate. (http clients will likely never
have to be rehashed.) These packets will typically be large and
per-packet overheads will be overwhelmed by per-byte overheads.

Also, the rehashing is orderly and simple, the entry is simply
bumped to the next sequential hash slot and the socket hash
structure can already be cached at the time the action is
performed. Rehashing, although a bother, would take little
time, and could simply be added as part of TCP's existing window
calculations.

>
> It is, however, definitely an interesting idea.
>
> We also need the pure traditional hashes for net channels. I don't
> see how we could use your scheme for net channels, because we are just
> hashing in the interrupt handler of the network device driver in order
> to get a queue to tack the packet onto, we're not interpreting the
> sequence numbers and thus would not able to maintain the sequence
> space based hashing state.

Under SCTP I still have the traditional established hash for
lookups of out of the blue packets and packets containing
invalid verification tags. Really long lookups would invite DoS
attacks on these.

>
> On a 3ghz cpu, the jenkins hash is essentially free. Even on slower
> cpus, jhash_2words for example is just 20 cycles on a sparc64 chip.
> It's ~40 integer instructions and they all pair up perfectly to
> dual issue. We'd probably use jhash_3words() for TCP ipv4 which
> would get us into the 30 cycle range.

But you could throw away all 30 cycles, plus the stacking and
unstacking of registers to get in and out of the algorithm.
Some architectures might benefit more.

Well, I thought you might find it interesting. Perhaps somebody
reading this will experiment with it. For SCTP it is one of a
number of techniques that allows OpenSS7 SCTP to drastically
outperform lksctp.

--brian

2006-05-31 09:44:38

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

On Wed, May 31, 2006 at 02:12:43AM -0700, David Miller ([email protected]) wrote:
> I think it will need to be changed nevertheless. Even though this
> hash works on established sockets, it is attackable just like the
> routing hash used to be. If an attacker has sufficient resources, he
> can make hash chains in the TCP established hash table very long. As
> the years pass, it becomes easier and easier for one to have enough
> computing power at their disposal to carry out this kind of attack.

Jenkins hash is very complex compared to tcp XOR one, and even simple
test showed it's bottlenecks in some setups, so it's tuning will be
quite challenging both from mathematical and engineering points of view.

And socket code actually differs from routing cache, since the former is
limited to 64k connections in a time, while routing cache can grow to
unpredictibe sizes.

> So something like Jenkins with a random hash input become more and
> more critical. And this requires some kind of way to rehash, RCU
> table locking opens the door for that. Current locking scheme is
> too tightly bound for that kind of flexibility.
>
> I wish Ben L. would resubmit the TCP hash locking stuff he said he'd
> work on. :)

It was good approach indeed.

--
Evgeniy Polyakov

2006-05-31 09:51:27

by Brian F. G. Bidulock

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

Evgeniy,

On Wed, 31 May 2006, Evgeniy Polyakov wrote:
> 2. Compared Jenkins hash with XOR hash used in TCP socket selection code.
> http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14

Two problems with the comparison:

Port numbers can be collected into a 32 bit register in network
byte order directly from the TCP packet without taking two 16 bit
values and shifting and or'ing them.

Worse: he folded the jenkins algorith result with

h ^= h >> 16;
h ^= h >> 8;

Destroying the coverage of the function.

I, for one, am not suprised that artifacts appeared in the comparison
as a result of this destruction of the coverage of the hashing function.

2006-05-31 09:52:52

by Brian F. G. Bidulock

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

Evgeniy,

On Wed, 31 May 2006, Evgeniy Polyakov wrote:
>
> 1. Netchannel.
> http://tservice.net.ru/~s0mbre/old/?section=projects&item=netchannel

This one refers to the erroneous result below.

>
> 2. Compared Jenkins hash with XOR hash used in TCP socket selection
> code.
> http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14

2006-05-31 10:58:43

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

On Wed, May 31, 2006 at 03:51:24AM -0600, Brian F. G. Bidulock ([email protected]) wrote:
> Evgeniy,
>
> On Wed, 31 May 2006, Evgeniy Polyakov wrote:
> > 2. Compared Jenkins hash with XOR hash used in TCP socket selection code.
> > http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14
>
> Two problems with the comparison:
>
> Port numbers can be collected into a 32 bit register in network
> byte order directly from the TCP packet without taking two 16 bit
> values and shifting and or'ing them.

They are.

u32 ports;

ports = lport;
ports <<= 16;
ports |= fport;

> Worse: he folded the jenkins algorith result with
>
> h ^= h >> 16;
> h ^= h >> 8;
>
> Destroying the coverage of the function.

It was done to simulate socket code which uses the same folding.
Leaving 32bit space is just wrong, consider hash table size with that
index.

> I, for one, am not suprised that artifacts appeared in the comparison
> as a result of this destruction of the coverage of the hashing function.

It is comparison of the approach used in TCP hashing code, it is not full
mathematical analysis. And in that case jenkins hash already not good.
I'm sure it can be tuned, but it does require a lot of iterations, while
XOR one "just works".

--
Evgeniy Polyakov

2006-05-31 11:05:21

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

On Wed, May 31, 2006 at 02:58:18PM +0400, Evgeniy Polyakov ([email protected]) wrote:
> On Wed, May 31, 2006 at 03:51:24AM -0600, Brian F. G. Bidulock ([email protected]) wrote:
> > Evgeniy,
> >
> > On Wed, 31 May 2006, Evgeniy Polyakov wrote:
> > > 2. Compared Jenkins hash with XOR hash used in TCP socket selection code.
> > > http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14
> >
> > Two problems with the comparison:
> >
> > Port numbers can be collected into a 32 bit register in network
> > byte order directly from the TCP packet without taking two 16 bit
> > values and shifting and or'ing them.
>
> They are.
>
> u32 ports;
>
> ports = lport;
> ports <<= 16;
> ports |= fport;

Using network or host byte order does not affect hash distribution,
that shifting was coded to simulate other types of mixing ports,
which actually never showed different results.

> > Worse: he folded the jenkins algorith result with
> >
> > h ^= h >> 16;
> > h ^= h >> 8;
> >
> > Destroying the coverage of the function.
>
> It was done to simulate socket code which uses the same folding.
> Leaving 32bit space is just wrong, consider hash table size with that
> index.
>
> > I, for one, am not suprised that artifacts appeared in the comparison
> > as a result of this destruction of the coverage of the hashing function.
>
> It is comparison of the approach used in TCP hashing code, it is not full
> mathematical analysis. And in that case jenkins hash already not good.
> I'm sure it can be tuned, but it does require a lot of iterations, while
> XOR one "just works".

--
Evgeniy Polyakov

2006-05-31 13:06:32

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

On Wed, May 31, 2006 at 03:04:59PM +0400, Evgeniy Polyakov ([email protected]) wrote:
> On Wed, May 31, 2006 at 02:58:18PM +0400, Evgeniy Polyakov ([email protected]) wrote:
> > On Wed, May 31, 2006 at 03:51:24AM -0600, Brian F. G. Bidulock ([email protected]) wrote:
> > > Evgeniy,
> > >
> > > On Wed, 31 May 2006, Evgeniy Polyakov wrote:
> > > > 2. Compared Jenkins hash with XOR hash used in TCP socket selection code.
> > > > http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14
> > >
> > > Two problems with the comparison:
> > >
> > > Port numbers can be collected into a 32 bit register in network
> > > byte order directly from the TCP packet without taking two 16 bit
> > > values and shifting and or'ing them.
> >
> > They are.
> >
> > u32 ports;
> >
> > ports = lport;
> > ports <<= 16;
> > ports |= fport;
>
> Using network or host byte order does not affect hash distribution,
> that shifting was coded to simulate other types of mixing ports,
> which actually never showed different results.
>
> > > Worse: he folded the jenkins algorith result with
> > >
> > > h ^= h >> 16;
> > > h ^= h >> 8;
> > >
> > > Destroying the coverage of the function.
> >
> > It was done to simulate socket code which uses the same folding.
> > Leaving 32bit space is just wrong, consider hash table size with that
> > index.

Btw, that probably requires some clarification.
Since hash table size is definitely less than returned hash value, so
higher bits are removed, for that case above folding is done both in
XOR hash and my test case.
It is possible to just remove higher bits, but fairly ditributed parts
being xored produce still fairly distributed value.

--
Evgeniy Polyakov

2006-05-31 18:30:12

by Brian F. G. Bidulock

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

Evgeniy,

On Wed, 31 May 2006, Evgeniy Polyakov wrote:
> > > > Worse: he folded the jenkins algorith result with
> > > >
> > > > h ^= h >> 16;
> > > > h ^= h >> 8;
> > > >
> > > > Destroying the coverage of the function.
> > >
> > > It was done to simulate socket code which uses the same folding.
> > > Leaving 32bit space is just wrong, consider hash table size with that
> > > index.
>
> Btw, that probably requires some clarification.
> Since hash table size is definitely less than returned hash value, so
> higher bits are removed, for that case above folding is done both in
> XOR hash and my test case.
> It is possible to just remove higher bits, but fairly ditributed parts
> being xored produce still fairly distributed value.

> > > > h ^= h >> 16;
> > > > h ^= h >> 8;

This does not remove high order bits in either function.
Your comparison results are simply invalid with these two lines in place.

--brian

2006-05-31 18:40:58

by David Miller

[permalink] [raw]
Subject: Re: Question about tcp hash function tcp_hashfn()

From: Evgeniy Polyakov <[email protected]>
Date: Wed, 31 May 2006 14:58:18 +0400

> On Wed, May 31, 2006 at 03:51:24AM -0600, Brian F. G. Bidulock ([email protected]) wrote:
> > Worse: he folded the jenkins algorith result with
> >
> > h ^= h >> 16;
> > h ^= h >> 8;
> >
> > Destroying the coverage of the function.
>
> It was done to simulate socket code which uses the same folding.
> Leaving 32bit space is just wrong, consider hash table size with that
> index.

You absolutely show not do this shifting on the jenkins hash
result, you destroy the distribution entirely. Just mask
it with the hash mask and that's all you need to do.

Brian is right, this is absolutely critical to using the Jenkins hash
correctly. You're "unmixing" the bits it worked so hard to mix.