From: Jean-Philippe Aumasson Subject: Re: [PATCH] siphash: add cryptographically secure hashtable function Date: Sat, 10 Dec 2016 18:13:01 +0000 Message-ID: References: <20161209183659.25727-1-Jason@zx2c4.com> Reply-To: kernel-hardening@lists.openwall.com Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=f403045ec68edc8e54054351d2d9 Cc: LKML , kernel-hardening@lists.openwall.com, linux-crypto@vger.kernel.org, Rusty Russell , Linus Torvalds , "Daniel J . Bernstein" , linux@sciencehorizons.net To: Vegard Nossum , "Jason A. Donenfeld" Return-path: List-Post: List-Help: List-Unsubscribe: List-Subscribe: In-Reply-To: List-Id: linux-crypto.vger.kernel.org --f403045ec68edc8e54054351d2d9 Content-Type: text/plain; charset=UTF-8 SipHash co-designer here. SipHash is secure when it takes a secret key/seed as parameter, meaning that its output values are unpredictable. Concretely, when SipHash produces 64-bit output values then you've a chance 1/2^64 to guess the hash value of a given message, provided that the key/seed is kept secret. That's the standard security definition of a pseudorandom function (PRF), which is typically instantiated with a MAC such as HMAC-somehash. With djb we demonstrated that this security notion is sufficient to protect from hash-flooding attacks wherein an attacker creates many different input values that hash to a same value and therefore may DoS the underlying data structure. I admit that the naming is confusing: "SipHash" is not a hash function, strictly speaking. In crypto we only call hash function algorithms that are unkeyed. PRFs/MACs are sometimes called keyed hash functions though. On Sat, Dec 10, 2016 at 3:17 PM Vegard Nossum wrote: > On 9 December 2016 at 19:36, Jason A. Donenfeld wrote: > > SipHash is a 64-bit keyed hash function that is actually a > > cryptographically secure PRF, like HMAC. Except SipHash is super fast, > > and is meant to be used as a hashtable keyed lookup function. > > > > SipHash isn't just some new trendy hash function. It's been around for a > > while, and there really isn't anything that comes remotely close to > > being useful in the way SipHash is. With that said, why do we need this? > > > > There are a variety of attacks known as "hashtable poisoning" in which an > > attacker forms some data such that the hash of that data will be the > > same, and then preceeds to fill up all entries of a hashbucket. This is > > a realistic and well-known denial-of-service vector. > > > > Linux developers already seem to be aware that this is an issue, and > > various places that use hash tables in, say, a network context, use a > > non-cryptographically secure function (usually jhash) and then try to > > twiddle with the key on a time basis (or in many cases just do nothing > > and hope that nobody notices). While this is an admirable attempt at > > solving the problem, it doesn't actually fix it. SipHash fixes it. > > Could you give some more concrete details/examples? Here's the IPv4 > hash table from include/net/inet_sock.h / net/ipv4/inet_hashtables.c: > > static inline unsigned int __inet_ehashfn(const __be32 laddr, > const __u16 lport, > const __be32 faddr, > const __be16 fport, > u32 initval) > { > return jhash_3words((__force __u32) laddr, > (__force __u32) faddr, > ((__u32) lport) << 16 | (__force __u32)fport, > initval); > } > > static u32 inet_ehashfn(const struct net *net, const __be32 laddr, > const __u16 lport, const __be32 faddr, > const __be16 fport) > { > static u32 inet_ehash_secret __read_mostly; > > net_get_random_once(&inet_ehash_secret, sizeof(inet_ehash_secret)); > > return __inet_ehashfn(laddr, lport, faddr, fport, > inet_ehash_secret + net_hash_mix(net)); > } > > There's a 32-bit secret random salt (inet_ehash_secret) which means > that in practice, inet_ehashfn() will select 1 out of 2^32 different > hash functions at random each time you boot the kernel; without > knowing which one it selected, how can a local or remote attacker can > force IPv4 connections/whatever to go into a single hash bucket? > > It is not possible to obtain the secret salt directly (except by > reading from kernel memory, in which case you've lost already), nor is > it possible to obtain the result of inet_ehashfn() other than (maybe) > by a timing attack where you somehow need to detect that two > connections went into the same hash bucket and work backwards from > that to figure out how to land more connections into into the same > bucket -- but if they can do that, you've also already lost. > > The same pattern is used for IPv6 hashtables and the dentry cache. > > I suppose that using a hash function proven to be cryptographically > secure gives a hard guarantee (under some assumptions) that the > salt/key will give enough diversity between the (in the example above) > 2^32 different hash functions that you cannot improve your chances of > guessing that two values will map to the same bucket regardless of the > salt/key. However, I am a bit doubtful that using a cryptographically > secure hash function will make much of a difference as long as the > attacker doesn't actually have any way to get the output/result of the > hash function (and given that the hash function isn't completely > trivial, of course). > > I am happy to be proven wrong, but you make it sound very easy to > exploit the current situation, so I would just like to ask whether you > have a concrete way to do that? > > > Vegard > > > There are a modicum of places in the kernel that are vulnerable to > > hashtable poisoning attacks, either via userspace vectors or network > > vectors, and there's not a reliable mechanism inside the kernel at the > > moment to fix it. The first step toward fixing these issues is actually > > getting a secure primitive into the kernel for developers to use. Then > > we can, bit by bit, port things over to it as deemed appropriate. > > > > Dozens of languages are already using this internally for their hash > > tables. Some of the BSDs already use this in their kernels. SipHash is > > a widely known high-speed solution to a widely known problem, and it's > > time we catch-up. > --f403045ec68edc8e54054351d2d9 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
SipHash co-designer here.

SipHash is se= cure when it takes a secret key/seed as parameter, meaning that its output = values are unpredictable. Concretely, when SipHash produces 64-bit output v= alues then you've a chance 1/2^64 to guess the hash value of a given me= ssage, provided that the key/seed is kept secret. That's the standard s= ecurity definition of a pseudorandom function (PRF), which is typically ins= tantiated with a MAC such as HMAC-somehash.=C2=A0

= With djb we demonstrated that this security notion is sufficient to protect= from hash-flooding attacks wherein an attacker creates many different inpu= t values that hash to a same value and therefore may DoS the underlying dat= a structure.

I admit that the naming is confusing:= "SipHash" is not a hash function, strictly speaking. In crypto w= e only call hash function algorithms that are unkeyed. PRFs/MACs are someti= mes called keyed hash functions though.



On Sat, Dec 10, 2016 at 3:17 PM V= egard Nossum <vegard.nossum@g= mail.com> wrote:
On 9 Decemb= er 2016 at 19:36, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> SipHash is a 64-bit keyed hash function that is actually a
> cryptographically secure PRF, like HMAC. Except SipHash is super fast,=
> and is meant to be used as a hashtable keyed lookup function.
>
> SipHash isn't just some new trendy hash function. It's been ar= ound for a
> while, and there really isn't anything that comes remotely close t= o
> being useful in the way SipHash is. With that said, why do we need thi= s?
>
> There are a variety of attacks known as "hashtable poisoning"= ; in which an
> attacker forms some data such that the hash of that data will be the > same, and then preceeds to fill up all entries of a hashbucket. This i= s
> a realistic and well-known denial-of-service vector.
>
> Linux developers already seem to be aware that this is an issue, and > various places that use hash tables in, say, a network context, use a<= br class=3D"gmail_msg"> > non-cryptographically secure function (usually jhash) and then try to<= br class=3D"gmail_msg"> > twiddle with the key on a time basis (or in many cases just do nothing=
> and hope that nobody notices). While this is an admirable attempt at > solving the problem, it doesn't actually fix it. SipHash fixes it.=

Could you give some more concrete details/examples? Here's the IPv4
hash table from include/net/inet_sock.h / net/ipv4/inet_hashtables.c:

static inline unsigned int __inet_ehashfn(const __be32 laddr,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0co= nst __u16 lport,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0co= nst __be32 faddr,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0co= nst __be16 fport,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0u3= 2 initval)
{
=C2=A0 =C2=A0 =C2=A0 =C2=A0return jhash_3words((__force __u32) laddr,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0(__force __u32) faddr,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0((__u32) lport) << 16 | (__force __u32)fport,=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0initval);
}

static u32 inet_ehashfn(const struct net *net, const __be32 laddr,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0const __u16 lport, const __be32 faddr,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0const __be16 fport)
{
=C2=A0 =C2=A0 =C2=A0 =C2=A0static u32 inet_ehash_secret __read_mostly;

=C2=A0 =C2=A0 =C2=A0 =C2=A0net_get_random_once(&inet_ehash_secret, size= of(inet_ehash_secret));

=C2=A0 =C2=A0 =C2=A0 =C2=A0return __inet_ehashfn(laddr, lport, faddr, fport= ,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0inet_ehash_secret + net_hash_mix(net));
}

There's a 32-bit secret random salt (inet_ehash_secret) which means
that in practice, inet_ehashfn() will select 1 out of 2^32 different
hash functions at random each time you boot the kernel; without
knowing which one it selected, how can a local or remote attacker can
force IPv4 connections/whatever to go into a single hash bucket?

It is not possible to obtain the secret salt directly (except by
reading from kernel memory, in which case you've lost already), nor is<= br class=3D"gmail_msg"> it possible to obtain the result of inet_ehashfn() other than (maybe)
by a timing attack where you somehow need to detect that two
connections went into the same hash bucket and work backwards from
that to figure out how to land more connections into into the same
bucket -- but if they can do that, you've also already lost.

The same pattern is used for IPv6 hashtables and the dentry cache.

I suppose that using a hash function proven to be cryptographically
secure gives a hard guarantee (under some assumptions) that the
salt/key will give enough diversity between the (in the example above)
2^32 different hash functions that you cannot improve your chances of
guessing that two values will map to the same bucket regardless of the
salt/key. However, I am a bit doubtful that using a cryptographically
secure hash function will make much of a difference as long as the
attacker doesn't actually have any way to get the output/result of the<= br class=3D"gmail_msg"> hash function (and given that the hash function isn't completely
trivial, of course).

I am happy to be proven wrong, but you make it sound very easy to
exploit the current situation, so I would just like to ask whether you
have a concrete way to do that?


Vegard

> There are a modicum of places in the kernel that are vulnerable to
> hashtable poisoning attacks, either via userspace vectors or network > vectors, and there's not a reliable mechanism inside the kernel at= the
> moment to fix it. The first step toward fixing these issues is actuall= y
> getting a secure primitive into the kernel for developers to use. Then=
> we can, bit by bit, port things over to it as deemed appropriate.
>
> Dozens of languages are already using this internally for their hash > tables. Some of the BSDs already use this in their kernels. SipHash is=
> a widely known high-speed solution to a widely known problem, and it&#= 39;s
> time we catch-up.
--f403045ec68edc8e54054351d2d9--