2014-09-21 09:53:42

by George Spelvin

[permalink] [raw]
Subject: [RFC] mke2fs -E hash_alg=siphash: any interest?

I was figuring out which directory hash I wanted on a new file system,
and wasn't really thrilled with any of them.

SipHash beats the current options on 32-bit x86:
* Testing strings up to 4096 bytes:
Test siphash24a: Elapsed = 20.754133
Test siphash24: Elapsed = 24.751991
Test teahash: Elapsed = 68.409799
Test md4hash: Elapsed = 42.375412
* Testing strings up to 8 bytes:
Test siphash24a: Elapsed = 0.001247
Test siphash24: Elapsed = 0.001181
Test teahash: Elapsed = 0.001944
Test md4hash: Elapsed = 0.001421
* Testing strings up to 16 bytes:
Test siphash24a: Elapsed = 0.003101
Test siphash24: Elapsed = 0.003125
Test teahash: Elapsed = 0.004847
Test md4hash: Elapsed = 0.003728
* Testing strings up to 24 bytes:
Test siphash24a: Elapsed = 0.005415
Test siphash24: Elapsed = 0.005622
Test teahash: Elapsed = 0.009881
Test md4hash: Elapsed = 0.006789

And obviously does even better on 64 bits:
* Testing strings up to 4096 bytes:
Test siphash24a: Elapsed = 5.222064
Test siphash24: Elapsed = 5.200439
Test teahash: Elapsed = 61.071545
Test md4hash: Elapsed = 39.958463
* Testing strings up to 8 bytes:
Test siphash24a: Elapsed = 0.000394
Test siphash24: Elapsed = 0.000320
Test teahash: Elapsed = 0.001690
Test md4hash: Elapsed = 0.001216
* Testing strings up to 16 bytes:
Test siphash24a: Elapsed = 0.000920
Test siphash24: Elapsed = 0.000829
Test teahash: Elapsed = 0.004279
Test md4hash: Elapsed = 0.003262
* Testing strings up to 24 bytes:
Test siphash24a: Elapsed = 0.001580
Test siphash24: Elapsed = 0.001448
Test teahash: Elapsed = 0.008824
Test md4hash: Elapsed = 0.005956


This is with the following test code, which started with me testing the
"read word at a time, but don't memory fault" optimization, but I turned
into a cheesy benchmark.

Basically, it offers security similar to teahash with a faster, and better
studied, primitive designed specifically for this application.

I'm thinking of turning this into a patch for ext2utils and fs/ext4.

Could I ask what the general level of interest is? On a scale of "hell,
no, not more support burden!" to "thank you, I've been meaning to find
time to add that!"

I'm planning on using the "siphash24" code, which is slightly slower than
the more unrolled "siphash24a" code on 32-bit, but about 55% the size:

Function size 32-bit 64-bit
siphash24 1043 381
siphash24a 1955 661

I'm assuming DX_HASH_SIPHASH_2_4=6 would be the right value.


-- 8< --- Current ugly test/benchmark code --- 8< ---
#include <stdint.h>
#include <string.h>

#if 0 /* Linux kernel */

#include <bitops.h> /* For rol64 */
#include <asm/unaligned.h>

#else /* Emulation */

#include <endian.h>

#define __le64_to_cpu le64toh
#define __get_unaligned_cpu64(p) __le64_to_cpu(*(uint64_t const *)(p))

#define rol64(x,s) ((x) << (s) | (x) >> (64-(s)))

#define __pure __attribute__((pure))

#endif


#define SIP_MIX(a,b,s) ( (a) += (b), (b) = rol64(b, s), (b) ^= (a) )

#define SIP_ROUND(a,b,c,d) \
( SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32) )

#define SIP_MIXIN(a,b,c,d,m) \
( (d) ^= (m), SIP_ROUND(a,b,c,d), SIP_ROUND(a,b,c,d), (a) ^= (m) )


/*
* This is the "fast" version.
* i386: 1955 bytes, 23.789444 seconds
* x86-64: 661 bytes, 5.203469 seconds
*/
uint64_t __pure
siphash24a(char const *msg, int len, uint32_t const seed[4])
{
uint64_t a = 0x736f6d6570736575; /* somepseu */
uint64_t b = 0x646f72616e646f6d; /* dorandom */
uint64_t c = 0x6c7967656e657261; /* lygenera */
uint64_t d = 0x7465646279746573; /* tedbytes */
uint64_t m;
unsigned r;

/* Mix in the 128-bit hash seed */
if (seed) {
m = seed[0] | (uint64_t)seed[1] << 32;
a ^= m; c ^= m;
m = seed[2] | (uint64_t)seed[3] << 32;
b ^= m; d ^= m;
}

/* Mix in all the complete words, until r < 8 */
for (r = len; r >= 8; r -= 8, msg += 8) {
m = __get_unaligned_cpu64(msg);
SIP_MIXIN(a, b, c, d, m);
}

/*
* The last partial word is tricky. We'd like to do one fetch, but
* we can't just to a 64-bit fetch and mask, because that
* might hit a protection boundary. Fortunately, we know that
* protection boundaries are 64-bit aligned, so we can
* consider only three cases:
* - The remainder occupies zero words
* - The remainder fits into one word
* - The remainder straddles two words
*/
if (!r) {
m = 0;
} else {
unsigned offset = (unsigned)(unsigned long)msg & 7;

if (offset + r > 8) {
m = __get_unaligned_cpu64(msg);
} else {
m = __le64_to_cpu(*(uint64_t const *)(msg - offset));
m >>= 8*offset;
}
m &= ((uint64_t)1 << 8*r) - 1;
}
/* Padding is one byte of length mod 256 */
m |= (uint64_t)len << 56;
SIP_MIXIN(a, b, c, d, m);

/* Finalization */
c ^= 0xff;
SIP_ROUND(a, b, c, d);
SIP_ROUND(a, b, c, d);
SIP_ROUND(a, b, c, d);
SIP_ROUND(a, b, c, d);

return a ^ b ^ c ^ d;
}

/*
* A more space-efficient version.
* i386: 1043 bytes, 28.400234 seconds
* x86-64: 381 bytes, 5.160729 seconds
*
* Hey, same speed or faster; I like that!
* But 20% slower on 32-bit, sigh.
*/
uint64_t __pure
siphash24(char const *msg, int len, uint32_t const seed[4])
{
uint64_t a = 0x736f6d6570736575; /* somepseu */
uint64_t b = 0x646f72616e646f6d; /* dorandom */
uint64_t c = 0x6c7967656e657261; /* lygenera */
uint64_t d = 0x7465646279746573; /* tedbytes */
uint64_t m = 0;
unsigned r = (len + 1) / 8 + 2; /* Number of double-rounds to perform */

/* Mix in the 128-bit hash seed */
if (seed) {
m = seed[2] | (uint64_t)seed[3] << 32;
b ^= m;
d ^= m;
m = seed[0] | (uint64_t)seed[1] << 32;
/* a ^= m; is done in loop below */
c ^= m;
}

/*
* By using the same SIP mixing code for all iterations, we
* save space, at the expense of some branch prediction. But
* branch prediction is hard because of variable length anyway.
*/
do {
a ^= m;

switch (--r) {
unsigned bytes;

default: /* Full words */
d ^= m = __get_unaligned_cpu64(msg);
msg += 8;
break;
case 2: /* Final partial word */
/*
* We'd like to do one 64-bit fetch rather than mess
* around with bytes, but reading past the end might hit
* a protection boundary. Fortunately, we know that
* protection boundaries are aligned, so we can
* consider only three cases:
* - The remainder occupies zero words
* - The remainder fits into one word
* - The remainder straddles two words
*/
bytes = len & 7;

if (bytes == 0) {
m = 0;
} else {
unsigned offset = (unsigned)(unsigned long)msg & 7;

if (offset + bytes > 8) {
m = __get_unaligned_cpu64(msg);
} else {
m = __le64_to_cpu(*(uint64_t const *)(msg - offset));
m >>= 8*offset;
}
m &= ((uint64_t)1 << 8*bytes) - 1;
}
/* We could use OR or +, but XOR lets the compiler rearrange */
d ^= m ^= (uint64_t)len << 56;
break;
case 1:
m = 0;
c ^= 0xff; /* Beginning of finalization */
/*FALLTHROUGH*/
case 0:
break;
}

SIP_ROUND(a, b, c, d);
SIP_ROUND(a, b, c, d);
} while (r);

return a ^ b ^ c ^ d;
}

#undef SIP_MIXIN
#undef SIP_ROUND
#undef SIP_MIX

/*
* Keyed 32-bit hash function using TEA in a Davis-Meyer function
* H0 = Key
* Hi = E Mi(Hi-1) + Hi-1
*
* (see Applied Cryptography, 2nd edition, p448).
*
* Jeremy Fitzhardinge <[email protected]> 1998
*
* This code is made available under the terms of the GPL
*/
#define DELTA 0x9E3779B9

static void TEA_transform(uint32_t buf[2], uint32_t const in[4])
{
uint32_t sum = 0;
uint32_t b0 = buf[0], b1 = buf[1];
uint32_t a = in[0], b = in[1], c = in[2], d = in[3];
int n = 16;

do {
sum += DELTA;
b0 += ((b1 << 4)+a) ^ (b1+sum) ^ ((b1 >> 5)+b);
b1 += ((b0 << 4)+c) ^ (b0+sum) ^ ((b0 >> 5)+d);
} while (--n);

buf[0] += b0;
buf[1] += b1;
}

#undef DELTA

static void str2hashbuf(char const *msg, int len, uint32_t *buf, int num)
{
uint32_t pad, val;
int i;
unsigned char const *ucp = (unsigned char const *)msg;


pad = (uint32_t)len | (uint32_t)len << 8;
pad |= pad << 16;

val = pad;
if (len > num*4)
len = num * 4;
for (i = 0; i < len; i++) {
val = (val << 8) + ucp[i];
if (i % 4 == 3) {
*buf++ = val;
val = pad;
num--;
}
}
while (--num >= 0) {
*buf++ = val;
val = pad;
}
}

uint64_t __pure
teahash(char const *msg, int len, uint32_t const seed[4])
{
static uint32_t const init[4] = {
0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 };
uint32_t buf[2], in[4];

memcpy(buf, seed ? seed : init, sizeof(buf));

while (len > 0) {
str2hashbuf(msg, len, in, 4);
TEA_transform(buf, in);
len -= 16;
msg += 16;
}
return (uint64_t)buf[1] << 32 | buf[0];
}

/* F, G and H are basic MD4 functions: selection, majority, parity */
#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define G(x, y, z) (((x) & (y)) + (((x) ^ (y)) & (z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))

/*
* The generic round function. The application is so specific that
* we don't bother protecting all the arguments with parens, as is generally
* good macro practice, in favor of extra legibility.
* Rotation is separate from addition to prevent recomputation
*/
#define ROUND(f, a, b, c, d, x, s) \
(a += f(b, c, d) + x, a = (a << s) | (a >> (32-s)))
#define K1 0
#define K2 013240474631UL
#define K3 015666365641UL

/*
* Basic cut-down MD4 transform. Returns only 32 bits of result.
*/
static void halfMD4Transform (uint32_t buf[4], uint32_t const in[8])
{
uint32_t a = buf[0], b = buf[1], c = buf[2], d = buf[3];

/* Round 1 */
ROUND(F, a, b, c, d, in[0] + K1, 3);
ROUND(F, d, a, b, c, in[1] + K1, 7);
ROUND(F, c, d, a, b, in[2] + K1, 11);
ROUND(F, b, c, d, a, in[3] + K1, 19);
ROUND(F, a, b, c, d, in[4] + K1, 3);
ROUND(F, d, a, b, c, in[5] + K1, 7);
ROUND(F, c, d, a, b, in[6] + K1, 11);
ROUND(F, b, c, d, a, in[7] + K1, 19);

/* Round 2 */
ROUND(G, a, b, c, d, in[1] + K2, 3);
ROUND(G, d, a, b, c, in[3] + K2, 5);
ROUND(G, c, d, a, b, in[5] + K2, 9);
ROUND(G, b, c, d, a, in[7] + K2, 13);
ROUND(G, a, b, c, d, in[0] + K2, 3);
ROUND(G, d, a, b, c, in[2] + K2, 5);
ROUND(G, c, d, a, b, in[4] + K2, 9);
ROUND(G, b, c, d, a, in[6] + K2, 13);

/* Round 3 */
ROUND(H, a, b, c, d, in[3] + K3, 3);
ROUND(H, d, a, b, c, in[7] + K3, 9);
ROUND(H, c, d, a, b, in[2] + K3, 11);
ROUND(H, b, c, d, a, in[6] + K3, 15);
ROUND(H, a, b, c, d, in[1] + K3, 3);
ROUND(H, d, a, b, c, in[5] + K3, 9);
ROUND(H, c, d, a, b, in[0] + K3, 11);
ROUND(H, b, c, d, a, in[4] + K3, 15);

buf[0] += a;
buf[1] += b;
buf[2] += c;
buf[3] += d;
}

#undef ROUND
#undef F
#undef G
#undef H
#undef K1
#undef K2
#undef K3

uint64_t __pure
md4hash(char const *msg, int len, uint32_t const seed[4])
{
static uint32_t const init[4] = {
0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 };
uint32_t buf[4], in[8];

memcpy(buf, seed ? seed : init, sizeof(buf));

while (len > 0) {
str2hashbuf(msg, len, in, 8);
halfMD4Transform(buf, in);
len -= 32;
msg += 32;
}
return (uint64_t)buf[2] << 32 | buf[1];
}

#include <string.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/time.h>

#define PAGESIZE 4096

uint64_t time_test(char const *buf, int len, int maxlen, uint64_t f(char const *, int, uint32_t const [4]))
{
struct timeval tv1, tv2;
uint64_t sum = 0;
int i, j;

if (gettimeofday(&tv1, NULL) < 0) {
perror("gettimeofday");
return 0;
}

for (i = 1; i < maxlen; i++)
for (j = 0; j < len - i; j++)
sum += f(buf+j, i, NULL);

if (gettimeofday(&tv2, NULL) < 0) {
perror("gettimeofday");
return 0;
}
tv2.tv_sec -= tv1.tv_sec;
tv2.tv_usec -= tv1.tv_usec;
if (tv2.tv_usec < 0) {
tv2.tv_sec--;
tv2.tv_usec += 1000000;
}
printf("Elapsed = %u.%06u\n", (unsigned)tv2.tv_sec,
(unsigned)tv2.tv_usec);
return sum;
}

int
main(void)
{
char *buf;
int i, j;

/* The test vector in the SipHash specification */
static uint32_t const key[4] = {
0x03020100, 0x07060504, 0x0b0a0908, 0x0f0e0d0c
};
static char const message[15] = "\0\1\2\3\4\5\6\7\10\11\12\13\14\15\16";
uint64_t const hexpected = 0xa129ca6149be45e5;
uint64_t htest = siphash24(message, sizeof message, key);

printf("Test hash = %016llx\n"
"Expected = %016llx\n", htest, hexpected);
if (htest != hexpected)
return 1;

buf = mmap(NULL, 2*PAGESIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (buf == MAP_FAILED) {
perror("mmap");
return 1;
}
if (munmap(buf + PAGESIZE, PAGESIZE) < 0) {
perror("munmap");
return 1;
}

for (i = 0; i < 8; i++) {
char *p = buf + PAGESIZE - sizeof message - i;
memcpy (p, message, sizeof message);
htest = siphash24(p, sizeof message, key);
printf("Test %-4d = %016llx\n", i, htest);
if (htest != hexpected)
return 1;

}

memset(buf, 'x', PAGESIZE);

for (i = 0; i < PAGESIZE; i++) {
uint64_t const h1 = siphash24(buf, i, NULL);

if ((i & 15) == 15) {
printf("\rHash(%u) = %016llx", i, h1);
fflush(stdout);
}

for (j = 1; j < PAGESIZE - i; j++) {
uint64_t const h2 = siphash24(buf+j, i, NULL);

if (h2 != h1) {
printf("\nERROR: hash@%u = %016llx\n", j, h2);
return 1;
}
}
}
putchar('\n');
for (i = 0; i < 4; i++) {
int maxlen = i ? 8 * i : 4096;
printf("* Testing strings up to %d bytes:\n", maxlen);
fputs("Test siphash24a: ", stdout);
time_test(buf, PAGESIZE, maxlen, siphash24a);
fputs("Test siphash24: ", stdout);
time_test(buf, PAGESIZE, maxlen, siphash24);
fputs("Test teahash: ", stdout);
time_test(buf, PAGESIZE, maxlen, teahash);
fputs("Test md4hash: ", stdout);
time_test(buf, PAGESIZE, maxlen, md4hash);
}

return 0;
}


2014-09-21 17:55:22

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC] mke2fs -E hash_alg=siphash: any interest?

On Sun, Sep 21, 2014 at 05:53:39AM -0400, George Spelvin wrote:
>
> Basically, it offers security similar to teahash with a faster, and better
> studied, primitive designed specifically for this application.
>
> I'm thinking of turning this into a patch for ext2utils and fs/ext4.
>
> Could I ask what the general level of interest is? On a scale of "hell,
> no, not more support burden!" to "thank you, I've been meaning to find
> time to add that!"

I'm certainly not against adding a new hash function. The reality is
that it would be quite a while before we could turn it on by default,
because of the backwards compatibility concerns.

The question I would ask is whether we can show an anctual performance
improvement with the hash being used in situ. Let's give it the best
possible chance of making a difference; let's assume a RAM disk with a
very metadata intensive benchmark, with journalling turned off. What
sort of difference would we see, either in terms of system CPU time,
wall clock time, etc.?

The results of such a benchmark would certainly make a difference in
how aggressively we might try to phase in a new hash algorithm.

Cheers,

- Ted

2014-09-21 21:04:19

by George Spelvin

[permalink] [raw]
Subject: Re: [RFC] mke2fs -E hash_alg=siphash: any interest?

> I'm certainly not against adding a new hash function. The reality is
> that it would be quite a while before we could turn it on by default,
> because of the backwards compatibility concerns.

Well, yes, obviously! My itch is just that I want to use it myself;
I prefer it for security and cleanliness reasons. The benchmarks are
mostly to prove that it isn't slower.

> The question I would ask is whether we can show an anctual performance
> improvement with the hash being used in situ.

I quite agree, but I'll have to have a working patch before such
a test can be made.

One things I'm coming across immediately that I have to ask for
design guidance on is the hash algorithm number assignment:

- Should I leave room for more hashes with a signed/unsigned distinction,
or should I assume that's a historical kludge that won't be perpetuated?
SipHash is defined on a byte string, so there isn't really a signed
version.
- Should I use a new EXT2_HASH_SIPHASH_62 = 6, or should I
renumber the (internal-only) EXT2_HASH_*_UNSIGNED values and use
EXT2_HASH_SIPHASH_4_2 = 4?

None of this is truly final, but it would make my life easier if I
didn't have to change it on my test filesystems too often.

2014-09-21 22:08:28

by TR Reardon

[permalink] [raw]
Subject: RE: [RFC] mke2fs -E hash_alg=siphash: any interest?

Is protection against hash DoS important for local on-disk format? I can't come up?
with a scenario, but maybe there is one. ?The kinds of DoS contemplated are?
really Google-scale, not really at the scale of ext4 directories, imo.

I ask because if hash perf is the main goal here, then CityHash and particularly SpookyHash?
are better candidates. The latter has good performance even on legacy ARMv5 hardware.?

+Reardon


----------------------------------------
> Date: Sun, 21 Sep 2014 17:04:16 -0400
> From: [email protected]
> To: [email protected]; [email protected]
> Subject: Re: [RFC] mke2fs -E hash_alg=siphash: any interest?
> CC: [email protected]
>
>> I'm certainly not against adding a new hash function. The reality is
>> that it would be quite a while before we could turn it on by default,
>> because of the backwards compatibility concerns.
>
> Well, yes, obviously! My itch is just that I want to use it myself;
> I prefer it for security and cleanliness reasons. The benchmarks are
> mostly to prove that it isn't slower.
>
>> The question I would ask is whether we can show an anctual performance
>> improvement with the hash being used in situ.
>
> I quite agree, but I'll have to have a working patch before such
> a test can be made.
>
> One things I'm coming across immediately that I have to ask for
> design guidance on is the hash algorithm number assignment:
>
> - Should I leave room for more hashes with a signed/unsigned distinction,
> or should I assume that's a historical kludge that won't be perpetuated?
> SipHash is defined on a byte string, so there isn't really a signed
> version.
> - Should I use a new EXT2_HASH_SIPHASH_62 = 6, or should I
> renumber the (internal-only) EXT2_HASH_*_UNSIGNED values and use
> EXT2_HASH_SIPHASH_4_2 = 4?
>
> None of this is truly final, but it would make my life easier if I
> didn't have to change it on my test filesystems too often.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
-

2014-09-22 01:17:15

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC] mke2fs -E hash_alg=siphash: any interest?

On Sun, Sep 21, 2014 at 05:04:16PM -0400, [email protected] wrote:
> One things I'm coming across immediately that I have to ask for
> design guidance on is the hash algorithm number assignment:
>
> - Should I leave room for more hashes with a signed/unsigned distinction,
> or should I assume that's a historical kludge that won't be perpetuated?
> SipHash is defined on a byte string, so there isn't really a signed
> version.

It's a historical kludge that shouldn't be needed for new checksums;
just make sure it is defined correctly and that you test interop
between Big-Endian and Little-Endian machines. The awfulness we had
was because there were already a large number of file systems out in
the field on PowerPC and Intel machines, so we needed to preserve
compatibility as much as possible, such that newer kernels could
correctly understand filesystems that had originally been created on
the opposite-endian system.

Cheers,

- Ted

2014-09-22 02:31:07

by George Spelvin

[permalink] [raw]
Subject: RE: [RFC] mke2fs -E hash_alg=siphash: any interest?

> Is protection against hash DoS important for local on-disk format?
>
> I can't come up with a scenario, but maybe there is one. The kinds of
> DoS contemplated are really Google-scale, not really at the scale of
> ext4 directories,

Yes. It's the standard hash collision attack: if someone can force too
many hash collisions, they can force the hash tree to have pessimal
performance, including excessive disk space allocation in an attempt to
split directory pages.

In fact, I'm not sure if the code copes with more than 4096 bytes of
directory entry with a single hash value or it will cause some sort of
error.

I'm sure Ted knows the details, but the entire reason that cryptographic
hashes are used by ext3/4 is that it's a potential problem.

Anyway, in addition to being a local DoS, plenty of systems create files
with network-controllable names, like PHP session IDs. (The alphabet
and length are controlled, but that still leaves a lot of flexibility.)

SipHash is designed and widely used for exactly this.

> I ask because if hash perf is the main goal here, then CityHash and
> particularly SpookyHash are better candidates. The latter has good
> performance even on legacy ARMv5 hardware.

Both SipHash and SpookyHash are 64-bit ARX (add, rotate, XOR) designs,
so they have very similar performance characteristics.

SipHash does more mix operations than SpookyHash, both for zero-length
messages (24 vs. 11) and long messages (8 per 8 bytes, vs. 12 per 32
bytes), but is actually reasonably secure.

But in the typical-filename-length range, they aren't too far off from
each other.

2014-09-22 17:09:06

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC] mke2fs -E hash_alg=siphash: any interest?

On Sun, Sep 21, 2014 at 10:31:05PM -0400, George Spelvin wrote:
>
> Yes. It's the standard hash collision attack: if someone can force too
> many hash collisions, they can force the hash tree to have pessimal
> performance, including excessive disk space allocation in an attempt to
> split directory pages.
>
> In fact, I'm not sure if the code copes with more than 4096 bytes of
> directory entry with a single hash value or it will cause some sort of
> error.

It does cope correctly, but it means that we degrade to doing a linear
search. At one point we had a test where we forced the hash algorithm
to always return "42" to check those code paths.

The real problem is that the htree implementation only supports a tree
dpeth of at most two levels, and it doesn't support collapsing the
tree to reduce space after deleting a large number of files.

The first would require making a compatibility-impacting change; the
second one would not necessarily require that, but it's also not
entirely trivial. We've gotten away with not making any changes here,
partially because using a keyed hash allows us to avoid intentional
attempts to hit those corner cases.

If someone was interested in doing some work in that space, there is
definitely room where we could improve there.

Cheers,

- Ted

2014-09-22 23:14:16

by George Spelvin

[permalink] [raw]
Subject: Re: [RFC] mke2fs -E hash_alg=siphash: any interest?

>> In fact, I'm not sure if the code copes with more than 4096 bytes of
>> directory entry with a single hash value or it will cause some sort of
>> error.

> It does cope correctly, but it means that we degrade to doing a linear
> search. At one point we had a test where we forced the hash algorithm
> to always return "42" to check those code paths.

Shame on me for not RTFS before impugning your code.
I just knew it broke the optimization.

Anyway, I'm working on patches. Here's the start, which I hope is small
enough (2 patches: kernel, then e2fsprogs):

commit a4a2ee4011b2a6c844058d3a80d60fdc26f1607b
Author: George Spelvin <[email protected]>
Date: Mon Sep 22 22:40:30 2014 +0000

ex4: Introduce DX_HASH_UNSIGNED_DELTA

This makes it easier to add adidtional hash algorithms in future.

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1bbe7c31..cda8dd9c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1216,7 +1216,7 @@ struct ext4_sb_info {
u32 s_next_generation;
u32 s_hash_seed[4];
int s_def_hash_version;
- int s_hash_unsigned; /* 3 if hash should be signed, 0 if not */
+ int s_hash_unsigned; /* DX_HASH_UNSIGNED_DELTA, or 0 if signed */
struct percpu_counter s_freeclusters_counter;
struct percpu_counter s_freeinodes_counter;
struct percpu_counter s_dirs_counter;
@@ -1737,9 +1737,18 @@ static inline __le16 ext4_rec_len_to_disk(unsigned len, unsigned blocksize)
#define DX_HASH_LEGACY 0
#define DX_HASH_HALF_MD4 1
#define DX_HASH_TEA 2
-#define DX_HASH_LEGACY_UNSIGNED 3
-#define DX_HASH_HALF_MD4_UNSIGNED 4
-#define DX_HASH_TEA_UNSIGNED 5
+
+/*
+ * For historical reasons, the first three hash algorithms
+ * have signed and unsigned variants. So, for internal
+ * use only, define some extra values outside the range of
+ * what's allowed on disk.
+ */
+#define DX_HASH_UNSIGNED_DELTA 3
+
+#define DX_HASH_LEGACY_UNSIGNED (DX_HASH_LEGACY + DX_HASH_UNSIGNED_DELTA)
+#define DX_HASH_HALF_MD4_UNSIGNED (DX_HASH_HALF_MD4 + DX_HASH_UNSIGNED_DELTA)
+#define DX_HASH_TEA_UNSIGNED (DX_HASH_TEA + DX_HASH_UNSIGNED_DELTA)

static inline u32 ext4_chksum(struct ext4_sb_info *sbi, u32 crc,
const void *address, unsigned int length)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4566c2fe..9a9c0bbb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3721,16 +3721,17 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
for (i = 0; i < 4; i++)
sbi->s_hash_seed[i] = le32_to_cpu(es->s_hash_seed[i]);
sbi->s_def_hash_version = es->s_def_hash_version;
- if (EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_DIR_INDEX)) {
+ if (sbi->s_def_hash_version <= DX_HASH_TEA &&
+ EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_DIR_INDEX)) {
i = le32_to_cpu(es->s_flags);
if (i & EXT2_FLAGS_UNSIGNED_HASH)
- sbi->s_hash_unsigned = 3;
+ sbi->s_hash_unsigned = DX_HASH_UNSIGNED_DELTA;
else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
#ifdef __CHAR_UNSIGNED__
if (!(sb->s_flags & MS_RDONLY))
es->s_flags |=
cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
- sbi->s_hash_unsigned = 3;
+ sbi->s_hash_unsigned = DX_HASH_UNSIGNED_DELTA;
#else
if (!(sb->s_flags & MS_RDONLY))
es->s_flags |=



commit a493f2d7e43922d0795089d7058812a607028d90
Author: George Spelvin <[email protected]>
Date: Mon Sep 22 22:54:32 2014 +0000

Add EXT2_HASH_UNSIGNED instead of magic constant "3".

This can be changed later, allowing additional hash algorithms.

diff --git a/debugfs/htree.c b/debugfs/htree.c
index 4f0118d..4ab1a4c 100644
--- a/debugfs/htree.c
+++ b/debugfs/htree.c
@@ -68,7 +68,7 @@ static void htree_dump_leaf_node(ext2_filsys fs, ext2_ino_t ino,
hash_alg = rootnode->hash_version;
if ((hash_alg <= EXT2_HASH_TEA) &&
(fs->super->s_flags & EXT2_FLAGS_UNSIGNED_HASH))
- hash_alg += 3;
+ hash_alg += EXT2_HASH_UNSIGNED;

while (offset < fs->blocksize) {
dirent = (struct ext2_dir_entry *) (buf + offset);
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 0b9c5c5..454bbdc 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -1003,7 +1003,7 @@ inline_read_fail:
dx_dir->hashversion = root->hash_version;
if ((dx_dir->hashversion <= EXT2_HASH_TEA) &&
(fs->super->s_flags & EXT2_FLAGS_UNSIGNED_HASH))
- dx_dir->hashversion += 3;
+ dx_dir->hashversion += EXT2_HASH_UNSIGNED;
dx_dir->depth = root->indirect_levels + 1;
} else if ((dirent->inode == 0) &&
(rec_len == fs->blocksize) &&
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index e37e871..e48feb7 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -138,7 +138,7 @@ static int fill_dir_block(ext2_filsys fs,
hash_alg = fs->super->s_def_hash_version;
if ((hash_alg <= EXT2_HASH_TEA) &&
(fs->super->s_flags & EXT2_FLAGS_UNSIGNED_HASH))
- hash_alg += 3;
+ hash_alg += EXT2_HASH_UNSIGNED;
/* While the directory block is "hot", index it. */
dir_offset = 0;
while (dir_offset < fs->blocksize) {
@@ -375,7 +375,7 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
hash_alg = fs->super->s_def_hash_version;
if ((hash_alg <= EXT2_HASH_TEA) &&
(fs->super->s_flags & EXT2_FLAGS_UNSIGNED_HASH))
- hash_alg += 3;
+ hash_alg += EXT2_HASH_UNSIGNED;

for (i=1; i < fd->num_array; i++) {
ent = fd->harray + i;
diff --git a/lib/ext2fs/dirhash.c b/lib/ext2fs/dirhash.c
index c4ac94e..ef7820c 100644
--- a/lib/ext2fs/dirhash.c
+++ b/lib/ext2fs/dirhash.c
@@ -197,7 +197,7 @@ errcode_t ext2fs_dirhash(int version, const char *name, int len,
const char *p;
int i;
__u32 in[8], buf[4];
- int unsigned_flag = 0;
+ int unsigned_flag = (version >= EXT2_HASH_UNSIGNED);

/* Initialize the default seed for the hash checksum functions */
buf[0] = 0x67452301;
@@ -216,16 +216,12 @@ errcode_t ext2fs_dirhash(int version, const char *name, int len,
}

switch (version) {
- case EXT2_HASH_LEGACY_UNSIGNED:
- unsigned_flag++;
- /* fallthrough */
case EXT2_HASH_LEGACY:
+ case EXT2_HASH_LEGACY_UNSIGNED:
hash = dx_hack_hash(name, len, unsigned_flag);
break;
- case EXT2_HASH_HALF_MD4_UNSIGNED:
- unsigned_flag++;
- /* fallthrough */
case EXT2_HASH_HALF_MD4:
+ case EXT2_HASH_HALF_MD4_UNSIGNED:
p = name;
while (len > 0) {
str2hashbuf(p, len, in, 8, unsigned_flag);
@@ -236,10 +232,8 @@ errcode_t ext2fs_dirhash(int version, const char *name, int len,
minor_hash = buf[2];
hash = buf[1];
break;
- case EXT2_HASH_TEA_UNSIGNED:
- unsigned_flag++;
- /* fallthrough */
case EXT2_HASH_TEA:
+ case EXT2_HASH_TEA_UNSIGNED:
p = name;
while (len > 0) {
str2hashbuf(p, len, in, 4, unsigned_flag);
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index f9a4bdb..53df88c 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -226,9 +226,18 @@ struct ext2_dx_root_info {
#define EXT2_HASH_LEGACY 0
#define EXT2_HASH_HALF_MD4 1
#define EXT2_HASH_TEA 2
-#define EXT2_HASH_LEGACY_UNSIGNED 3 /* reserved for userspace lib */
-#define EXT2_HASH_HALF_MD4_UNSIGNED 4 /* reserved for userspace lib */
-#define EXT2_HASH_TEA_UNSIGNED 5 /* reserved for userspace lib */
+
+/*
+ * For historical reasons, the first three hash algorithms
+ * have signed and unsigned variants. So, for internal
+ * use only, define some extra values outside the range of
+ * what's allowed on disk.
+ */
+#define EXT2_HASH_UNSIGNED 3
+
+#define EXT2_HASH_LEGACY_UNSIGNED (EXT2_HASH_UNSIGNED + EXT2_HASH_LEGACY)
+#define EXT2_HASH_HALF_MD4_UNSIGNED (EXT2_HASH_UNSIGNED + EXT2_HASH_HALF_MD4)
+#define EXT2_HASH_TEA_UNSIGNED (EXT2_HASH_UNSIGNED + EXT2_HASH_TEA)

#define EXT2_HASH_FLAG_INCOMPAT 0x1


2014-09-23 22:26:07

by Andreas Dilger

[permalink] [raw]
Subject: Re: [RFC] mke2fs -E hash_alg=siphash: any interest?

On Sep 21, 2014, at 7:55 PM, Theodore Ts'o <[email protected]> wrote:
> On Sun, Sep 21, 2014 at 05:53:39AM -0400, George Spelvin wrote:
>>
>> Basically, it offers security similar to teahash with a faster, and better studied, primitive designed specifically for this application.
>>
>> I'm thinking of turning this into a patch for ext2utils and fs/ext4.
>>
>> Could I ask what the general level of interest is? On a scale of "hell,
>> no, not more support burden!" to "thank you, I've been meaning to find
>> time to add that!"
>
> I'm certainly not against adding a new hash function. The reality is
> that it would be quite a while before we could turn it on by default,
> because of the backwards compatibility concerns.
>
> The question I would ask is whether we can show an anctual performance
> improvement with the hash being used in situ. Let's give it the best
> possible chance of making a difference; let's assume a RAM disk with a
> very metadata intensive benchmark, with journalling turned off. What
> sort of difference would we see, either in terms of system CPU time,
> wall clock time, etc.?
>
> The results of such a benchmark would certainly make a difference in
> how aggressively we might try to phase in a new hash algorithm.

Now that the patches are available, it makes sense to run some
directory-intensive benchmark to see whether the improved hash
function actually shows improved performance. The hash may be
somewhat faster, but since this is only hashing the filename and
not KB/MB of data, it isn't clear whether this is going to improve
observable performance of directory operations.

I'm not sure what a suitable benchmark for this is, however. It
needs to be doing filename lookups to exercise the hashing, but
in the workloads that I can think of there is always a lot more
work after the name is looked up (e.g. open(), stat(), etc) on
the filename. Some possibilities include "ls -l" or "mv A/* B/".
It may be the only way to see the difference is via oprofile.

It also isn't clear whether the strength of siphash is significantly
better than "halfmd4", which is already cryptographically-strong.
Since the filename hash is also a function of the filesystem-unique
s_hash_seed, mounting an "attack" on a directory needs to be specific
to a particular filesystem, and isn't portable to other filesystems.

Cheers, Andreas






Attachments:
signature.asc (833.00 B)
Message signed with OpenPGP using GPGMail

2014-09-23 23:00:24

by George Spelvin

[permalink] [raw]
Subject: Re: [RFC] mke2fs -E hash_alg=siphash: any interest?

> Now that the patches are available, it makes sense to run some
> directory-intensive benchmark to see whether the improved hash
> function actually shows improved performance. The hash may be
> somewhat faster, but since this is only hashing the filename and
> not KB/MB of data, it isn't clear whether this is going to improve
> observable performance of directory operations.

That's basically my current task, and why my v1 is kind of a draft
just to introduce the idea and flush out any comments on my choice of
identifier names and stuff like that.

Personally, I just like the cleanliness of using a primitive designed for
the purpose, but I benchmarked it to ensure it wouldn't be any *slower*.

> I'm not sure what a suitable benchmark for this is, however. It
> needs to be doing filename lookups to exercise the hashing, but
> in the workloads that I can think of there is always a lot more
> work after the name is looked up (e.g. open(), stat(), etc) on
> the filename. Some possibilities include "ls -l" or "mv A/* B/".
> It may be the only way to see the difference is via oprofile.

It's worse than that. The dcache has an great hit rate, and you have to
force misses. But if you actually hit the disk a lot, that will dwarf
hashing performance into unmeasurability.

So it requires a very cleverly designed benchmark to highlight it.

> It also isn't clear whether the strength of siphash is significantly
> better than "halfmd4", which is already cryptographically-strong.
> Since the filename hash is also a function of the filesystem-unique
> s_hash_seed, mounting an "attack" on a directory needs to be specific
> to a particular filesystem, and isn't portable to other filesystems.

There are two definitions of "stronger":

1) The unknowable truth, and
2) It has been subjected to a lot of analysis and appears to hold up well.

By criterion 2, SipHash *is* significantly stronger: it's presented at
crypto conferences, been studied, and is widely used.

halfmd4 a very ad-hoc primitive that I don't think anyone's looked at
seriously.

It's not obviously terrible, and it's possible that halfmd4 is more work
to break, but we won't know until someone with cryptanalytic skill takes
a swing at it.

2014-09-23 23:22:10

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC] mke2fs -E hash_alg=siphash: any interest?

On Tue, Sep 23, 2014 at 07:00:23PM -0400, George Spelvin wrote:
> It's worse than that. The dcache has an great hit rate, and you have to
> force misses. But if you actually hit the disk a lot, that will dwarf
> hashing performance into unmeasurability.
>
> So it requires a very cleverly designed benchmark to highlight it.

Well, yes. That's why I suggested doing something with a RAM disk.
Perhaps creating a huge number of zero length files, then unmounting
the the file system and remounting it, and then deleting the huge
number of zero length files.

If that doesn't show an improvement, then it's unlikely any real world
use case would likely show an improvement....

> By criterion 2, SipHash *is* significantly stronger: it's presented at
> crypto conferences, been studied, and is widely used.
>
> halfmd4 a very ad-hoc primitive that I don't think anyone's looked at
> seriously.
>
> It's not obviously terrible, and it's possible that halfmd4 is more work
> to break, but we won't know until someone with cryptanalytic skill takes
> a swing at it.

The other thing to consider is what you get if you manage to crack the
crypto, which is that you might be able to force the worst case
performance, and possibly cause a directory creation to fail with an
ENOENT if the huge number of hash collisions cause the two-level htree
to overfill.

Neither is going to get you a huge amount, so it this decreases the
incentive for someone to spend a lot of effort trying to attack the
system. I'm quite certain though that if there is some way such a
failure could cause an Iranian nuclear centrifuge to fail
catastrophically, our friends at Fort Meade would have absolutely no
problems finding an attack. After all, they did for MD5. :-)

Cheers,

- Ted

2014-09-24 00:37:57

by George Spelvin

[permalink] [raw]
Subject: Re: [RFC] mke2fs -E hash_alg=siphash: any interest?

> Well, yes. That's why I suggested doing something with a RAM disk.
> Perhaps creating a huge number of zero length files, then unmounting
> the the file system and remounting it, and then deleting the huge
> number of zero length files.
>
> If that doesn't show an improvement, then it's unlikely any real world
> use case would likely show an improvement....

I've been using a file on a persistent file system, but that's
how I've been doing my validation testing. I'll beat on it some more,
thanks!

> The other thing to consider is what you get if you manage to crack the
> crypto, which is that you might be able to force the worst case
> performance, and possibly cause a directory creation to fail with an
> ENOENT if the huge number of hash collisions cause the two-level htree
> to overfill.

Yes, it's just a DoS, but sometimes that's enough. As the world comes
to depend more and more on computer servers providing services, denial
thereof becomes powerful.

> Neither is going to get you a huge amount, so it this decreases the
> incentive for someone to spend a lot of effort trying to attack the
> system. I'm quite certain though that if there is some way such a
> failure could cause an Iranian nuclear centrifuge to fail
> catastrophically, our friends at Fort Meade would have absolutely no
> problems finding an attack. After all, they did for MD5. :-)

Just remember that *that* was, at its heart, a DoS attack. It actually
damaged equipment, and was effective for far longer because it was
sufficiently surreptitious, but its entire goal was denying the "service"
of gas centrifugation.

halfmd4 is, honestly, probably good enough. It's just that file systems
have a long lifetime, and I'd rather use something well tested.
That's what created this itch; I was picking the parameters for a new
large file system and didn't like the available options.