2005-01-21 21:53:08

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 0/12] random pt4: Moving and sharing code

This series focuses on moving and sharing code in
/drivers/char/random.c. It applies on top of -mm2 which contains my
earlier patches.

New bitop:
1 Create new rol32/ror32 bitops
2 Use them throughout the tree

Share SHA code in lib:
3 Kill the SHA1 variants
4 Cleanup SHA1 interface
5 Move SHA1 code to lib/
6 Replace SHA1 with faster version
7 Update cryptolib to use SHA1 from lib

Share halfmd4 hash:
8 Move halfMD4 to lib
9 Kill duplicate halfMD4 in ext3 htree

Move network random bits:
10 Simplify and shrink syncookie code
11 Move syncookies to net/
12 Move other tcp/ip bits to net/


2005-01-21 21:45:21

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 5/12] random pt4: Move SHA code to lib/

Move random SHA code to lib/.

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd2/lib/Makefile
===================================================================
--- rnd2.orig/lib/Makefile 2005-01-20 09:41:30.000000000 -0800
+++ rnd2/lib/Makefile 2005-01-20 12:20:08.424413615 -0800
@@ -5,7 +5,7 @@
lib-y := errno.o ctype.o string.o vsprintf.o cmdline.o \
bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
kobject.o kref.o idr.o div64.o parser.o int_sqrt.o \
- bitmap.o extable.o kobject_uevent.o prio_tree.o
+ bitmap.o extable.o kobject_uevent.o prio_tree.o sha1.o

ifeq ($(CONFIG_DEBUG_KOBJECT),y)
CFLAGS_kobject.o += -DDEBUG
Index: rnd2/drivers/char/random.c
===================================================================
--- rnd2.orig/drivers/char/random.c 2005-01-20 09:41:30.000000000 -0800
+++ rnd2/drivers/char/random.c 2005-01-20 12:20:08.424413615 -0800
@@ -218,9 +218,6 @@
* Any flaws in the design are solely my responsibility, and should
* not be attributed to the Phil, Colin, or any of authors of PGP.
*
- * The code for SHA transform was taken from Peter Gutmann's
- * implementation, which has been placed in the public domain.
- *
* Further background information on this topic may be obtained from
* RFC 1750, "Randomness Recommendations for Security", by Donald
* Eastlake, Steve Crocker, and Jeff Schiller.
@@ -242,6 +239,7 @@
#include <linux/interrupt.h>
#include <linux/spinlock.h>
#include <linux/percpu.h>
+#include <linux/cryptohash.h>

#include <asm/processor.h>
#include <asm/uaccess.h>
@@ -671,116 +669,7 @@

EXPORT_SYMBOL(add_disk_randomness);

-#define HASH_BUFFER_SIZE 5
#define EXTRACT_SIZE 10
-#define HASH_EXTRA_SIZE 80
-
-/*
- * SHA transform algorithm, taken from code written by Peter Gutmann,
- * and placed in the public domain.
- */
-
-/* The SHA f()-functions. */
-
-#define f1(x,y,z) (z ^ (x & (y ^ z))) /* Rounds 0-19: x ? y : z */
-#define f2(x,y,z) (x ^ y ^ z) /* Rounds 20-39: XOR */
-#define f3(x,y,z) ((x & y) + (z & (x ^ y))) /* Rounds 40-59: majority */
-#define f4(x,y,z) (x ^ y ^ z) /* Rounds 60-79: XOR */
-
-/* The SHA Mysterious Constants */
-
-#define K1 0x5A827999L /* Rounds 0-19: sqrt(2) * 2^30 */
-#define K2 0x6ED9EBA1L /* Rounds 20-39: sqrt(3) * 2^30 */
-#define K3 0x8F1BBCDCL /* Rounds 40-59: sqrt(5) * 2^30 */
-#define K4 0xCA62C1D6L /* Rounds 60-79: sqrt(10) * 2^30 */
-
-/*
- * sha_transform: single block SHA1 transform
- *
- * @digest: 160 bit digest to update
- * @data: 512 bytes of data to hash
- * @W: 80 words of workspace
- *
- * This function generates a SHA1 digest for a single. Be warned, it
- * does not handle padding and message digest, do not confuse it with
- * the full FIPS 180-1 digest algorithm for variable length messages.
- */
-static void sha_transform(__u32 digest[5], const char *data, __u32 W[80])
-{
- __u32 A, B, C, D, E;
- __u32 TEMP;
- int i;
-
- memset(W, 0, sizeof(W));
- for (i = 0; i < 16; i++)
- W[i] = be32_to_cpu(((const __u32 *)data)[i]);
- /*
- * Do the preliminary expansion of 16 to 80 words. Doing it
- * out-of-line line this is faster than doing it in-line on
- * register-starved machines like the x86, and not really any
- * slower on real processors.
- */
- for (i = 0; i < 64; i++) {
- TEMP = W[i] ^ W[i+2] ^ W[i+8] ^ W[i+13];
- W[i+16] = rol32(TEMP, 1);
- }
-
- /* Set up first buffer and local data buffer */
- A = digest[ 0 ];
- B = digest[ 1 ];
- C = digest[ 2 ];
- D = digest[ 3 ];
- E = digest[ 4 ];
-
- /* Heavy mangling, in 4 sub-rounds of 20 iterations each. */
- for (i = 0; i < 80; i++) {
- if (i < 40) {
- if (i < 20)
- TEMP = f1(B, C, D) + K1;
- else
- TEMP = f2(B, C, D) + K2;
- } else {
- if (i < 60)
- TEMP = f3(B, C, D) + K3;
- else
- TEMP = f4(B, C, D) + K4;
- }
- TEMP += rol32(A, 5) + E + W[i];
- E = D; D = C; C = rol32(B, 30); B = A; A = TEMP;
- }
-
- /* Build message digest */
- digest[0] += A;
- digest[1] += B;
- digest[2] += C;
- digest[3] += D;
- digest[4] += E;
-
- /* W is wiped by the caller */
-}
-
-#undef f1
-#undef f2
-#undef f3
-#undef f4
-#undef K1
-#undef K2
-#undef K3
-#undef K4
-
-/*
- * sha_init: initialize the vectors for a SHA1 digest
- *
- * @buf: vector to initialize
- */
-static void sha_init(__u32 *buf)
-{
- buf[0] = 0x67452301;
- buf[1] = 0xefcdab89;
- buf[2] = 0x98badcfe;
- buf[3] = 0x10325476;
- buf[4] = 0xc3d2e1f0;
-}

/*********************************************************************
*
@@ -870,7 +759,7 @@
static void extract_buf(struct entropy_store *r, __u8 *out)
{
int i, x;
- __u32 data[16], buf[85];
+ __u32 data[16], buf[5 + SHA_WORKSPACE_WORDS];

sha_init(buf);
/*
@@ -1736,12 +1625,12 @@
#define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)

static int syncookie_init;
-static __u32 syncookie_secret[2][16-3+HASH_BUFFER_SIZE];
+static __u32 syncookie_secret[2][16-3+SHA_DIGEST_WORDS];

__u32 secure_tcp_syn_cookie(__u32 saddr, __u32 daddr, __u16 sport,
__u16 dport, __u32 sseq, __u32 count, __u32 data)
{
- __u32 tmp[16 + HASH_BUFFER_SIZE + HASH_EXTRA_SIZE];
+ __u32 tmp[16 + 5 + SHA_WORKSPACE_WORDS];
__u32 seq;

/*
@@ -1793,7 +1682,7 @@
__u32 check_tcp_syn_cookie(__u32 cookie, __u32 saddr, __u32 daddr, __u16 sport,
__u16 dport, __u32 sseq, __u32 count, __u32 maxdiff)
{
- __u32 tmp[16 + HASH_BUFFER_SIZE + HASH_EXTRA_SIZE];
+ __u32 tmp[16 + 5 + SHA_WORKSPACE_WORDS];
__u32 diff;

if (syncookie_init == 0)
Index: rnd2/include/linux/cryptohash.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ rnd2/include/linux/cryptohash.h 2005-01-20 12:20:08.424413615 -0800
@@ -0,0 +1,10 @@
+#ifndef __CRYPTOHASH_H
+#define __CRYPTOHASH_H
+
+#define SHA_DIGEST_WORDS 5
+#define SHA_WORKSPACE_WORDS 80
+
+void sha_init(__u32 *buf);
+void sha_transform(__u32 *digest, const char *data, __u32 *W);
+
+#endif
Index: rnd2/lib/sha1.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ rnd2/lib/sha1.c 2005-01-20 12:20:14.925584786 -0800
@@ -0,0 +1,102 @@
+/*
+ * SHA transform algorithm, taken from code written by Peter Gutmann,
+ * and placed in the public domain.
+ */
+
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/cryptohash.h>
+
+/* The SHA f()-functions. */
+
+#define f1(x,y,z) (z ^ (x & (y ^ z))) /* Rounds 0-19: x ? y : z */
+#define f2(x,y,z) (x ^ y ^ z) /* Rounds 20-39: XOR */
+#define f3(x,y,z) ((x & y) + (z & (x ^ y))) /* Rounds 40-59: majority */
+#define f4(x,y,z) (x ^ y ^ z) /* Rounds 60-79: XOR */
+
+/* The SHA Mysterious Constants */
+
+#define K1 0x5A827999L /* Rounds 0-19: sqrt(2) * 2^30 */
+#define K2 0x6ED9EBA1L /* Rounds 20-39: sqrt(3) * 2^30 */
+#define K3 0x8F1BBCDCL /* Rounds 40-59: sqrt(5) * 2^30 */
+#define K4 0xCA62C1D6L /* Rounds 60-79: sqrt(10) * 2^30 */
+
+/*
+ * sha_transform: single block SHA1 transform
+ *
+ * @digest: 160 bit digest to update
+ * @data: 512 bits of data to hash
+ * @W: 80 words of workspace
+ *
+ * This function generates a SHA1 digest for a single. Be warned, it
+ * does not handle padding and message digest, do not confuse it with
+ * the full FIPS 180-1 digest algorithm for variable length messages.
+ */
+void sha_transform(__u32 *digest, const char *data, __u32 *W)
+{
+ __u32 A, B, C, D, E;
+ __u32 TEMP;
+ int i;
+
+ memset(W, 0, sizeof(W));
+ for (i = 0; i < 16; i++)
+ W[i] = be32_to_cpu(((const __u32 *)data)[i]);
+ /*
+ * Do the preliminary expansion of 16 to 80 words. Doing it
+ * out-of-line line this is faster than doing it in-line on
+ * register-starved machines like the x86, and not really any
+ * slower on real processors.
+ */
+ for (i = 0; i < 64; i++) {
+ TEMP = W[i] ^ W[i+2] ^ W[i+8] ^ W[i+13];
+ W[i+16] = rol32(TEMP, 1);
+ }
+
+ /* Set up first buffer and local data buffer */
+ A = digest[ 0 ];
+ B = digest[ 1 ];
+ C = digest[ 2 ];
+ D = digest[ 3 ];
+ E = digest[ 4 ];
+
+ /* Heavy mangling, in 4 sub-rounds of 20 iterations each. */
+ for (i = 0; i < 80; i++) {
+ if (i < 40) {
+ if (i < 20)
+ TEMP = f1(B, C, D) + K1;
+ else
+ TEMP = f2(B, C, D) + K2;
+ } else {
+ if (i < 60)
+ TEMP = f3(B, C, D) + K3;
+ else
+ TEMP = f4(B, C, D) + K4;
+ }
+ TEMP += rol32(A, 5) + E + W[i];
+ E = D; D = C; C = rol32(B, 30); B = A; A = TEMP;
+ }
+
+ /* Build message digest */
+ digest[0] += A;
+ digest[1] += B;
+ digest[2] += C;
+ digest[3] += D;
+ digest[4] += E;
+
+ /* W is wiped by the caller */
+}
+
+/*
+ * sha_init: initialize the vectors for a SHA1 digest
+ *
+ * @buf: vector to initialize
+ */
+void sha_init(__u32 *buf)
+{
+ buf[0] = 0x67452301;
+ buf[1] = 0xefcdab89;
+ buf[2] = 0x98badcfe;
+ buf[3] = 0x10325476;
+ buf[4] = 0xc3d2e1f0;
+}
+

2005-01-21 21:48:23

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 11/12] random pt4: Move syncookies to net/

Move syncookie code off to networking land.

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd2/drivers/char/random.c
===================================================================
--- rnd2.orig/drivers/char/random.c 2005-01-20 10:16:13.830687244 -0800
+++ rnd2/drivers/char/random.c 2005-01-20 10:16:43.345924372 -0800
@@ -366,10 +366,6 @@
* hash; hash collisions will occur no more often than chance.
*/

-#ifdef CONFIG_SYN_COOKIES
-static __u32 syncookie_secret[2][16-3+SHA_WORKSPACE_WORDS];
-#endif
-
/*
* Static global variables
*/
@@ -901,9 +897,6 @@
init_std_data(&input_pool);
init_std_data(&blocking_pool);
init_std_data(&nonblocking_pool);
-#ifdef CONFIG_SYN_COOKIES
- get_random_bytes(syncookie_secret, sizeof(syncookie_secret));
-#endif
return 0;
}
module_init(rand_initialize);
@@ -1578,78 +1571,4 @@
return half_md4_transform(hash, keyptr->secret);
}

-#ifdef CONFIG_SYN_COOKIES
-/*
- * Secure SYN cookie computation. This is the algorithm worked out by
- * Dan Bernstein and Eric Schenk.
- *
- * For linux I implement the 1 minute counter by looking at the jiffies clock.
- * The count is passed in as a parameter, so this code doesn't much care.
- */
-
-#define COOKIEBITS 24 /* Upper bits store count */
-#define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
-
-static u32 cookie_hash(u32 saddr, u32 daddr, u32 sport, u32 dport,
- u32 count, int c)
-{
- __u32 tmp[16 + 5 + SHA_WORKSPACE_WORDS];
-
- memcpy(tmp + 3, syncookie_secret[c], sizeof(syncookie_secret[c]));
- tmp[0] = saddr;
- tmp[1] = daddr;
- tmp[2] = (sport << 16) + dport;
- tmp[3] = count;
- sha_transform(tmp + 16, tmp);
-
- return tmp[17];
-}
-
-__u32 secure_tcp_syn_cookie(__u32 saddr, __u32 daddr, __u16 sport,
- __u16 dport, __u32 sseq, __u32 count, __u32 data)
-{
- /*
- * Compute the secure sequence number.
- * The output should be:
- * HASH(sec1,saddr,sport,daddr,dport,sec1) + sseq + (count * 2^24)
- * + (HASH(sec2,saddr,sport,daddr,dport,count,sec2) % 2^24).
- * Where sseq is their sequence number and count increases every
- * minute by 1.
- * As an extra hack, we add a small "data" value that encodes the
- * MSS into the second hash value.
- */
-
- return (cookie_hash(saddr, daddr, sport, dport, 0, 0) +
- sseq + (count << COOKIEBITS) +
- ((cookie_hash(saddr, daddr, sport, dport, count, 1) + data)
- & COOKIEMASK));
-}
-
-/*
- * This retrieves the small "data" value from the syncookie.
- * If the syncookie is bad, the data returned will be out of
- * range. This must be checked by the caller.
- *
- * The count value used to generate the cookie must be within
- * "maxdiff" if the current (passed-in) "count". The return value
- * is (__u32)-1 if this test fails.
- */
-__u32 check_tcp_syn_cookie(__u32 cookie, __u32 saddr, __u32 daddr, __u16 sport,
- __u16 dport, __u32 sseq, __u32 count, __u32 maxdiff)
-{
- __u32 diff;
-
- /* Strip away the layers from the cookie */
- cookie -= cookie_hash(saddr, daddr, sport, dport, 0, 0) + sseq;
-
- /* Cookie is now reduced to (count * 2^24) ^ (hash % 2^24) */
- diff = (count - (cookie >> COOKIEBITS)) & ((__u32)-1 >> COOKIEBITS);
- if (diff >= maxdiff)
- return (__u32)-1;
-
- return (cookie -
- cookie_hash(saddr, daddr, sport, dport, count - diff, 1))
- & COOKIEMASK; /* Leaving the data behind */
-}
-#endif
#endif /* CONFIG_INET */
Index: rnd2/net/ipv4/syncookies.c
===================================================================
--- rnd2.orig/net/ipv4/syncookies.c 2005-01-20 10:15:30.628195094 -0800
+++ rnd2/net/ipv4/syncookies.c 2005-01-20 10:16:24.202364968 -0800
@@ -17,11 +17,88 @@
#include <linux/tcp.h>
#include <linux/slab.h>
#include <linux/random.h>
+#include <linux/cryptohash.h>
#include <linux/kernel.h>
#include <net/tcp.h>

extern int sysctl_tcp_syncookies;

+static __u32 syncookie_secret[2][16-3+SHA_DIGEST_WORDS];
+
+static __init int init_syncookies(void)
+{
+ get_random_bytes(syncookie_secret, sizeof(syncookie_secret));
+ return 0;
+}
+module_init(init_syncookies);
+
+#define COOKIEBITS 24 /* Upper bits store count */
+#define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
+
+static u32 cookie_hash(u32 saddr, u32 daddr, u32 sport, u32 dport,
+ u32 count, int c)
+{
+ __u32 tmp[16 + 5 + SHA_WORKSPACE_WORDS];
+
+ memcpy(tmp + 3, syncookie_secret[c], sizeof(syncookie_secret[c]));
+ tmp[0] = saddr;
+ tmp[1] = daddr;
+ tmp[2] = (sport << 16) + dport;
+ tmp[3] = count;
+ sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
+
+ return tmp[17];
+}
+
+static __u32 secure_tcp_syn_cookie(__u32 saddr, __u32 daddr, __u16 sport,
+ __u16 dport, __u32 sseq, __u32 count,
+ __u32 data)
+{
+ /*
+ * Compute the secure sequence number.
+ * The output should be:
+ * HASH(sec1,saddr,sport,daddr,dport,sec1) + sseq + (count * 2^24)
+ * + (HASH(sec2,saddr,sport,daddr,dport,count,sec2) % 2^24).
+ * Where sseq is their sequence number and count increases every
+ * minute by 1.
+ * As an extra hack, we add a small "data" value that encodes the
+ * MSS into the second hash value.
+ */
+
+ return (cookie_hash(saddr, daddr, sport, dport, 0, 0) +
+ sseq + (count << COOKIEBITS) +
+ ((cookie_hash(saddr, daddr, sport, dport, count, 1) + data)
+ & COOKIEMASK));
+}
+
+/*
+ * This retrieves the small "data" value from the syncookie.
+ * If the syncookie is bad, the data returned will be out of
+ * range. This must be checked by the caller.
+ *
+ * The count value used to generate the cookie must be within
+ * "maxdiff" if the current (passed-in) "count". The return value
+ * is (__u32)-1 if this test fails.
+ */
+static __u32 check_tcp_syn_cookie(__u32 cookie, __u32 saddr, __u32 daddr,
+ __u16 sport, __u16 dport, __u32 sseq,
+ __u32 count, __u32 maxdiff)
+{
+ __u32 diff;
+
+ /* Strip away the layers from the cookie */
+ cookie -= cookie_hash(saddr, daddr, sport, dport, 0, 0) + sseq;
+
+ /* Cookie is now reduced to (count * 2^24) ^ (hash % 2^24) */
+ diff = (count - (cookie >> COOKIEBITS)) & ((__u32) - 1 >> COOKIEBITS);
+ if (diff >= maxdiff)
+ return (__u32)-1;
+
+ return (cookie -
+ cookie_hash(saddr, daddr, sport, dport, count - diff, 1))
+ & COOKIEMASK; /* Leaving the data behind */
+}
+
/*
* This table has to be sorted and terminated with (__u16)-1.
* XXX generate a better table.
Index: rnd2/include/linux/random.h
===================================================================
--- rnd2.orig/include/linux/random.h 2005-01-20 10:15:30.628195094 -0800
+++ rnd2/include/linux/random.h 2005-01-20 10:16:24.202364968 -0800
@@ -55,14 +55,6 @@
extern u32 secure_tcp_port_ephemeral(__u32 saddr, __u32 daddr, __u16 dport);
extern __u32 secure_tcp_sequence_number(__u32 saddr, __u32 daddr,
__u16 sport, __u16 dport);
-extern __u32 secure_tcp_syn_cookie(__u32 saddr, __u32 daddr,
- __u16 sport, __u16 dport,
- __u32 sseq, __u32 count,
- __u32 data);
-extern __u32 check_tcp_syn_cookie(__u32 cookie, __u32 saddr,
- __u32 daddr, __u16 sport,
- __u16 dport, __u32 sseq,
- __u32 count, __u32 maxdiff);
extern __u32 secure_tcpv6_sequence_number(__u32 *saddr, __u32 *daddr,
__u16 sport, __u16 dport);

2005-01-21 21:53:10

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 3/12] random pt4: Kill the SHA variants

Kill the unrolled SHA variants, they're unused and duplicate code in
cryptoapi.

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd2/drivers/char/random.c
===================================================================
--- rnd2.orig/drivers/char/random.c 2005-01-20 12:22:16.709058715 -0800
+++ rnd2/drivers/char/random.c 2005-01-20 12:28:27.979725732 -0800
@@ -698,9 +698,6 @@
#define EXTRACT_SIZE 10
#define HASH_EXTRA_SIZE 80

-/* Various size/speed tradeoffs are available. Choose 0..3. */
-#define SHA_CODE_SIZE 0
-
/*
* SHA transform algorithm, taken from code written by Peter Gutmann,
* and placed in the public domain.
@@ -720,9 +717,6 @@
#define K3 0x8F1BBCDCL /* Rounds 40-59: sqrt(5) * 2^30 */
#define K4 0xCA62C1D6L /* Rounds 60-79: sqrt(10) * 2^30 */

-#define subRound(a, b, c, d, e, f, k, data) \
- (e += rol32(a, 5) + f(b, c, d) + k + data, b = rol32(b, 30))
-
static void sha_transform(__u32 digest[85], __u32 const data[16])
{
__u32 A, B, C, D, E; /* Local vars */
@@ -750,11 +744,6 @@
E = digest[ 4 ];

/* Heavy mangling, in 4 sub-rounds of 20 iterations each. */
-#if SHA_CODE_SIZE == 0
- /*
- * Approximately 50% of the speed of the largest version, but
- * takes up 1/16 the space. Saves about 6k on an i386 kernel.
- */
for (i = 0; i < 80; i++) {
if (i < 40) {
if (i < 20)
@@ -770,139 +759,6 @@
TEMP += rol32(A, 5) + E + W[i];
E = D; D = C; C = rol32(B, 30); B = A; A = TEMP;
}
-#elif SHA_CODE_SIZE == 1
- for (i = 0; i < 20; i++) {
- TEMP = f1(B, C, D) + K1 + rol32(A, 5) + E + W[i];
- E = D; D = C; C = rol32(B, 30); B = A; A = TEMP;
- }
- for (; i < 40; i++) {
- TEMP = f2(B, C, D) + K2 + rol32(A, 5) + E + W[i];
- E = D; D = C; C = rol32(B, 30); B = A; A = TEMP;
- }
- for (; i < 60; i++) {
- TEMP = f3(B, C, D) + K3 + rol32(A, 5) + E + W[i];
- E = D; D = C; C = rol22(B, 30); B = A; A = TEMP;
- }
- for (; i < 80; i++) {
- TEMP = f4(B, C, D) + K4 + rol32(A, 5) + E + W[i];
- E = D; D = C; C = rol32(B, 30); B = A; A = TEMP;
- }
-#elif SHA_CODE_SIZE == 2
- for (i = 0; i < 20; i += 5) {
- subRound(A, B, C, D, E, f1, K1, W[i ]);
- subRound(E, A, B, C, D, f1, K1, W[i+1]);
- subRound(D, E, A, B, C, f1, K1, W[i+2]);
- subRound(C, D, E, A, B, f1, K1, W[i+3]);
- subRound(B, C, D, E, A, f1, K1, W[i+4]);
- }
- for (; i < 40; i += 5) {
- subRound(A, B, C, D, E, f2, K2, W[i ]);
- subRound(E, A, B, C, D, f2, K2, W[i+1]);
- subRound(D, E, A, B, C, f2, K2, W[i+2]);
- subRound(C, D, E, A, B, f2, K2, W[i+3]);
- subRound(B, C, D, E, A, f2, K2, W[i+4]);
- }
- for (; i < 60; i += 5) {
- subRound(A, B, C, D, E, f3, K3, W[i ]);
- subRound(E, A, B, C, D, f3, K3, W[i+1]);
- subRound(D, E, A, B, C, f3, K3, W[i+2]);
- subRound(C, D, E, A, B, f3, K3, W[i+3]);
- subRound(B, C, D, E, A, f3, K3, W[i+4]);
- }
- for (; i < 80; i += 5) {
- subRound(A, B, C, D, E, f4, K4, W[i ]);
- subRound(E, A, B, C, D, f4, K4, W[i+1]);
- subRound(D, E, A, B, C, f4, K4, W[i+2]);
- subRound(C, D, E, A, B, f4, K4, W[i+3]);
- subRound(B, C, D, E, A, f4, K4, W[i+4]);
- }
-#elif SHA_CODE_SIZE == 3 /* Really large version */
- subRound(A, B, C, D, E, f1, K1, W[ 0]);
- subRound(E, A, B, C, D, f1, K1, W[ 1]);
- subRound(D, E, A, B, C, f1, K1, W[ 2]);
- subRound(C, D, E, A, B, f1, K1, W[ 3]);
- subRound(B, C, D, E, A, f1, K1, W[ 4]);
- subRound(A, B, C, D, E, f1, K1, W[ 5]);
- subRound(E, A, B, C, D, f1, K1, W[ 6]);
- subRound(D, E, A, B, C, f1, K1, W[ 7]);
- subRound(C, D, E, A, B, f1, K1, W[ 8]);
- subRound(B, C, D, E, A, f1, K1, W[ 9]);
- subRound(A, B, C, D, E, f1, K1, W[10]);
- subRound(E, A, B, C, D, f1, K1, W[11]);
- subRound(D, E, A, B, C, f1, K1, W[12]);
- subRound(C, D, E, A, B, f1, K1, W[13]);
- subRound(B, C, D, E, A, f1, K1, W[14]);
- subRound(A, B, C, D, E, f1, K1, W[15]);
- subRound(E, A, B, C, D, f1, K1, W[16]);
- subRound(D, E, A, B, C, f1, K1, W[17]);
- subRound(C, D, E, A, B, f1, K1, W[18]);
- subRound(B, C, D, E, A, f1, K1, W[19]);
-
- subRound(A, B, C, D, E, f2, K2, W[20]);
- subRound(E, A, B, C, D, f2, K2, W[21]);
- subRound(D, E, A, B, C, f2, K2, W[22]);
- subRound(C, D, E, A, B, f2, K2, W[23]);
- subRound(B, C, D, E, A, f2, K2, W[24]);
- subRound(A, B, C, D, E, f2, K2, W[25]);
- subRound(E, A, B, C, D, f2, K2, W[26]);
- subRound(D, E, A, B, C, f2, K2, W[27]);
- subRound(C, D, E, A, B, f2, K2, W[28]);
- subRound(B, C, D, E, A, f2, K2, W[29]);
- subRound(A, B, C, D, E, f2, K2, W[30]);
- subRound(E, A, B, C, D, f2, K2, W[31]);
- subRound(D, E, A, B, C, f2, K2, W[32]);
- subRound(C, D, E, A, B, f2, K2, W[33]);
- subRound(B, C, D, E, A, f2, K2, W[34]);
- subRound(A, B, C, D, E, f2, K2, W[35]);
- subRound(E, A, B, C, D, f2, K2, W[36]);
- subRound(D, E, A, B, C, f2, K2, W[37]);
- subRound(C, D, E, A, B, f2, K2, W[38]);
- subRound(B, C, D, E, A, f2, K2, W[39]);
-
- subRound(A, B, C, D, E, f3, K3, W[40]);
- subRound(E, A, B, C, D, f3, K3, W[41]);
- subRound(D, E, A, B, C, f3, K3, W[42]);
- subRound(C, D, E, A, B, f3, K3, W[43]);
- subRound(B, C, D, E, A, f3, K3, W[44]);
- subRound(A, B, C, D, E, f3, K3, W[45]);
- subRound(E, A, B, C, D, f3, K3, W[46]);
- subRound(D, E, A, B, C, f3, K3, W[47]);
- subRound(C, D, E, A, B, f3, K3, W[48]);
- subRound(B, C, D, E, A, f3, K3, W[49]);
- subRound(A, B, C, D, E, f3, K3, W[50]);
- subRound(E, A, B, C, D, f3, K3, W[51]);
- subRound(D, E, A, B, C, f3, K3, W[52]);
- subRound(C, D, E, A, B, f3, K3, W[53]);
- subRound(B, C, D, E, A, f3, K3, W[54]);
- subRound(A, B, C, D, E, f3, K3, W[55]);
- subRound(E, A, B, C, D, f3, K3, W[56]);
- subRound(D, E, A, B, C, f3, K3, W[57]);
- subRound(C, D, E, A, B, f3, K3, W[58]);
- subRound(B, C, D, E, A, f3, K3, W[59]);
-
- subRound(A, B, C, D, E, f4, K4, W[60]);
- subRound(E, A, B, C, D, f4, K4, W[61]);
- subRound(D, E, A, B, C, f4, K4, W[62]);
- subRound(C, D, E, A, B, f4, K4, W[63]);
- subRound(B, C, D, E, A, f4, K4, W[64]);
- subRound(A, B, C, D, E, f4, K4, W[65]);
- subRound(E, A, B, C, D, f4, K4, W[66]);
- subRound(D, E, A, B, C, f4, K4, W[67]);
- subRound(C, D, E, A, B, f4, K4, W[68]);
- subRound(B, C, D, E, A, f4, K4, W[69]);
- subRound(A, B, C, D, E, f4, K4, W[70]);
- subRound(E, A, B, C, D, f4, K4, W[71]);
- subRound(D, E, A, B, C, f4, K4, W[72]);
- subRound(C, D, E, A, B, f4, K4, W[73]);
- subRound(B, C, D, E, A, f4, K4, W[74]);
- subRound(A, B, C, D, E, f4, K4, W[75]);
- subRound(E, A, B, C, D, f4, K4, W[76]);
- subRound(D, E, A, B, C, f4, K4, W[77]);
- subRound(C, D, E, A, B, f4, K4, W[78]);
- subRound(B, C, D, E, A, f4, K4, W[79]);
-#else
-#error Illegal SHA_CODE_SIZE
-#endif

/* Build message digest */
digest[0] += A;
@@ -923,7 +779,6 @@
#undef K2
#undef K3
#undef K4
-#undef subRound

/*********************************************************************
*

2005-01-21 21:53:09

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 7/12] random pt4: Update cryptolib to use SHA fro lib

Drop the cryptolib SHA implementation and use the faster and much
smaller SHA implementation from lib/. Saves about 5K. This also saves
time by doing one memset per update call rather than one per SHA
block.

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd2/crypto/sha1.c
===================================================================
--- rnd2.orig/crypto/sha1.c 2005-01-20 12:24:48.135753454 -0800
+++ rnd2/crypto/sha1.c 2005-01-20 12:30:22.143171131 -0800
@@ -4,8 +4,7 @@
* SHA1 Secure Hash Algorithm.
*
* Derived from cryptoapi implementation, adapted for in-place
- * scatterlist interface. Originally based on the public domain
- * implementation written by Steve Reid.
+ * scatterlist interface.
*
* Copyright (c) Alan Smithee.
* Copyright (c) Andrew McDonald <[email protected]>
@@ -13,7 +12,7 @@
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
+ * Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*
*/
@@ -21,84 +20,19 @@
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/crypto.h>
+#include <linux/cryptohash.h>
#include <asm/scatterlist.h>
#include <asm/byteorder.h>

#define SHA1_DIGEST_SIZE 20
#define SHA1_HMAC_BLOCK_SIZE 64

-/* blk0() and blk() perform the initial expand. */
-/* I got the idea of expanding during the round function from SSLeay */
-# define blk0(i) block32[i]
-
-#define blk(i) (block32[i&15] = rol32(block32[(i+13)&15]^block32[(i+8)&15] \
- ^block32[(i+2)&15]^block32[i&15],1))
-
-/* (R0+R1), R2, R3, R4 are the different operations used in SHA1 */
-#define R0(v,w,x,y,z,i) z+=((w&(x^y))^y)+blk0(i)+0x5A827999+rol32(v,5); \
- w=rol32(w,30);
-#define R1(v,w,x,y,z,i) z+=((w&(x^y))^y)+blk(i)+0x5A827999+rol32(v,5); \
- w=rol32(w,30);
-#define R2(v,w,x,y,z,i) z+=(w^x^y)+blk(i)+0x6ED9EBA1+rol32(v,5);w=rol32(w,30);
-#define R3(v,w,x,y,z,i) z+=(((w|x)&y)|(w&x))+blk(i)+0x8F1BBCDC+rol32(v,5); \
- w=rol32(w,30);
-#define R4(v,w,x,y,z,i) z+=(w^x^y)+blk(i)+0xCA62C1D6+rol32(v,5);w=rol32(w,30);
-
struct sha1_ctx {
u64 count;
u32 state[5];
u8 buffer[64];
};

-/* Hash a single 512-bit block. This is the core of the algorithm. */
-static void sha1_transform(u32 *state, const u8 *in)
-{
- u32 a, b, c, d, e;
- u32 block32[16];
-
- /* convert/copy data to workspace */
- for (a = 0; a < sizeof(block32)/sizeof(u32); a++)
- block32[a] = be32_to_cpu (((const u32 *)in)[a]);
-
- /* Copy context->state[] to working vars */
- a = state[0];
- b = state[1];
- c = state[2];
- d = state[3];
- e = state[4];
-
- /* 4 rounds of 20 operations each. Loop unrolled. */
- R0(a,b,c,d,e, 0); R0(e,a,b,c,d, 1); R0(d,e,a,b,c, 2); R0(c,d,e,a,b, 3);
- R0(b,c,d,e,a, 4); R0(a,b,c,d,e, 5); R0(e,a,b,c,d, 6); R0(d,e,a,b,c, 7);
- R0(c,d,e,a,b, 8); R0(b,c,d,e,a, 9); R0(a,b,c,d,e,10); R0(e,a,b,c,d,11);
- R0(d,e,a,b,c,12); R0(c,d,e,a,b,13); R0(b,c,d,e,a,14); R0(a,b,c,d,e,15);
- R1(e,a,b,c,d,16); R1(d,e,a,b,c,17); R1(c,d,e,a,b,18); R1(b,c,d,e,a,19);
- R2(a,b,c,d,e,20); R2(e,a,b,c,d,21); R2(d,e,a,b,c,22); R2(c,d,e,a,b,23);
- R2(b,c,d,e,a,24); R2(a,b,c,d,e,25); R2(e,a,b,c,d,26); R2(d,e,a,b,c,27);
- R2(c,d,e,a,b,28); R2(b,c,d,e,a,29); R2(a,b,c,d,e,30); R2(e,a,b,c,d,31);
- R2(d,e,a,b,c,32); R2(c,d,e,a,b,33); R2(b,c,d,e,a,34); R2(a,b,c,d,e,35);
- R2(e,a,b,c,d,36); R2(d,e,a,b,c,37); R2(c,d,e,a,b,38); R2(b,c,d,e,a,39);
- R3(a,b,c,d,e,40); R3(e,a,b,c,d,41); R3(d,e,a,b,c,42); R3(c,d,e,a,b,43);
- R3(b,c,d,e,a,44); R3(a,b,c,d,e,45); R3(e,a,b,c,d,46); R3(d,e,a,b,c,47);
- R3(c,d,e,a,b,48); R3(b,c,d,e,a,49); R3(a,b,c,d,e,50); R3(e,a,b,c,d,51);
- R3(d,e,a,b,c,52); R3(c,d,e,a,b,53); R3(b,c,d,e,a,54); R3(a,b,c,d,e,55);
- R3(e,a,b,c,d,56); R3(d,e,a,b,c,57); R3(c,d,e,a,b,58); R3(b,c,d,e,a,59);
- R4(a,b,c,d,e,60); R4(e,a,b,c,d,61); R4(d,e,a,b,c,62); R4(c,d,e,a,b,63);
- R4(b,c,d,e,a,64); R4(a,b,c,d,e,65); R4(e,a,b,c,d,66); R4(d,e,a,b,c,67);
- R4(c,d,e,a,b,68); R4(b,c,d,e,a,69); R4(a,b,c,d,e,70); R4(e,a,b,c,d,71);
- R4(d,e,a,b,c,72); R4(c,d,e,a,b,73); R4(b,c,d,e,a,74); R4(a,b,c,d,e,75);
- R4(e,a,b,c,d,76); R4(d,e,a,b,c,77); R4(c,d,e,a,b,78); R4(b,c,d,e,a,79);
- /* Add the working vars back into context.state[] */
- state[0] += a;
- state[1] += b;
- state[2] += c;
- state[3] += d;
- state[4] += e;
- /* Wipe variables */
- a = b = c = d = e = 0;
- memset (block32, 0x00, sizeof block32);
-}
-
static void sha1_init(void *ctx)
{
struct sha1_ctx *sctx = ctx;
@@ -115,19 +49,21 @@
{
struct sha1_ctx *sctx = ctx;
unsigned int i, j;
+ u32 temp[SHA_WORKSPACE_WORDS];

j = (sctx->count >> 3) & 0x3f;
sctx->count += len << 3;

if ((j + len) > 63) {
memcpy(&sctx->buffer[j], data, (i = 64-j));
- sha1_transform(sctx->state, sctx->buffer);
+ sha_transform(sctx->state, sctx->buffer, temp);
for ( ; i + 63 < len; i += 64) {
- sha1_transform(sctx->state, &data[i]);
+ sha_transform(sctx->state, &data[i], temp);
}
j = 0;
}
else i = 0;
+ memset(temp, 0, sizeof(temp));
memcpy(&sctx->buffer[j], &data[i], len - i);
}

2005-01-21 21:58:30

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 6/12] random pt4: Replace SHA with faster version

A replacement SHA routine that's slightly larger, but over twice as
fast. It's also faster and much smaller than the cryptolib version.

size speed buffer size
original: 350B 2.3us 320B
cryptolib: 5776B 1.2us 80B
this code: 466B 1.0us 320B
alternate: 2112B 1.0us 80B

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd/lib/sha1.c
===================================================================
--- rnd.orig/lib/sha1.c 2005-01-12 21:27:15.445196197 -0800
+++ rnd/lib/sha1.c 2005-01-12 21:28:24.051449644 -0800
@@ -1,18 +1,16 @@
/*
- * SHA transform algorithm, taken from code written by Peter Gutmann,
- * and placed in the public domain.
+ * SHA transform algorithm, originally taken from code written by
+ * Peter Gutmann, and placed in the public domain.
*/

#include <linux/kernel.h>
-#include <linux/string.h>
#include <linux/cryptohash.h>

/* The SHA f()-functions. */

-#define f1(x,y,z) (z ^ (x & (y ^ z))) /* Rounds 0-19: x ? y : z */
-#define f2(x,y,z) (x ^ y ^ z) /* Rounds 20-39: XOR */
-#define f3(x,y,z) ((x & y) + (z & (x ^ y))) /* Rounds 40-59: majority */
-#define f4(x,y,z) (x ^ y ^ z) /* Rounds 60-79: XOR */
+#define f1(x,y,z) (z ^ (x & (y ^ z))) /* x ? y : z */
+#define f2(x,y,z) (x ^ y ^ z) /* XOR */
+#define f3(x,y,z) ((x & y) + (z & (x ^ y))) /* majority */

/* The SHA Mysterious Constants */

@@ -26,64 +24,53 @@
*
* @digest: 160 bit digest to update
* @data: 512 bits of data to hash
- * @W: 80 words of workspace
+ * @W: 80 words of workspace, caller should clear
*
* This function generates a SHA1 digest for a single. Be warned, it
* does not handle padding and message digest, do not confuse it with
* the full FIPS 180-1 digest algorithm for variable length messages.
*/
-void sha_transform(__u32 *digest, const char *data, __u32 *W)
+void sha_transform(__u32 *digest, const char *in, __u32 *W)
{
- __u32 A, B, C, D, E;
- __u32 TEMP;
- int i;
+ __u32 a, b, c, d, e, t, i;

- memset(W, 0, sizeof(W));
for (i = 0; i < 16; i++)
- W[i] = be32_to_cpu(((const __u32 *)data)[i]);
- /*
- * Do the preliminary expansion of 16 to 80 words. Doing it
- * out-of-line line this is faster than doing it in-line on
- * register-starved machines like the x86, and not really any
- * slower on real processors.
- */
- for (i = 0; i < 64; i++) {
- TEMP = W[i] ^ W[i+2] ^ W[i+8] ^ W[i+13];
- W[i+16] = rol32(TEMP, 1);
+ W[i] = be32_to_cpu(((const __u32 *)in)[i]);
+
+ for (i = 0; i < 64; i++)
+ W[i+16] = rol32(W[i+13] ^ W[i+8] ^ W[i+2] ^ W[i], 1);
+
+ a = digest[0];
+ b = digest[1];
+ c = digest[2];
+ d = digest[3];
+ e = digest[4];
+
+ for (i = 0; i < 20; i++) {
+ t = f1(b, c, d) + K1 + rol32(a, 5) + e + W[i];
+ e = d; d = c; c = rol32(b, 30); b = a; a = t;
}

- /* Set up first buffer and local data buffer */
- A = digest[ 0 ];
- B = digest[ 1 ];
- C = digest[ 2 ];
- D = digest[ 3 ];
- E = digest[ 4 ];
-
- /* Heavy mangling, in 4 sub-rounds of 20 iterations each. */
- for (i = 0; i < 80; i++) {
- if (i < 40) {
- if (i < 20)
- TEMP = f1(B, C, D) + K1;
- else
- TEMP = f2(B, C, D) + K2;
- } else {
- if (i < 60)
- TEMP = f3(B, C, D) + K3;
- else
- TEMP = f4(B, C, D) + K4;
- }
- TEMP += rol32(A, 5) + E + W[i];
- E = D; D = C; C = rol32(B, 30); B = A; A = TEMP;
+ for (; i < 40; i ++) {
+ t = f2(b, c, d) + K2 + rol32(a, 5) + e + W[i];
+ e = d; d = c; c = rol32(b, 30); b = a; a = t;
}

- /* Build message digest */
- digest[0] += A;
- digest[1] += B;
- digest[2] += C;
- digest[3] += D;
- digest[4] += E;
+ for (; i < 60; i ++) {
+ t = f3(b, c, d) + K3 + rol32(a, 5) + e + W[i];
+ e = d; d = c; c = rol32(b, 30); b = a; a = t;
+ }
+
+ for (; i < 80; i ++) {
+ t = f2(b, c, d) + K4 + rol32(a, 5) + e + W[i];
+ e = d; d = c; c = rol32(b, 30); b = a; a = t;
+ }

- /* W is wiped by the caller */
+ digest[0] += a;
+ digest[1] += b;
+ digest[2] += c;
+ digest[3] += d;
+ digest[4] += e;
}

/*

2005-01-21 22:02:27

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 8/12] random pt4: Move halfmd4 to lib

Move half-MD4 hash to /lib where we can share it with htree.

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd2/drivers/char/random.c
===================================================================
--- rnd2.orig/drivers/char/random.c 2005-01-20 09:42:08.922390869 -0800
+++ rnd2/drivers/char/random.c 2005-01-20 09:50:02.465019321 -0800
@@ -1325,47 +1325,6 @@
#define K2 013240474631UL
#define K3 015666365641UL

-/*
- * Basic cut-down MD4 transform. Returns only 32 bits of result.
- */
-static __u32 halfMD4Transform (__u32 const buf[4], __u32 const in[8])
-{
- __u32 a = buf[0], b = buf[1], c = buf[2], d = buf[3];
-
- /* Round 1 */
- ROUND(F, a, b, c, d, in[0] + K1, 3);
- ROUND(F, d, a, b, c, in[1] + K1, 7);
- ROUND(F, c, d, a, b, in[2] + K1, 11);
- ROUND(F, b, c, d, a, in[3] + K1, 19);
- ROUND(F, a, b, c, d, in[4] + K1, 3);
- ROUND(F, d, a, b, c, in[5] + K1, 7);
- ROUND(F, c, d, a, b, in[6] + K1, 11);
- ROUND(F, b, c, d, a, in[7] + K1, 19);
-
- /* Round 2 */
- ROUND(G, a, b, c, d, in[1] + K2, 3);
- ROUND(G, d, a, b, c, in[3] + K2, 5);
- ROUND(G, c, d, a, b, in[5] + K2, 9);
- ROUND(G, b, c, d, a, in[7] + K2, 13);
- ROUND(G, a, b, c, d, in[0] + K2, 3);
- ROUND(G, d, a, b, c, in[2] + K2, 5);
- ROUND(G, c, d, a, b, in[4] + K2, 9);
- ROUND(G, b, c, d, a, in[6] + K2, 13);
-
- /* Round 3 */
- ROUND(H, a, b, c, d, in[3] + K3, 3);
- ROUND(H, d, a, b, c, in[7] + K3, 9);
- ROUND(H, c, d, a, b, in[2] + K3, 11);
- ROUND(H, b, c, d, a, in[6] + K3, 15);
- ROUND(H, a, b, c, d, in[1] + K3, 3);
- ROUND(H, d, a, b, c, in[5] + K3, 9);
- ROUND(H, c, d, a, b, in[0] + K3, 11);
- ROUND(H, b, c, d, a, in[4] + K3, 15);
-
- return buf[1] + b; /* "most hashed" word */
- /* Alternative: return sum of all words? */
-}
-
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

static __u32 twothirdsMD4Transform (__u32 const buf[4], __u32 const in[12])
@@ -1550,7 +1509,7 @@
hash[2]=(sport << 16) + dport;
hash[3]=keyptr->secret[11];

- seq = halfMD4Transform(hash, keyptr->secret) & HASH_MASK;
+ seq = half_md4_transform(hash, keyptr->secret) & HASH_MASK;
seq += keyptr->count;
/*
* As close as possible to RFC 793, which
@@ -1591,7 +1550,7 @@
hash[2] = keyptr->secret[10];
hash[3] = keyptr->secret[11];

- return halfMD4Transform(hash, keyptr->secret);
+ return half_md4_transform(hash, keyptr->secret);
}

/* Generate secure starting point for ephemeral TCP port search */
@@ -1609,7 +1568,7 @@
hash[2] = dport ^ keyptr->secret[10];
hash[3] = keyptr->secret[11];

- return halfMD4Transform(hash, keyptr->secret);
+ return half_md4_transform(hash, keyptr->secret);
}

#ifdef CONFIG_SYN_COOKIES
Index: rnd2/lib/halfmd4.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ rnd2/lib/halfmd4.c 2005-01-20 09:47:02.741932065 -0800
@@ -0,0 +1,61 @@
+#include <linux/kernel.h>
+#include <linux/cryptohash.h>
+
+/* F, G and H are basic MD4 functions: selection, majority, parity */
+#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
+#define G(x, y, z) (((x) & (y)) + (((x) ^ (y)) & (z)))
+#define H(x, y, z) ((x) ^ (y) ^ (z))
+
+/*
+ * The generic round function. The application is so specific that
+ * we don't bother protecting all the arguments with parens, as is generally
+ * good macro practice, in favor of extra legibility.
+ * Rotation is separate from addition to prevent recomputation
+ */
+#define ROUND(f, a, b, c, d, x, s) \
+ (a += f(b, c, d) + x, a = (a << s) | (a >> (32 - s)))
+#define K1 0
+#define K2 013240474631UL
+#define K3 015666365641UL
+
+/*
+ * Basic cut-down MD4 transform. Returns only 32 bits of result.
+ */
+__u32 half_md4_transform(__u32 const buf[4], __u32 const in[8])
+{
+ __u32 a = buf[0], b = buf[1], c = buf[2], d = buf[3];
+
+ /* Round 1 */
+ ROUND(F, a, b, c, d, in[0] + K1, 3);
+ ROUND(F, d, a, b, c, in[1] + K1, 7);
+ ROUND(F, c, d, a, b, in[2] + K1, 11);
+ ROUND(F, b, c, d, a, in[3] + K1, 19);
+ ROUND(F, a, b, c, d, in[4] + K1, 3);
+ ROUND(F, d, a, b, c, in[5] + K1, 7);
+ ROUND(F, c, d, a, b, in[6] + K1, 11);
+ ROUND(F, b, c, d, a, in[7] + K1, 19);
+
+ /* Round 2 */
+ ROUND(G, a, b, c, d, in[1] + K2, 3);
+ ROUND(G, d, a, b, c, in[3] + K2, 5);
+ ROUND(G, c, d, a, b, in[5] + K2, 9);
+ ROUND(G, b, c, d, a, in[7] + K2, 13);
+ ROUND(G, a, b, c, d, in[0] + K2, 3);
+ ROUND(G, d, a, b, c, in[2] + K2, 5);
+ ROUND(G, c, d, a, b, in[4] + K2, 9);
+ ROUND(G, b, c, d, a, in[6] + K2, 13);
+
+ /* Round 3 */
+ ROUND(H, a, b, c, d, in[3] + K3, 3);
+ ROUND(H, d, a, b, c, in[7] + K3, 9);
+ ROUND(H, c, d, a, b, in[2] + K3, 11);
+ ROUND(H, b, c, d, a, in[6] + K3, 15);
+ ROUND(H, a, b, c, d, in[1] + K3, 3);
+ ROUND(H, d, a, b, c, in[5] + K3, 9);
+ ROUND(H, c, d, a, b, in[0] + K3, 11);
+ ROUND(H, b, c, d, a, in[4] + K3, 15);
+
+ return buf[1] + b; /* "most hashed" word */
+ /* Alternative: return sum of all words? */
+}
+
Index: rnd2/lib/Makefile
===================================================================
--- rnd2.orig/lib/Makefile 2005-01-20 09:42:08.920391124 -0800
+++ rnd2/lib/Makefile 2005-01-20 09:47:44.147653284 -0800
@@ -5,7 +5,8 @@
lib-y := errno.o ctype.o string.o vsprintf.o cmdline.o \
bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
kobject.o kref.o idr.o div64.o parser.o int_sqrt.o \
- bitmap.o extable.o kobject_uevent.o prio_tree.o sha1.o
+ bitmap.o extable.o kobject_uevent.o prio_tree.o \
+ sha1.o halfmd4.o

ifeq ($(CONFIG_DEBUG_KOBJECT),y)
CFLAGS_kobject.o += -DDEBUG
Index: rnd2/include/linux/cryptohash.h
===================================================================
--- rnd2.orig/include/linux/cryptohash.h 2005-01-20 09:42:09.077371110 -0800
+++ rnd2/include/linux/cryptohash.h 2005-01-20 09:47:02.986900834 -0800
@@ -7,4 +7,6 @@
void sha_init(__u32 *buf);
void sha_transform(__u32 *digest, const char *data, __u32 *W);

+__u32 half_md4_transform(__u32 const buf[4], __u32 const in[8]);
+
#endif

2005-01-21 22:02:29

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 10/12] random pt4: Simplify and shrink syncookie code

Simplify syncookie initialization
Refactor syncookie code with separate hash function

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd2/drivers/char/random.c
===================================================================
--- rnd2.orig/drivers/char/random.c 2005-01-20 10:12:26.633652402 -0800
+++ rnd2/drivers/char/random.c 2005-01-20 10:16:13.830687244 -0800
@@ -366,6 +366,10 @@
* hash; hash collisions will occur no more often than chance.
*/

+#ifdef CONFIG_SYN_COOKIES
+static __u32 syncookie_secret[2][16-3+SHA_WORKSPACE_WORDS];
+#endif
+
/*
* Static global variables
*/
@@ -897,6 +901,9 @@
init_std_data(&input_pool);
init_std_data(&blocking_pool);
init_std_data(&nonblocking_pool);
+#ifdef CONFIG_SYN_COOKIES
+ get_random_bytes(syncookie_secret, sizeof(syncookie_secret));
+#endif
return 0;
}
module_init(rand_initialize);
@@ -1583,23 +1590,24 @@
#define COOKIEBITS 24 /* Upper bits store count */
#define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)

-static int syncookie_init;
-static __u32 syncookie_secret[2][16-3+SHA_DIGEST_WORDS];
-
-__u32 secure_tcp_syn_cookie(__u32 saddr, __u32 daddr, __u16 sport,
- __u16 dport, __u32 sseq, __u32 count, __u32 data)
+static u32 cookie_hash(u32 saddr, u32 daddr, u32 sport, u32 dport,
+ u32 count, int c)
{
__u32 tmp[16 + 5 + SHA_WORKSPACE_WORDS];
- __u32 seq;

- /*
- * Pick two random secrets the first time we need a cookie.
- */
- if (syncookie_init == 0) {
- get_random_bytes(syncookie_secret, sizeof(syncookie_secret));
- syncookie_init = 1;
- }
+ memcpy(tmp + 3, syncookie_secret[c], sizeof(syncookie_secret[c]));
+ tmp[0] = saddr;
+ tmp[1] = daddr;
+ tmp[2] = (sport << 16) + dport;
+ tmp[3] = count;
+ sha_transform(tmp + 16, tmp);

+ return tmp[17];
+}
+
+__u32 secure_tcp_syn_cookie(__u32 saddr, __u32 daddr, __u16 sport,
+ __u16 dport, __u32 sseq, __u32 count, __u32 data)
+{
/*
* Compute the secure sequence number.
* The output should be:
@@ -1611,22 +1619,10 @@
* MSS into the second hash value.
*/

- memcpy(tmp + 3, syncookie_secret[0], sizeof(syncookie_secret[0]));
- tmp[0]=saddr;
- tmp[1]=daddr;
- tmp[2]=(sport << 16) + dport;
- sha_transform(tmp+16, (__u8 *)tmp, tmp + 16 + 5);
- seq = tmp[17] + sseq + (count << COOKIEBITS);
-
- memcpy(tmp + 3, syncookie_secret[1], sizeof(syncookie_secret[1]));
- tmp[0]=saddr;
- tmp[1]=daddr;
- tmp[2]=(sport << 16) + dport;
- tmp[3] = count; /* minute counter */
- sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
-
- /* Add in the second hash and the data */
- return seq + ((tmp[17] + data) & COOKIEMASK);
+ return (cookie_hash(saddr, daddr, sport, dport, 0, 0) +
+ sseq + (count << COOKIEBITS) +
+ ((cookie_hash(saddr, daddr, sport, dport, count, 1) + data)
+ & COOKIEMASK));
}

/*
@@ -1641,33 +1637,19 @@
__u32 check_tcp_syn_cookie(__u32 cookie, __u32 saddr, __u32 daddr, __u16 sport,
__u16 dport, __u32 sseq, __u32 count, __u32 maxdiff)
{
- __u32 tmp[16 + 5 + SHA_WORKSPACE_WORDS];
__u32 diff;

- if (syncookie_init == 0)
- return (__u32)-1; /* Well, duh! */
-
/* Strip away the layers from the cookie */
- memcpy(tmp + 3, syncookie_secret[0], sizeof(syncookie_secret[0]));
- tmp[0]=saddr;
- tmp[1]=daddr;
- tmp[2]=(sport << 16) + dport;
- sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
- cookie -= tmp[17] + sseq;
- /* Cookie is now reduced to (count * 2^24) ^ (hash % 2^24) */
+ cookie -= cookie_hash(saddr, daddr, sport, dport, 0, 0) + sseq;

+ /* Cookie is now reduced to (count * 2^24) ^ (hash % 2^24) */
diff = (count - (cookie >> COOKIEBITS)) & ((__u32)-1 >> COOKIEBITS);
if (diff >= maxdiff)
return (__u32)-1;

- memcpy(tmp+3, syncookie_secret[1], sizeof(syncookie_secret[1]));
- tmp[0] = saddr;
- tmp[1] = daddr;
- tmp[2] = (sport << 16) + dport;
- tmp[3] = count - diff; /* minute counter */
- sha_transform(tmp + 16, tmp);
-
- return (cookie - tmp[17]) & COOKIEMASK; /* Leaving the data behind */
+ return (cookie -
+ cookie_hash(saddr, daddr, sport, dport, count - diff, 1))
+ & COOKIEMASK; /* Leaving the data behind */
}
#endif
#endif /* CONFIG_INET */

2005-01-21 22:06:55

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 1/12] random pt4: Create new rol32/ror32 bitops

Add rol32 and ror32 bitops to bitops.h

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd2/drivers/char/random.c
===================================================================
--- rnd2.orig/drivers/char/random.c 2005-01-19 22:59:59.000000000 -0800
+++ rnd2/drivers/char/random.c 2005-01-20 09:18:47.000000000 -0800
@@ -374,11 +374,6 @@
static DECLARE_WAIT_QUEUE_HEAD(random_read_wait);
static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);

-static inline __u32 rol32(__u32 word, int shift)
-{
- return (word << shift) | (word >> (32 - shift));
-}
-
#if 0
static int debug = 0;
module_param(debug, bool, 0644);
Index: rnd2/include/linux/bitops.h
===================================================================
--- rnd2.orig/include/linux/bitops.h 2005-01-19 22:57:54.000000000 -0800
+++ rnd2/include/linux/bitops.h 2005-01-20 09:20:24.641660815 -0800
@@ -134,4 +134,26 @@
return sizeof(w) == 4 ? generic_hweight32(w) : generic_hweight64(w);
}

+/*
+ * rol32 - rotate a 32-bit value left
+ *
+ * @word: value to rotate
+ * @shift: bits to roll
+ */
+static inline __u32 rol32(__u32 word, int shift)
+{
+ return (word << shift) | (word >> (32 - shift));
+}
+
+/*
+ * ror32 - rotate a 32-bit value right
+ *
+ * @word: value to rotate
+ * @shift: bits to roll
+ */
+static inline __u32 ror32(__u32 word, int shift)
+{
+ return (word >> shift) | (word << (32 - shift));
+}
+
#endif

2005-01-21 22:03:29

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 2/12] random pt4: Use them throughout the tree

Move users of private rotl/rotr functions to rol32/ror32. Crypto bits
verified with tcrypt.

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd2/crypto/michael_mic.c
===================================================================
--- rnd2.orig/crypto/michael_mic.c 2004-04-03 19:37:36.000000000 -0800
+++ rnd2/crypto/michael_mic.c 2005-01-20 09:20:50.000000000 -0800
@@ -24,18 +24,6 @@
};


-static inline u32 rotl(u32 val, int bits)
-{
- return (val << bits) | (val >> (32 - bits));
-}
-
-
-static inline u32 rotr(u32 val, int bits)
-{
- return (val >> bits) | (val << (32 - bits));
-}
-
-
static inline u32 xswap(u32 val)
{
return ((val & 0x00ff00ff) << 8) | ((val & 0xff00ff00) >> 8);
@@ -44,13 +32,13 @@

#define michael_block(l, r) \
do { \
- r ^= rotl(l, 17); \
+ r ^= rol32(l, 17); \
l += r; \
r ^= xswap(l); \
l += r; \
- r ^= rotl(l, 3); \
+ r ^= rol32(l, 3); \
l += r; \
- r ^= rotr(l, 2); \
+ r ^= ror32(l, 2); \
l += r; \
} while (0)

Index: rnd2/fs/ncpfs/ncpsign_kernel.c
===================================================================
--- rnd2.orig/fs/ncpfs/ncpsign_kernel.c 2004-04-03 19:36:55.000000000 -0800
+++ rnd2/fs/ncpfs/ncpsign_kernel.c 2005-01-20 09:20:50.000000000 -0800
@@ -11,10 +11,9 @@

#include <linux/string.h>
#include <linux/ncp.h>
+#include <linux/bitops.h>
#include "ncpsign_kernel.h"

-#define rol32(i,c) (((((i)&0xffffffff)<<c)&0xffffffff)| \
- (((i)&0xffffffff)>>(32-c)))
/* i386: 32-bit, little endian, handles mis-alignment */
#ifdef __i386__
#define GET_LE32(p) (*(int *)(p))
Index: rnd2/crypto/cast5.c
===================================================================
--- rnd2.orig/crypto/cast5.c 2004-04-03 19:37:07.000000000 -0800
+++ rnd2/crypto/cast5.c 2005-01-20 09:20:50.000000000 -0800
@@ -567,14 +567,11 @@
0xeaee6801, 0x8db2a283, 0xea8bf59e
};

-
-#define rol(n,x) ( ((x) << (n)) | ((x) >> (32-(n))) )
-
-#define F1(D,m,r) ( (I = ((m) + (D))), (I=rol((r),I)), \
+#define F1(D,m,r) ( (I = ((m) + (D))), (I=rol32(I,(r))), \
(((s1[I >> 24] ^ s2[(I>>16)&0xff]) - s3[(I>>8)&0xff]) + s4[I&0xff]) )
-#define F2(D,m,r) ( (I = ((m) ^ (D))), (I=rol((r),I)), \
+#define F2(D,m,r) ( (I = ((m) ^ (D))), (I=rol32(I,(r))), \
(((s1[I >> 24] - s2[(I>>16)&0xff]) + s3[(I>>8)&0xff]) ^ s4[I&0xff]) )
-#define F3(D,m,r) ( (I = ((m) - (D))), (I=rol((r),I)), \
+#define F3(D,m,r) ( (I = ((m) - (D))), (I=rol32(I,(r))), \
(((s1[I >> 24] + s2[(I>>16)&0xff]) ^ s3[(I>>8)&0xff]) - s4[I&0xff]) )


Index: rnd2/crypto/aes.c
===================================================================
--- rnd2.orig/crypto/aes.c 2005-01-19 22:50:34.000000000 -0800
+++ rnd2/crypto/aes.c 2005-01-20 09:20:50.000000000 -0800
@@ -64,23 +64,6 @@

#define AES_BLOCK_SIZE 16

-static inline
-u32 generic_rotr32 (const u32 x, const unsigned bits)
-{
- const unsigned n = bits % 32;
- return (x >> n) | (x << (32 - n));
-}
-
-static inline
-u32 generic_rotl32 (const u32 x, const unsigned bits)
-{
- const unsigned n = bits % 32;
- return (x << n) | (x >> (32 - n));
-}
-
-#define rotl generic_rotl32
-#define rotr generic_rotr32
-
/*
* #define byte(x, nr) ((unsigned char)((x) >> (nr*8)))
*/
@@ -191,26 +174,26 @@

t = p;
fl_tab[0][i] = t;
- fl_tab[1][i] = rotl (t, 8);
- fl_tab[2][i] = rotl (t, 16);
- fl_tab[3][i] = rotl (t, 24);
+ fl_tab[1][i] = rol32(t, 8);
+ fl_tab[2][i] = rol32(t, 16);
+ fl_tab[3][i] = rol32(t, 24);

t = ((u32) ff_mult (2, p)) |
((u32) p << 8) |
((u32) p << 16) | ((u32) ff_mult (3, p) << 24);

ft_tab[0][i] = t;
- ft_tab[1][i] = rotl (t, 8);
- ft_tab[2][i] = rotl (t, 16);
- ft_tab[3][i] = rotl (t, 24);
+ ft_tab[1][i] = rol32(t, 8);
+ ft_tab[2][i] = rol32(t, 16);
+ ft_tab[3][i] = rol32(t, 24);

p = isb_tab[i];

t = p;
il_tab[0][i] = t;
- il_tab[1][i] = rotl (t, 8);
- il_tab[2][i] = rotl (t, 16);
- il_tab[3][i] = rotl (t, 24);
+ il_tab[1][i] = rol32(t, 8);
+ il_tab[2][i] = rol32(t, 16);
+ il_tab[3][i] = rol32(t, 24);

t = ((u32) ff_mult (14, p)) |
((u32) ff_mult (9, p) << 8) |
@@ -218,9 +201,9 @@
((u32) ff_mult (11, p) << 24);

it_tab[0][i] = t;
- it_tab[1][i] = rotl (t, 8);
- it_tab[2][i] = rotl (t, 16);
- it_tab[3][i] = rotl (t, 24);
+ it_tab[1][i] = rol32(t, 8);
+ it_tab[2][i] = rol32(t, 16);
+ it_tab[3][i] = rol32(t, 24);
}
}

@@ -232,14 +215,14 @@
w = star_x(v); \
t = w ^ (x); \
(y) = u ^ v ^ w; \
- (y) ^= rotr(u ^ t, 8) ^ \
- rotr(v ^ t, 16) ^ \
- rotr(t,24)
+ (y) ^= ror32(u ^ t, 8) ^ \
+ ror32(v ^ t, 16) ^ \
+ ror32(t,24)

/* initialise the key schedule from the user supplied key */

#define loop4(i) \
-{ t = rotr(t, 8); t = ls_box(t) ^ rco_tab[i]; \
+{ t = ror32(t, 8); t = ls_box(t) ^ rco_tab[i]; \
t ^= E_KEY[4 * i]; E_KEY[4 * i + 4] = t; \
t ^= E_KEY[4 * i + 1]; E_KEY[4 * i + 5] = t; \
t ^= E_KEY[4 * i + 2]; E_KEY[4 * i + 6] = t; \
@@ -247,7 +230,7 @@
}

#define loop6(i) \
-{ t = rotr(t, 8); t = ls_box(t) ^ rco_tab[i]; \
+{ t = ror32(t, 8); t = ls_box(t) ^ rco_tab[i]; \
t ^= E_KEY[6 * i]; E_KEY[6 * i + 6] = t; \
t ^= E_KEY[6 * i + 1]; E_KEY[6 * i + 7] = t; \
t ^= E_KEY[6 * i + 2]; E_KEY[6 * i + 8] = t; \
@@ -257,7 +240,7 @@
}

#define loop8(i) \
-{ t = rotr(t, 8); ; t = ls_box(t) ^ rco_tab[i]; \
+{ t = ror32(t, 8); ; t = ls_box(t) ^ rco_tab[i]; \
t ^= E_KEY[8 * i]; E_KEY[8 * i + 8] = t; \
t ^= E_KEY[8 * i + 1]; E_KEY[8 * i + 9] = t; \
t ^= E_KEY[8 * i + 2]; E_KEY[8 * i + 10] = t; \
Index: rnd2/crypto/cast6.c
===================================================================
--- rnd2.orig/crypto/cast6.c 2004-04-03 19:36:27.000000000 -0800
+++ rnd2/crypto/cast6.c 2005-01-20 09:20:50.000000000 -0800
@@ -33,13 +33,11 @@
u8 Kr[12][4];
};

-#define rol(n,x) ( ((x) << (n)) | ((x) >> (32-(n))) )
-
-#define F1(D,r,m) ( (I = ((m) + (D))), (I=rol((r),I)), \
+#define F1(D,r,m) ( (I = ((m) + (D))), (I=rol32(I,(r))), \
(((s1[I >> 24] ^ s2[(I>>16)&0xff]) - s3[(I>>8)&0xff]) + s4[I&0xff]) )
-#define F2(D,r,m) ( (I = ((m) ^ (D))), (I=rol((r),I)), \
+#define F2(D,r,m) ( (I = ((m) ^ (D))), (I=rol32(I,(r))), \
(((s1[I >> 24] - s2[(I>>16)&0xff]) + s3[(I>>8)&0xff]) ^ s4[I&0xff]) )
-#define F3(D,r,m) ( (I = ((m) - (D))), (I=rol((r),I)), \
+#define F3(D,r,m) ( (I = ((m) - (D))), (I=rol32(I,(r))), \
(((s1[I >> 24] + s2[(I>>16)&0xff]) ^ s3[(I>>8)&0xff]) - s4[I&0xff]) )

static const u32 s1[256] = {
Index: rnd2/fs/xfs/xfs_da_btree.c
===================================================================
--- rnd2.orig/fs/xfs/xfs_da_btree.c 2005-01-19 22:51:17.000000000 -0800
+++ rnd2/fs/xfs/xfs_da_btree.c 2005-01-20 09:20:51.000000000 -0800
@@ -1627,40 +1627,38 @@
{
xfs_dahash_t hash;

-#define ROTL(x,y) (((x) << (y)) | ((x) >> (32 - (y))))
#ifdef SLOWVERSION
/*
* This is the old one-byte-at-a-time version.
*/
- for (hash = 0; namelen > 0; namelen--) {
- hash = *name++ ^ ROTL(hash, 7);
- }
+ for (hash = 0; namelen > 0; namelen--)
+ hash = *name++ ^ rol32(hash, 7);
+
return(hash);
#else
/*
* Do four characters at a time as long as we can.
*/
- for (hash = 0; namelen >= 4; namelen -= 4, name += 4) {
+ for (hash = 0; namelen >= 4; namelen -= 4, name += 4)
hash = (name[0] << 21) ^ (name[1] << 14) ^ (name[2] << 7) ^
- (name[3] << 0) ^ ROTL(hash, 7 * 4);
- }
+ (name[3] << 0) ^ rol32(hash, 7 * 4);
+
/*
* Now do the rest of the characters.
*/
switch (namelen) {
case 3:
return (name[0] << 14) ^ (name[1] << 7) ^ (name[2] << 0) ^
- ROTL(hash, 7 * 3);
+ rol32(hash, 7 * 3);
case 2:
- return (name[0] << 7) ^ (name[1] << 0) ^ ROTL(hash, 7 * 2);
+ return (name[0] << 7) ^ (name[1] << 0) ^ rol32(hash, 7 * 2);
case 1:
- return (name[0] << 0) ^ ROTL(hash, 7 * 1);
+ return (name[0] << 0) ^ rol32(hash, 7 * 1);
case 0:
return hash;
}
/* NOTREACHED */
#endif
-#undef ROTL
return 0; /* keep gcc happy */
}

Index: rnd2/crypto/serpent.c
===================================================================
--- rnd2.orig/crypto/serpent.c 2005-01-19 22:52:24.000000000 -0800
+++ rnd2/crypto/serpent.c 2005-01-20 14:28:18.000000000 -0800
@@ -31,11 +31,9 @@
#define SERPENT_BLOCK_SIZE 16

#define PHI 0x9e3779b9UL
-#define ROL(x,r) ((x) = ((x) << (r)) | ((x) >> (32-(r))))
-#define ROR(x,r) ((x) = ((x) >> (r)) | ((x) << (32-(r))))

#define keyiter(a,b,c,d,i,j) \
- b ^= d; b ^= c; b ^= a; b ^= PHI ^ i; ROL(b,11); k[j] = b;
+ b ^= d; b ^= c; b ^= a; b ^= PHI ^ i; b = rol32(b,11); k[j] = b;

#define loadkeys(x0,x1,x2,x3,i) \
x0=k[i]; x1=k[i+1]; x2=k[i+2]; x3=k[i+3];
@@ -48,24 +46,24 @@
x1 ^= k[4*(i)+1]; x0 ^= k[4*(i)+0];

#define LK(x0,x1,x2,x3,x4,i) \
- ROL(x0,13); \
- ROL(x2,3); x1 ^= x0; x4 = x0 << 3; \
+ x0=rol32(x0,13);\
+ x2=rol32(x2,3); x1 ^= x0; x4 = x0 << 3; \
x3 ^= x2; x1 ^= x2; \
- ROL(x1,1); x3 ^= x4; \
- ROL(x3,7); x4 = x1; \
+ x1=rol32(x1,1); x3 ^= x4; \
+ x3=rol32(x3,7); x4 = x1; \
x0 ^= x1; x4 <<= 7; x2 ^= x3; \
x0 ^= x3; x2 ^= x4; x3 ^= k[4*i+3]; \
- x1 ^= k[4*i+1]; ROL(x0,5); ROL(x2,22); \
+ x1 ^= k[4*i+1]; x0=rol32(x0,5); x2=rol32(x2,22);\
x0 ^= k[4*i+0]; x2 ^= k[4*i+2];

#define KL(x0,x1,x2,x3,x4,i) \
x0 ^= k[4*i+0]; x1 ^= k[4*i+1]; x2 ^= k[4*i+2]; \
- x3 ^= k[4*i+3]; ROR(x0,5); ROR(x2,22); \
+ x3 ^= k[4*i+3]; x0=ror32(x0,5); x2=ror32(x2,22);\
x4 = x1; x2 ^= x3; x0 ^= x3; \
- x4 <<= 7; x0 ^= x1; ROR(x1,1); \
- x2 ^= x4; ROR(x3,7); x4 = x0 << 3; \
- x1 ^= x0; x3 ^= x4; ROR(x0,13); \
- x1 ^= x2; x3 ^= x2; ROR(x2,3);
+ x4 <<= 7; x0 ^= x1; x1=ror32(x1,1); \
+ x2 ^= x4; x3=ror32(x3,7); x4 = x0 << 3; \
+ x1 ^= x0; x3 ^= x4; x0=ror32(x0,13);\
+ x1 ^= x2; x3 ^= x2; x2=ror32(x2,3);

#define S0(x0,x1,x2,x3,x4) \
x4 = x3; \
Index: rnd2/crypto/sha1.c
===================================================================
--- rnd2.orig/crypto/sha1.c 2005-01-20 12:20:13.000000000 -0800
+++ rnd2/crypto/sha1.c 2005-01-20 14:28:35.000000000 -0800
@@ -27,27 +27,22 @@
#define SHA1_DIGEST_SIZE 20
#define SHA1_HMAC_BLOCK_SIZE 64

-static inline u32 rol(u32 value, u32 bits)
-{
- return (((value) << (bits)) | ((value) >> (32 - (bits))));
-}
-
/* blk0() and blk() perform the initial expand. */
/* I got the idea of expanding during the round function from SSLeay */
# define blk0(i) block32[i]

-#define blk(i) (block32[i&15] = rol(block32[(i+13)&15]^block32[(i+8)&15] \
+#define blk(i) (block32[i&15] = rol32(block32[(i+13)&15]^block32[(i+8)&15] \
^block32[(i+2)&15]^block32[i&15],1))

/* (R0+R1), R2, R3, R4 are the different operations used in SHA1 */
-#define R0(v,w,x,y,z,i) z+=((w&(x^y))^y)+blk0(i)+0x5A827999+rol(v,5); \
- w=rol(w,30);
-#define R1(v,w,x,y,z,i) z+=((w&(x^y))^y)+blk(i)+0x5A827999+rol(v,5); \
- w=rol(w,30);
-#define R2(v,w,x,y,z,i) z+=(w^x^y)+blk(i)+0x6ED9EBA1+rol(v,5);w=rol(w,30);
-#define R3(v,w,x,y,z,i) z+=(((w|x)&y)|(w&x))+blk(i)+0x8F1BBCDC+rol(v,5); \
- w=rol(w,30);
-#define R4(v,w,x,y,z,i) z+=(w^x^y)+blk(i)+0xCA62C1D6+rol(v,5);w=rol(w,30);
+#define R0(v,w,x,y,z,i) z+=((w&(x^y))^y)+blk0(i)+0x5A827999+rol32(v,5); \
+ w=rol32(w,30);
+#define R1(v,w,x,y,z,i) z+=((w&(x^y))^y)+blk(i)+0x5A827999+rol32(v,5); \
+ w=rol32(w,30);
+#define R2(v,w,x,y,z,i) z+=(w^x^y)+blk(i)+0x6ED9EBA1+rol32(v,5);w=rol32(w,30);
+#define R3(v,w,x,y,z,i) z+=(((w|x)&y)|(w&x))+blk(i)+0x8F1BBCDC+rol32(v,5); \
+ w=rol32(w,30);
+#define R4(v,w,x,y,z,i) z+=(w^x^y)+blk(i)+0xCA62C1D6+rol32(v,5);w=rol32(w,30);

struct sha1_ctx {
u64 count;
Index: rnd2/crypto/sha256.c
===================================================================
--- rnd2.orig/crypto/sha256.c 2005-01-19 22:52:24.000000000 -0800
+++ rnd2/crypto/sha256.c 2005-01-20 12:27:36.000000000 -0800
@@ -42,15 +42,10 @@
return (x & y) | (z & (x | y));
}

-static inline u32 RORu32(u32 x, u32 y)
-{
- return (x >> y) | (x << (32 - y));
-}
-
-#define e0(x) (RORu32(x, 2) ^ RORu32(x,13) ^ RORu32(x,22))
-#define e1(x) (RORu32(x, 6) ^ RORu32(x,11) ^ RORu32(x,25))
-#define s0(x) (RORu32(x, 7) ^ RORu32(x,18) ^ (x >> 3))
-#define s1(x) (RORu32(x,17) ^ RORu32(x,19) ^ (x >> 10))
+#define e0(x) (ror32(x, 2) ^ ror32(x,13) ^ ror32(x,22))
+#define e1(x) (ror32(x, 6) ^ ror32(x,11) ^ ror32(x,25))
+#define s0(x) (ror32(x, 7) ^ ror32(x,18) ^ (x >> 3))
+#define s1(x) (ror32(x,17) ^ ror32(x,19) ^ (x >> 10))

#define H0 0x6a09e667
#define H1 0xbb67ae85

2005-01-21 22:19:32

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 4/12] random pt4: Cleanup SHA interface

Clean up SHA hash function for moving to lib/
Do proper endian conversion
Provide sha_init function
Add kerneldoc

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd2/drivers/char/random.c
===================================================================
--- rnd2.orig/drivers/char/random.c 2005-01-20 12:28:27.979725732 -0800
+++ rnd2/drivers/char/random.c 2005-01-20 12:28:34.506893589 -0800
@@ -671,29 +671,6 @@

EXPORT_SYMBOL(add_disk_randomness);

-/******************************************************************
- *
- * Hash function definition
- *
- *******************************************************************/
-
-/*
- * This chunk of code defines a function
- * void sha_transform(__u32 digest[HASH_BUFFER_SIZE + HASH_EXTRA_SIZE],
- * __u32 const data[16])
- *
- * The function hashes the input data to produce a digest in the first
- * HASH_BUFFER_SIZE words of the digest[] array, and uses HASH_EXTRA_SIZE
- * more words for internal purposes. (This buffer is exported so the
- * caller can wipe it once rather than this code doing it each call,
- * and tacking it onto the end of the digest[] array is the quick and
- * dirty way of doing it.)
- *
- * For /dev/random purposes, the length of the data being hashed is
- * fixed in length, so appending a bit count in the usual way is not
- * cryptographically necessary.
- */
-
#define HASH_BUFFER_SIZE 5
#define EXTRACT_SIZE 10
#define HASH_EXTRA_SIZE 80
@@ -717,20 +694,32 @@
#define K3 0x8F1BBCDCL /* Rounds 40-59: sqrt(5) * 2^30 */
#define K4 0xCA62C1D6L /* Rounds 60-79: sqrt(10) * 2^30 */

-static void sha_transform(__u32 digest[85], __u32 const data[16])
+/*
+ * sha_transform: single block SHA1 transform
+ *
+ * @digest: 160 bit digest to update
+ * @data: 512 bytes of data to hash
+ * @W: 80 words of workspace
+ *
+ * This function generates a SHA1 digest for a single. Be warned, it
+ * does not handle padding and message digest, do not confuse it with
+ * the full FIPS 180-1 digest algorithm for variable length messages.
+ */
+static void sha_transform(__u32 digest[5], const char *data, __u32 W[80])
{
- __u32 A, B, C, D, E; /* Local vars */
+ __u32 A, B, C, D, E;
__u32 TEMP;
int i;
-#define W (digest + HASH_BUFFER_SIZE) /* Expanded data array */

+ memset(W, 0, sizeof(W));
+ for (i = 0; i < 16; i++)
+ W[i] = be32_to_cpu(((const __u32 *)data)[i]);
/*
* Do the preliminary expansion of 16 to 80 words. Doing it
* out-of-line line this is faster than doing it in-line on
* register-starved machines like the x86, and not really any
* slower on real processors.
*/
- memcpy(W, data, 16*sizeof(__u32));
for (i = 0; i < 64; i++) {
TEMP = W[i] ^ W[i+2] ^ W[i+8] ^ W[i+13];
W[i+16] = rol32(TEMP, 1);
@@ -768,7 +757,6 @@
digest[4] += E;

/* W is wiped by the caller */
-#undef W
}

#undef f1
@@ -780,6 +768,20 @@
#undef K3
#undef K4

+/*
+ * sha_init: initialize the vectors for a SHA1 digest
+ *
+ * @buf: vector to initialize
+ */
+static void sha_init(__u32 *buf)
+{
+ buf[0] = 0x67452301;
+ buf[1] = 0xefcdab89;
+ buf[2] = 0x98badcfe;
+ buf[3] = 0x10325476;
+ buf[4] = 0xc3d2e1f0;
+}
+
/*********************************************************************
*
* Entropy extraction routines
@@ -870,13 +872,7 @@
int i, x;
__u32 data[16], buf[85];

- /* Hash the pool to get the output */
- buf[0] = 0x67452301;
- buf[1] = 0xefcdab89;
- buf[2] = 0x98badcfe;
- buf[3] = 0x10325476;
- buf[4] = 0xc3d2e1f0;
-
+ sha_init(buf);
/*
* As we hash the pool, we mix intermediate values of
* the hash back into the pool. This eliminates
@@ -886,7 +882,7 @@
* function can be inverted.
*/
for (i = 0, x = 0; i < r->poolinfo->poolwords; i += 16, x+=2) {
- sha_transform(buf, r->pool+i);
+ sha_transform(buf, (__u8 *)r->pool+i, buf + 5);
add_entropy_words(r, &buf[x % 5], 1);
}

@@ -896,7 +892,7 @@
* final time.
*/
__add_entropy_words(r, &buf[x % 5], 1, data);
- sha_transform(buf, data);
+ sha_transform(buf, (__u8 *)data, buf + 5);

/*
* In case the hash function has some recognizable
@@ -1771,7 +1767,7 @@
tmp[0]=saddr;
tmp[1]=daddr;
tmp[2]=(sport << 16) + dport;
- sha_transform(tmp+16, tmp);
+ sha_transform(tmp+16, (__u8 *)tmp, tmp + 16 + 5);
seq = tmp[17] + sseq + (count << COOKIEBITS);

memcpy(tmp + 3, syncookie_secret[1], sizeof(syncookie_secret[1]));
@@ -1779,7 +1775,7 @@
tmp[1]=daddr;
tmp[2]=(sport << 16) + dport;
tmp[3] = count; /* minute counter */
- sha_transform(tmp + 16, tmp);
+ sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);

/* Add in the second hash and the data */
return seq + ((tmp[17] + data) & COOKIEMASK);
@@ -1808,7 +1804,7 @@
tmp[0]=saddr;
tmp[1]=daddr;
tmp[2]=(sport << 16) + dport;
- sha_transform(tmp + 16, tmp);
+ sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
cookie -= tmp[17] + sseq;
/* Cookie is now reduced to (count * 2^24) ^ (hash % 2^24) */

2005-01-21 21:58:22

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 12/12] random pt4: Move other tcp/ip bits to net/

Move remaining TCP bits from random.c to networking land.

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd2/include/net/tcp.h
===================================================================
--- rnd2.orig/include/net/tcp.h 2005-01-20 10:15:06.896220663 -0800
+++ rnd2/include/net/tcp.h 2005-01-20 10:16:59.315888375 -0800
@@ -2056,4 +2056,12 @@

return (cwnd != 0);
}
+
+/* from net/ipv4/random.c */
+
+extern u32 secure_tcp_port_ephemeral(__u32 saddr, __u32 daddr, __u16 dport);
+extern __u32 secure_tcp_sequence_number(__u32 saddr, __u32 daddr,
+ __u16 sport, __u16 dport);
+extern __u32 secure_tcpv6_sequence_number(__u32 *saddr, __u32 *daddr,
+ __u16 sport, __u16 dport);
#endif /* _TCP_H */
Index: rnd2/net/ipv4/Makefile
===================================================================
--- rnd2.orig/net/ipv4/Makefile 2005-01-20 10:15:06.896220663 -0800
+++ rnd2/net/ipv4/Makefile 2005-01-20 10:16:59.315888375 -0800
@@ -2,7 +2,7 @@
# Makefile for the Linux TCP/IP (INET) layer.
#

-obj-y := utils.o route.o inetpeer.o protocol.o \
+obj-y := utils.o random.o route.o inetpeer.o protocol.o \
ip_input.o ip_fragment.o ip_forward.o ip_options.o \
ip_output.o ip_sockglue.o \
tcp.o tcp_input.o tcp_output.o tcp_timer.o tcp_ipv4.o tcp_minisocks.o \
Index: rnd2/drivers/char/random.c
===================================================================
--- rnd2.orig/drivers/char/random.c 2005-01-20 10:16:43.345924372 -0800
+++ rnd2/drivers/char/random.c 2005-01-20 10:16:59.317888120 -0800
@@ -1287,288 +1287,3 @@
{ .ctl_name = 0 }
};
#endif /* CONFIG_SYSCTL */
-
-/********************************************************************
- *
- * Random funtions for networking
- *
- ********************************************************************/
-
-#ifdef CONFIG_INET
-/*
- * TCP initial sequence number picking. This uses the random number
- * generator to pick an initial secret value. This value is hashed
- * along with the TCP endpoint information to provide a unique
- * starting point for each pair of TCP endpoints. This defeats
- * attacks which rely on guessing the initial TCP sequence number.
- * This algorithm was suggested by Steve Bellovin.
- *
- * Using a very strong hash was taking an appreciable amount of the total
- * TCP connection establishment time, so this is a weaker hash,
- * compensated for by changing the secret periodically.
- */
-
-/* F, G and H are basic MD4 functions: selection, majority, parity */
-#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
-#define G(x, y, z) (((x) & (y)) + (((x) ^ (y)) & (z)))
-#define H(x, y, z) ((x) ^ (y) ^ (z))
-
-/*
- * The generic round function. The application is so specific that
- * we don't bother protecting all the arguments with parens, as is generally
- * good macro practice, in favor of extra legibility.
- * Rotation is separate from addition to prevent recomputation
- */
-#define ROUND(f, a, b, c, d, x, s) \
- (a += f(b, c, d) + x, a = (a << s) | (a >> (32 - s)))
-#define K1 0
-#define K2 013240474631UL
-#define K3 015666365641UL
-
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-
-static __u32 twothirdsMD4Transform (__u32 const buf[4], __u32 const in[12])
-{
- __u32 a = buf[0], b = buf[1], c = buf[2], d = buf[3];
-
- /* Round 1 */
- ROUND(F, a, b, c, d, in[ 0] + K1, 3);
- ROUND(F, d, a, b, c, in[ 1] + K1, 7);
- ROUND(F, c, d, a, b, in[ 2] + K1, 11);
- ROUND(F, b, c, d, a, in[ 3] + K1, 19);
- ROUND(F, a, b, c, d, in[ 4] + K1, 3);
- ROUND(F, d, a, b, c, in[ 5] + K1, 7);
- ROUND(F, c, d, a, b, in[ 6] + K1, 11);
- ROUND(F, b, c, d, a, in[ 7] + K1, 19);
- ROUND(F, a, b, c, d, in[ 8] + K1, 3);
- ROUND(F, d, a, b, c, in[ 9] + K1, 7);
- ROUND(F, c, d, a, b, in[10] + K1, 11);
- ROUND(F, b, c, d, a, in[11] + K1, 19);
-
- /* Round 2 */
- ROUND(G, a, b, c, d, in[ 1] + K2, 3);
- ROUND(G, d, a, b, c, in[ 3] + K2, 5);
- ROUND(G, c, d, a, b, in[ 5] + K2, 9);
- ROUND(G, b, c, d, a, in[ 7] + K2, 13);
- ROUND(G, a, b, c, d, in[ 9] + K2, 3);
- ROUND(G, d, a, b, c, in[11] + K2, 5);
- ROUND(G, c, d, a, b, in[ 0] + K2, 9);
- ROUND(G, b, c, d, a, in[ 2] + K2, 13);
- ROUND(G, a, b, c, d, in[ 4] + K2, 3);
- ROUND(G, d, a, b, c, in[ 6] + K2, 5);
- ROUND(G, c, d, a, b, in[ 8] + K2, 9);
- ROUND(G, b, c, d, a, in[10] + K2, 13);
-
- /* Round 3 */
- ROUND(H, a, b, c, d, in[ 3] + K3, 3);
- ROUND(H, d, a, b, c, in[ 7] + K3, 9);
- ROUND(H, c, d, a, b, in[11] + K3, 11);
- ROUND(H, b, c, d, a, in[ 2] + K3, 15);
- ROUND(H, a, b, c, d, in[ 6] + K3, 3);
- ROUND(H, d, a, b, c, in[10] + K3, 9);
- ROUND(H, c, d, a, b, in[ 1] + K3, 11);
- ROUND(H, b, c, d, a, in[ 5] + K3, 15);
- ROUND(H, a, b, c, d, in[ 9] + K3, 3);
- ROUND(H, d, a, b, c, in[ 0] + K3, 9);
- ROUND(H, c, d, a, b, in[ 4] + K3, 11);
- ROUND(H, b, c, d, a, in[ 8] + K3, 15);
-
- return buf[1] + b; /* "most hashed" word */
- /* Alternative: return sum of all words? */
-}
-#endif
-
-#undef ROUND
-#undef F
-#undef G
-#undef H
-#undef K1
-#undef K2
-#undef K3
-
-/* This should not be decreased so low that ISNs wrap too fast. */
-#define REKEY_INTERVAL (300 * HZ)
-/*
- * Bit layout of the tcp sequence numbers (before adding current time):
- * bit 24-31: increased after every key exchange
- * bit 0-23: hash(source,dest)
- *
- * The implementation is similar to the algorithm described
- * in the Appendix of RFC 1185, except that
- * - it uses a 1 MHz clock instead of a 250 kHz clock
- * - it performs a rekey every 5 minutes, which is equivalent
- * to a (source,dest) tulple dependent forward jump of the
- * clock by 0..2^(HASH_BITS+1)
- *
- * Thus the average ISN wraparound time is 68 minutes instead of
- * 4.55 hours.
- *
- * SMP cleanup and lock avoidance with poor man's RCU.
- * Manfred Spraul <[email protected]>
- *
- */
-#define COUNT_BITS 8
-#define COUNT_MASK ((1 << COUNT_BITS) - 1)
-#define HASH_BITS 24
-#define HASH_MASK ((1 << HASH_BITS) - 1)
-
-static struct keydata {
- __u32 count; /* already shifted to the final position */
- __u32 secret[12];
-} ____cacheline_aligned ip_keydata[2];
-
-static unsigned int ip_cnt;
-
-static void rekey_seq_generator(void *private_);
-
-static DECLARE_WORK(rekey_work, rekey_seq_generator, NULL);
-
-/*
- * Lock avoidance:
- * The ISN generation runs lockless - it's just a hash over random data.
- * State changes happen every 5 minutes when the random key is replaced.
- * Synchronization is performed by having two copies of the hash function
- * state and rekey_seq_generator always updates the inactive copy.
- * The copy is then activated by updating ip_cnt.
- * The implementation breaks down if someone blocks the thread
- * that processes SYN requests for more than 5 minutes. Should never
- * happen, and even if that happens only a not perfectly compliant
- * ISN is generated, nothing fatal.
- */
-static void rekey_seq_generator(void *private_)
-{
- struct keydata *keyptr = &ip_keydata[1 ^ (ip_cnt & 1)];
-
- get_random_bytes(keyptr->secret, sizeof(keyptr->secret));
- keyptr->count = (ip_cnt & COUNT_MASK) << HASH_BITS;
- smp_wmb();
- ip_cnt++;
- schedule_delayed_work(&rekey_work, REKEY_INTERVAL);
-}
-
-static inline struct keydata *get_keyptr(void)
-{
- struct keydata *keyptr = &ip_keydata[ip_cnt & 1];
-
- smp_rmb();
-
- return keyptr;
-}
-
-static __init int seqgen_init(void)
-{
- rekey_seq_generator(NULL);
- return 0;
-}
-late_initcall(seqgen_init);
-
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-__u32 secure_tcpv6_sequence_number(__u32 *saddr, __u32 *daddr,
- __u16 sport, __u16 dport)
-{
- struct timeval tv;
- __u32 seq;
- __u32 hash[12];
- struct keydata *keyptr = get_keyptr();
-
- /* The procedure is the same as for IPv4, but addresses are longer.
- * Thus we must use twothirdsMD4Transform.
- */
-
- memcpy(hash, saddr, 16);
- hash[4]=(sport << 16) + dport;
- memcpy(&hash[5],keyptr->secret,sizeof(__u32) * 7);
-
- seq = twothirdsMD4Transform(daddr, hash) & HASH_MASK;
- seq += keyptr->count;
-
- do_gettimeofday(&tv);
- seq += tv.tv_usec + tv.tv_sec * 1000000;
-
- return seq;
-}
-EXPORT_SYMBOL(secure_tcpv6_sequence_number);
-#endif
-
-__u32 secure_tcp_sequence_number(__u32 saddr, __u32 daddr,
- __u16 sport, __u16 dport)
-{
- struct timeval tv;
- __u32 seq;
- __u32 hash[4];
- struct keydata *keyptr = get_keyptr();
-
- /*
- * Pick a unique starting offset for each TCP connection endpoints
- * (saddr, daddr, sport, dport).
- * Note that the words are placed into the starting vector, which is
- * then mixed with a partial MD4 over random data.
- */
- hash[0]=saddr;
- hash[1]=daddr;
- hash[2]=(sport << 16) + dport;
- hash[3]=keyptr->secret[11];
-
- seq = half_md4_transform(hash, keyptr->secret) & HASH_MASK;
- seq += keyptr->count;
- /*
- * As close as possible to RFC 793, which
- * suggests using a 250 kHz clock.
- * Further reading shows this assumes 2 Mb/s networks.
- * For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
- * That's funny, Linux has one built in! Use it!
- * (Networks are faster now - should this be increased?)
- */
- do_gettimeofday(&tv);
- seq += tv.tv_usec + tv.tv_sec * 1000000;
-#if 0
- printk("init_seq(%lx, %lx, %d, %d) = %d\n",
- saddr, daddr, sport, dport, seq);
-#endif
- return seq;
-}
-
-EXPORT_SYMBOL(secure_tcp_sequence_number);
-
-/* The code below is shamelessly stolen from secure_tcp_sequence_number().
- * All blames to Andrey V. Savochkin <[email protected]>.
- */
-__u32 secure_ip_id(__u32 daddr)
-{
- struct keydata *keyptr;
- __u32 hash[4];
-
- keyptr = get_keyptr();
-
- /*
- * Pick a unique starting offset for each IP destination.
- * The dest ip address is placed in the starting vector,
- * which is then hashed with random data.
- */
- hash[0] = daddr;
- hash[1] = keyptr->secret[9];
- hash[2] = keyptr->secret[10];
- hash[3] = keyptr->secret[11];
-
- return half_md4_transform(hash, keyptr->secret);
-}
-
-/* Generate secure starting point for ephemeral TCP port search */
-u32 secure_tcp_port_ephemeral(__u32 saddr, __u32 daddr, __u16 dport)
-{
- struct keydata *keyptr = get_keyptr();
- u32 hash[4];
-
- /*
- * Pick a unique starting offset for each ephemeral port search
- * (saddr, daddr, dport) and 48bits of random data.
- */
- hash[0] = saddr;
- hash[1] = daddr;
- hash[2] = dport ^ keyptr->secret[10];
- hash[3] = keyptr->secret[11];
-
- return half_md4_transform(hash, keyptr->secret);
-}
-
-#endif /* CONFIG_INET */
Index: rnd2/include/net/ip.h
===================================================================
--- rnd2.orig/include/net/ip.h 2005-01-20 10:15:07.081197080 -0800
+++ rnd2/include/net/ip.h 2005-01-20 10:16:59.318887992 -0800
@@ -339,4 +339,8 @@
void __user *newval, size_t newlen,
void **context);

+/* from net/ipv4/random.c */
+
+extern __u32 secure_ip_id(__u32 daddr);
+
#endif /* _IP_H */
Index: rnd2/net/ipv4/inetpeer.c
===================================================================
--- rnd2.orig/net/ipv4/inetpeer.c 2005-01-20 10:15:06.895220790 -0800
+++ rnd2/net/ipv4/inetpeer.c 2005-01-20 10:16:59.319887865 -0800
@@ -21,6 +21,7 @@
#include <linux/mm.h>
#include <linux/net.h>
#include <net/inetpeer.h>
+#include <net/ip.h>

/*
* Theory of operations.
Index: rnd2/net/ipv4/random.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ rnd2/net/ipv4/random.c 2005-01-20 10:16:59.320887737 -0800
@@ -0,0 +1,275 @@
+/*
+ * TCP initial sequence number picking. This uses the random number
+ * generator to pick an initial secret value. This value is hashed
+ * along with the TCP endpoint information to provide a unique
+ * starting point for each pair of TCP endpoints. This defeats
+ * attacks which rely on guessing the initial TCP sequence number.
+ * This algorithm was suggested by Steve Bellovin.
+ *
+ * Using a very strong hash was taking an appreciable amount of the total
+ * TCP connection establishment time, so this is a weaker hash,
+ * compensated for by changing the secret periodically.
+ *
+ * Bit layout of the tcp sequence numbers (before adding current time):
+ * bit 24-31: increased after every key exchange
+ * bit 0-23: hash(source,dest)
+ *
+ * The implementation is similar to the algorithm described
+ * in the Appendix of RFC 1185, except that
+ * - it uses a 1 MHz clock instead of a 250 kHz clock
+ * - it performs a rekey every 5 minutes, which is equivalent
+ * to a (source,dest) tulple dependent forward jump of the
+ * clock by 0..2^(HASH_BITS+1)
+ *
+ * Thus the average ISN wraparound time is 68 minutes instead of
+ * 4.55 hours.
+ *
+ * SMP cleanup and lock avoidance with poor man's RCU.
+ * Manfred Spraul <[email protected]>
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/random.h>
+#include <linux/workqueue.h>
+#include <linux/jiffies.h>
+#include <linux/init.h>
+#include <linux/cryptohash.h>
+#include <linux/module.h>
+
+#define COUNT_BITS 8
+#define COUNT_MASK ((1 << COUNT_BITS) - 1)
+#define HASH_BITS 24
+#define HASH_MASK ((1 << HASH_BITS) - 1)
+
+/* This should not be decreased so low that ISNs wrap too fast. */
+#define REKEY_INTERVAL (300 * HZ)
+static void rekey_seq_generator(void *private_);
+static DECLARE_WORK(rekey_work, rekey_seq_generator, NULL);
+
+/*
+ * Lock avoidance:
+ * The ISN generation runs lockless - it's just a hash over random data.
+ * State changes happen every 5 minutes when the random key is replaced.
+ * Synchronization is performed by having two copies of the hash function
+ * state and rekey_seq_generator always updates the inactive copy.
+ * The copy is then activated by updating ip_cnt.
+ * The implementation breaks down if someone blocks the thread
+ * that processes SYN requests for more than 5 minutes. Should never
+ * happen, and even if that happens only a not perfectly compliant
+ * ISN is generated, nothing fatal.
+ */
+
+static struct keydata {
+ __u32 count; /* already shifted to the final position */
+ __u32 secret[12];
+} ____cacheline_aligned ip_keydata[2];
+
+static unsigned int ip_cnt;
+
+static void rekey_seq_generator(void *private_)
+{
+ struct keydata *keyptr = &ip_keydata[1 ^ (ip_cnt & 1)];
+
+ get_random_bytes(keyptr->secret, sizeof(keyptr->secret));
+ keyptr->count = (ip_cnt & COUNT_MASK) << HASH_BITS;
+ smp_wmb();
+ ip_cnt++;
+ schedule_delayed_work(&rekey_work, REKEY_INTERVAL);
+}
+
+static __init int seqgen_init(void)
+{
+ rekey_seq_generator(NULL);
+ return 0;
+}
+late_initcall(seqgen_init);
+
+static struct keydata *get_keyptr(void)
+{
+ struct keydata *keyptr = &ip_keydata[ip_cnt & 1];
+
+ smp_rmb();
+
+ return keyptr;
+}
+
+__u32 secure_tcp_sequence_number(__u32 saddr, __u32 daddr,
+ __u16 sport, __u16 dport)
+{
+ struct timeval tv;
+ __u32 seq;
+ __u32 hash[4];
+ struct keydata *keyptr = get_keyptr();
+
+ /*
+ * Pick a unique starting offset for each TCP connection endpoints
+ * (saddr, daddr, sport, dport).
+ * Note that the words are placed into the starting vector, which is
+ * then mixed with a partial MD4 over random data.
+ */
+ hash[0]=saddr;
+ hash[1]=daddr;
+ hash[2]=(sport << 16) + dport;
+ hash[3]=keyptr->secret[11];
+
+ seq = half_md4_transform(hash, keyptr->secret) & HASH_MASK;
+ seq += keyptr->count;
+ /*
+ * As close as possible to RFC 793, which
+ * suggests using a 250 kHz clock.
+ * Further reading shows this assumes 2 Mb/s networks.
+ * For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
+ * That's funny, Linux has one built in! Use it!
+ * (Networks are faster now - should this be increased?)
+ */
+ do_gettimeofday(&tv);
+ seq += tv.tv_usec + tv.tv_sec * 1000000;
+ return seq;
+}
+
+EXPORT_SYMBOL(secure_tcp_sequence_number);
+
+/* The code below is shamelessly stolen from secure_tcp_sequence_number().
+ * All blames to Andrey V. Savochkin <[email protected]>.
+ */
+__u32 secure_ip_id(__u32 daddr)
+{
+ struct keydata *keyptr;
+ __u32 hash[4];
+
+ keyptr = get_keyptr();
+
+ /*
+ * Pick a unique starting offset for each IP destination.
+ * The dest ip address is placed in the starting vector,
+ * which is then hashed with random data.
+ */
+ hash[0] = daddr;
+ hash[1] = keyptr->secret[9];
+ hash[2] = keyptr->secret[10];
+ hash[3] = keyptr->secret[11];
+
+ return half_md4_transform(hash, keyptr->secret);
+}
+
+/* Generate secure starting point for ephemeral TCP port search */
+u32 secure_tcp_port_ephemeral(__u32 saddr, __u32 daddr, __u16 dport)
+{
+ struct keydata *keyptr = get_keyptr();
+ u32 hash[4];
+
+ /*
+ * Pick a unique starting offset for each ephemeral port search
+ * (saddr, daddr, dport) and 48bits of random data.
+ */
+ hash[0] = saddr;
+ hash[1] = daddr;
+ hash[2] = dport ^ keyptr->secret[10];
+ hash[3] = keyptr->secret[11];
+
+ return half_md4_transform(hash, keyptr->secret);
+}
+
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+
+/* F, G and H are basic MD4 functions: selection, majority, parity */
+#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
+#define G(x, y, z) (((x) & (y)) + (((x) ^ (y)) & (z)))
+#define H(x, y, z) ((x) ^ (y) ^ (z))
+
+/*
+ * The generic round function. The application is so specific that
+ * we don't bother protecting all the arguments with parens, as is generally
+ * good macro practice, in favor of extra legibility.
+ * Rotation is separate from addition to prevent recomputation
+ */
+#define ROUND(f, a, b, c, d, x, s) \
+ (a += f(b, c, d) + x, a = (a << s) | (a >> (32 - s)))
+#define K1 0
+#define K2 013240474631UL
+#define K3 015666365641UL
+
+static __u32 twothirdsMD4Transform (__u32 const buf[4], __u32 const in[12])
+{
+ __u32 a = buf[0], b = buf[1], c = buf[2], d = buf[3];
+
+ /* Round 1 */
+ ROUND(F, a, b, c, d, in[ 0] + K1, 3);
+ ROUND(F, d, a, b, c, in[ 1] + K1, 7);
+ ROUND(F, c, d, a, b, in[ 2] + K1, 11);
+ ROUND(F, b, c, d, a, in[ 3] + K1, 19);
+ ROUND(F, a, b, c, d, in[ 4] + K1, 3);
+ ROUND(F, d, a, b, c, in[ 5] + K1, 7);
+ ROUND(F, c, d, a, b, in[ 6] + K1, 11);
+ ROUND(F, b, c, d, a, in[ 7] + K1, 19);
+ ROUND(F, a, b, c, d, in[ 8] + K1, 3);
+ ROUND(F, d, a, b, c, in[ 9] + K1, 7);
+ ROUND(F, c, d, a, b, in[10] + K1, 11);
+ ROUND(F, b, c, d, a, in[11] + K1, 19);
+
+ /* Round 2 */
+ ROUND(G, a, b, c, d, in[ 1] + K2, 3);
+ ROUND(G, d, a, b, c, in[ 3] + K2, 5);
+ ROUND(G, c, d, a, b, in[ 5] + K2, 9);
+ ROUND(G, b, c, d, a, in[ 7] + K2, 13);
+ ROUND(G, a, b, c, d, in[ 9] + K2, 3);
+ ROUND(G, d, a, b, c, in[11] + K2, 5);
+ ROUND(G, c, d, a, b, in[ 0] + K2, 9);
+ ROUND(G, b, c, d, a, in[ 2] + K2, 13);
+ ROUND(G, a, b, c, d, in[ 4] + K2, 3);
+ ROUND(G, d, a, b, c, in[ 6] + K2, 5);
+ ROUND(G, c, d, a, b, in[ 8] + K2, 9);
+ ROUND(G, b, c, d, a, in[10] + K2, 13);
+
+ /* Round 3 */
+ ROUND(H, a, b, c, d, in[ 3] + K3, 3);
+ ROUND(H, d, a, b, c, in[ 7] + K3, 9);
+ ROUND(H, c, d, a, b, in[11] + K3, 11);
+ ROUND(H, b, c, d, a, in[ 2] + K3, 15);
+ ROUND(H, a, b, c, d, in[ 6] + K3, 3);
+ ROUND(H, d, a, b, c, in[10] + K3, 9);
+ ROUND(H, c, d, a, b, in[ 1] + K3, 11);
+ ROUND(H, b, c, d, a, in[ 5] + K3, 15);
+ ROUND(H, a, b, c, d, in[ 9] + K3, 3);
+ ROUND(H, d, a, b, c, in[ 0] + K3, 9);
+ ROUND(H, c, d, a, b, in[ 4] + K3, 11);
+ ROUND(H, b, c, d, a, in[ 8] + K3, 15);
+
+ return buf[1] + b; /* "most hashed" word */
+ /* Alternative: return sum of all words? */
+}
+#undef ROUND
+#undef F
+#undef G
+#undef H
+#undef K1
+#undef K2
+#undef K3
+
+__u32 secure_tcpv6_sequence_number(__u32 *saddr, __u32 *daddr,
+ __u16 sport, __u16 dport)
+{
+ struct timeval tv;
+ __u32 seq;
+ __u32 hash[12];
+ struct keydata *keyptr = get_keyptr();
+
+ /* The procedure is the same as for IPv4, but addresses are longer.
+ * Thus we must use twothirdsMD4Transform.
+ */
+
+ memcpy(hash, saddr, 16);
+ hash[4]=(sport << 16) + dport;
+ memcpy(&hash[5],keyptr->secret,sizeof(__u32) * 7);
+
+ seq = twothirdsMD4Transform(daddr, hash) & HASH_MASK;
+ seq += keyptr->count;
+
+ do_gettimeofday(&tv);
+ seq += tv.tv_usec + tv.tv_sec * 1000000;
+
+ return seq;
+}
+EXPORT_SYMBOL(secure_tcpv6_sequence_number);
+#endif
Index: rnd2/include/linux/random.h
===================================================================
--- rnd2.orig/include/linux/random.h 2005-01-20 10:16:24.202364968 -0800
+++ rnd2/include/linux/random.h 2005-01-20 10:16:59.321887610 -0800
@@ -51,13 +51,6 @@
extern void get_random_bytes(void *buf, int nbytes);
void generate_random_uuid(unsigned char uuid_out[16]);

-extern __u32 secure_ip_id(__u32 daddr);
-extern u32 secure_tcp_port_ephemeral(__u32 saddr, __u32 daddr, __u16 dport);
-extern __u32 secure_tcp_sequence_number(__u32 saddr, __u32 daddr,
- __u16 sport, __u16 dport);
-extern __u32 secure_tcpv6_sequence_number(__u32 *saddr, __u32 *daddr,
- __u16 sport, __u16 dport);
-
#ifndef MODULE
extern struct file_operations random_fops, urandom_fops;
#endif

2005-01-21 22:31:44

by Matt Mackall

[permalink] [raw]
Subject: [PATCH 9/12] random pt4: Kill duplicate halfmd4 in ext3 htree

Replace duplicate halfMD4 code with call to lib/

Signed-off-by: Matt Mackall <[email protected]>

Index: rnd/fs/ext3/hash.c
===================================================================
--- rnd.orig/fs/ext3/hash.c 2005-01-12 21:27:14.191356048 -0800
+++ rnd/fs/ext3/hash.c 2005-01-12 21:28:28.916829361 -0800
@@ -13,6 +13,7 @@
#include <linux/jbd.h>
#include <linux/sched.h>
#include <linux/ext3_fs.h>
+#include <linux/cryptohash.h>

#define DELTA 0x9E3779B9

@@ -33,73 +34,6 @@
buf[1] += b1;
}

-/* F, G and H are basic MD4 functions: selection, majority, parity */
-#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
-#define G(x, y, z) (((x) & (y)) + (((x) ^ (y)) & (z)))
-#define H(x, y, z) ((x) ^ (y) ^ (z))
-
-/*
- * The generic round function. The application is so specific that
- * we don't bother protecting all the arguments with parens, as is generally
- * good macro practice, in favor of extra legibility.
- * Rotation is separate from addition to prevent recomputation
- */
-#define ROUND(f, a, b, c, d, x, s) \
- (a += f(b, c, d) + x, a = (a << s) | (a >> (32-s)))
-#define K1 0
-#define K2 013240474631UL
-#define K3 015666365641UL
-
-/*
- * Basic cut-down MD4 transform. Returns only 32 bits of result.
- */
-static void halfMD4Transform (__u32 buf[4], __u32 const in[])
-{
- __u32 a = buf[0], b = buf[1], c = buf[2], d = buf[3];
-
- /* Round 1 */
- ROUND(F, a, b, c, d, in[0] + K1, 3);
- ROUND(F, d, a, b, c, in[1] + K1, 7);
- ROUND(F, c, d, a, b, in[2] + K1, 11);
- ROUND(F, b, c, d, a, in[3] + K1, 19);
- ROUND(F, a, b, c, d, in[4] + K1, 3);
- ROUND(F, d, a, b, c, in[5] + K1, 7);
- ROUND(F, c, d, a, b, in[6] + K1, 11);
- ROUND(F, b, c, d, a, in[7] + K1, 19);
-
- /* Round 2 */
- ROUND(G, a, b, c, d, in[1] + K2, 3);
- ROUND(G, d, a, b, c, in[3] + K2, 5);
- ROUND(G, c, d, a, b, in[5] + K2, 9);
- ROUND(G, b, c, d, a, in[7] + K2, 13);
- ROUND(G, a, b, c, d, in[0] + K2, 3);
- ROUND(G, d, a, b, c, in[2] + K2, 5);
- ROUND(G, c, d, a, b, in[4] + K2, 9);
- ROUND(G, b, c, d, a, in[6] + K2, 13);
-
- /* Round 3 */
- ROUND(H, a, b, c, d, in[3] + K3, 3);
- ROUND(H, d, a, b, c, in[7] + K3, 9);
- ROUND(H, c, d, a, b, in[2] + K3, 11);
- ROUND(H, b, c, d, a, in[6] + K3, 15);
- ROUND(H, a, b, c, d, in[1] + K3, 3);
- ROUND(H, d, a, b, c, in[5] + K3, 9);
- ROUND(H, c, d, a, b, in[0] + K3, 11);
- ROUND(H, b, c, d, a, in[4] + K3, 15);
-
- buf[0] += a;
- buf[1] += b;
- buf[2] += c;
- buf[3] += d;
-}
-
-#undef ROUND
-#undef F
-#undef G
-#undef H
-#undef K1
-#undef K2
-#undef K3

/* The old legacy hash */
static __u32 dx_hack_hash (const char *name, int len)
@@ -187,7 +121,7 @@
p = name;
while (len > 0) {
str2hashbuf(p, len, in, 8);
- halfMD4Transform(buf, in);
+ half_md4_transform(buf, in);
len -= 32;
p += 32;
}
Index: rnd/include/linux/cryptohash.h
===================================================================
--- rnd.orig/include/linux/cryptohash.h 2005-01-12 21:28:27.412021208 -0800
+++ rnd/include/linux/cryptohash.h 2005-01-12 21:28:28.917829233 -0800
@@ -7,6 +7,6 @@
void sha_init(__u32 *buf);
void sha_transform(__u32 *digest, const char *data, __u32 *W);

-__u32 half_md4_transform(__u32 const buf[4], __u32 const in[8]);
+__u32 half_md4_transform(__u32 buf[4], __u32 const in[8]);

#endif
Index: rnd/lib/halfmd4.c
===================================================================
--- rnd.orig/lib/halfmd4.c 2005-01-12 21:28:27.410021462 -0800
+++ rnd/lib/halfmd4.c 2005-01-12 21:28:28.918829106 -0800
@@ -21,7 +21,7 @@
/*
* Basic cut-down MD4 transform. Returns only 32 bits of result.
*/
-__u32 half_md4_transform(__u32 const buf[4], __u32 const in[8])
+__u32 half_md4_transform(__u32 buf[4], __u32 const in[8])
{
__u32 a = buf[0], b = buf[1], c = buf[2], d = buf[3];

@@ -55,7 +55,11 @@
ROUND(H, c, d, a, b, in[0] + K3, 11);
ROUND(H, b, c, d, a, in[4] + K3, 15);

- return buf[1] + b; /* "most hashed" word */
- /* Alternative: return sum of all words? */
+ buf[0] += a;
+ buf[1] += b;
+ buf[2] += c;
+ buf[3] += d;
+
+ return buf[1]; /* "most hashed" word */
}

2005-01-23 02:14:15

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [PATCH 1/12] random pt4: Create new rol32/ror32 bitops

On Fri, 21 Jan 2005 at 15:41:06 -0600 Matt Mackall wrote:

> Add rol32 and ror32 bitops to bitops.h

Can you test this patch on top of yours? I did it on 2.6.10-ac10 but it
should apply OK. Compile tested and booted, but only random.c is using it
in my kernel.

x86-64 could use this too...

Add i386 bitops for rol32/ror32:
Signed-off-by: Chuck Ebbert <[email protected]>

--- 2.6.10-ac10/include/linux/bitops.h~orig 2005-01-22 11:31:20.130239000 -0500
+++ 2.6.10-ac10/include/linux/bitops.h 2005-01-22 11:34:55.740239000 -0500
@@ -129,6 +129,7 @@
return sizeof(w) == 4 ? generic_hweight32(w) : generic_hweight64(w);
}

+#ifndef __HAVE_ARCH_ROTATE_32
/*
* rol32 - rotate a 32-bit value left
*
@@ -150,5 +151,6 @@
{
return (word >> shift) | (word << (32 - shift));
}
+#endif /* ndef __HAVE_ARCH_ROTATE_32 */

#endif
--- 2.6.10-ac10/include/asm-i386/bitops.h~orig 2004-08-24 05:08:39.000000000 -0400
+++ 2.6.10-ac10/include/asm-i386/bitops.h 2005-01-22 11:42:12.010239000 -0500
@@ -431,6 +431,41 @@
#define hweight16(x) generic_hweight16(x)
#define hweight8(x) generic_hweight8(x)

+#define __HAVE_ARCH_ROTATE_32
+/*
+ * rol32 - rotate a 32-bit value left
+ *
+ * @word: value to rotate
+ * @shift: bits to roll
+ */
+static inline __u32 rol32(__u32 word, int shift)
+{
+ __u32 res;
+
+ asm("roll %%cl,%0"
+ : "=r" (res)
+ : "0" (word), "c" (shift)
+ : "cc");
+ return res;
+}
+
+/*
+ * ror32 - rotate a 32-bit value right
+ *
+ * @word: value to rotate
+ * @shift: bits to roll
+ */
+static inline __u32 ror32(__u32 word, int shift)
+{
+ __u32 res;
+
+ asm("rorl %%cl,%0"
+ : "=r" (res)
+ : "0" (word), "c" (shift)
+ : "cc");
+ return res;
+}
+
#endif /* __KERNEL__ */

#ifdef __KERNEL__

Chuck

2005-01-23 02:45:58

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH 1/12] random pt4: Create new rol32/ror32 bitops

On Sat, Jan 22, 2005 at 09:10:40PM -0500, Chuck Ebbert wrote:
> On Fri, 21 Jan 2005 at 15:41:06 -0600 Matt Mackall wrote:
>
> > Add rol32 and ror32 bitops to bitops.h
>
> Can you test this patch on top of yours? I did it on 2.6.10-ac10 but it
> should apply OK. Compile tested and booted, but only random.c is using it
> in my kernel.

If I recall correctly from my testing of this about a year ago,
compilers were already generating the appropriate rol/ror
instructions. Have you checked the disassembly and found it wanting?

--
Mathematics is the supreme nostalgia of our time.

2005-01-23 03:19:26

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/12] random pt4: Create new rol32/ror32 bitops

On Sat, Jan 22, 2005 at 09:10:40PM -0500, Chuck Ebbert wrote:
> On Fri, 21 Jan 2005 at 15:41:06 -0600 Matt Mackall wrote:
>
> > Add rol32 and ror32 bitops to bitops.h
>
> Can you test this patch on top of yours? I did it on 2.6.10-ac10 but it
> should apply OK. Compile tested and booted, but only random.c is using it
> in my kernel.

Does random really use variable rotates? For constant rotates
gcc detects the usual C idiom and turns it transparently into
the right machine instruction.

If that's the case I would use it because it'll avoid some messy
code everywhere.

-Andi

2005-01-23 04:13:43

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH 1/12] random pt4: Create new rol32/ror32 bitops

On Sun, Jan 23, 2005 at 04:19:21AM +0100, Andi Kleen wrote:
> On Sat, Jan 22, 2005 at 09:10:40PM -0500, Chuck Ebbert wrote:
> > On Fri, 21 Jan 2005 at 15:41:06 -0600 Matt Mackall wrote:
> >
> > > Add rol32 and ror32 bitops to bitops.h
> >
> > Can you test this patch on top of yours? I did it on 2.6.10-ac10 but it
> > should apply OK. Compile tested and booted, but only random.c is using it
> > in my kernel.
>
> Does random really use variable rotates? For constant rotates
> gcc detects the usual C idiom and turns it transparently into
> the right machine instruction.

Nope, random doesn't. The only thing I converted in my sweep that did
were CAST5 and CAST6, which are fairly unique in doing key-based
rotations. On the other hand:

typedef unsigned int __u32;

static inline __u32 rol32(__u32 word, int shift)
{
return (word << shift) | (word >> (32 - shift));
}

int foo(int val, int rot)
{
return rol32(val, rot);
}

With 2.95:

0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 4d 0c mov 0xc(%ebp),%ecx
6: 8b 45 08 mov 0x8(%ebp),%eax
9: d3 c0 rol %cl,%eax
b: c9 leave
c: c3 ret

With 3.3.5:

0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 08 mov 0x8(%ebp),%eax
6: 8b 4d 0c mov 0xc(%ebp),%ecx
9: 5d pop %ebp
a: d3 c0 rol %cl,%eax
c: c3 ret

With gcc-snapshot:

0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 08 mov 0x8(%ebp),%eax
6: 8b 4d 0c mov 0xc(%ebp),%ecx
9: d3 c0 rol %cl,%eax
b: 5d pop %ebp
c: c3 ret

So I think tweaks for x86 at least are unnecessary.

--
Mathematics is the supreme nostalgia of our time.

2005-01-23 07:47:44

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [PATCH 1/12] random pt4: Create new rol32/ror32 bitops

On Sat, 22 Jan 2005 at 20:13:24 -0800 Matt Mackall wrote:

> So I think tweaks for x86 at least are unnecessary.

So the compiler looks for that specific sequence of instructions:

(a << b) | (a >> (sizeof(a) * 8 - b)

and recognizes that it means rotation? Wow.


Chuck

2005-01-24 22:24:48

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 1/12] random pt4: Create new rol32/ror32 bitops

Followup to: <[email protected]>
By author: Chuck Ebbert <[email protected]>
In newsgroup: linux.dev.kernel
>
> On Sat, 22 Jan 2005 at 20:13:24 -0800 Matt Mackall wrote:
>
> > So I think tweaks for x86 at least are unnecessary.
>
> So the compiler looks for that specific sequence of instructions:
>
> (a << b) | (a >> (sizeof(a) * 8 - b)
>
> and recognizes that it means rotation? Wow.
>

You know, there is a LOT of instructions like that. x86 has a single
instruction to do:

a = b + (c << 3) + 3600;

(Assuming a, b and c are in eax, ebx, and ecx, respecively, the
instruction is "leal 3600(%ebx,%ecx,3),%eax".)

The C compiler really needs to recognize these kinds of idioms, not
just in the source, but that occur natually as a result of code
generation and optimizations. The compiler lingo for this is
"peephole optimization."

-hpa

2005-01-25 20:51:39

by Denis Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 4/12] random pt4: Cleanup SHA interface

On Friday 21 January 2005 23:41, Matt Mackall wrote:
> Clean up SHA hash function for moving to lib/
> Do proper endian conversion
> Provide sha_init function
> Add kerneldoc
>
> Signed-off-by: Matt Mackall <[email protected]>
>
> Index: rnd2/drivers/char/random.c
> ===================================================================
> --- rnd2.orig/drivers/char/random.c 2005-01-20 12:28:27.979725732 -0800
> +++ rnd2/drivers/char/random.c 2005-01-20 12:28:34.506893589 -0800
> @@ -671,29 +671,6 @@
>
> EXPORT_SYMBOL(add_disk_randomness);
>
> -/******************************************************************
> - *
> - * Hash function definition
> - *
> - *******************************************************************/
> -
> -/*
> - * This chunk of code defines a function
> - * void sha_transform(__u32 digest[HASH_BUFFER_SIZE + HASH_EXTRA_SIZE],
> - * __u32 const data[16])
> - *
> - * The function hashes the input data to produce a digest in the first
> - * HASH_BUFFER_SIZE words of the digest[] array, and uses HASH_EXTRA_SIZE
> - * more words for internal purposes. (This buffer is exported so the
> - * caller can wipe it once rather than this code doing it each call,
> - * and tacking it onto the end of the digest[] array is the quick and
> - * dirty way of doing it.)
> - *
> - * For /dev/random purposes, the length of the data being hashed is
> - * fixed in length, so appending a bit count in the usual way is not
> - * cryptographically necessary.
> - */
> -
> #define HASH_BUFFER_SIZE 5
> #define EXTRACT_SIZE 10
> #define HASH_EXTRA_SIZE 80
> @@ -717,20 +694,32 @@
> #define K3 0x8F1BBCDCL /* Rounds 40-59: sqrt(5) * 2^30 */
> #define K4 0xCA62C1D6L /* Rounds 60-79: sqrt(10) * 2^30 */
>
> -static void sha_transform(__u32 digest[85], __u32 const data[16])
> +/*
> + * sha_transform: single block SHA1 transform
> + *
> + * @digest: 160 bit digest to update
> + * @data: 512 bytes of data to hash
> + * @W: 80 words of workspace
> + *
> + * This function generates a SHA1 digest for a single. Be warned, it
> + * does not handle padding and message digest, do not confuse it with
> + * the full FIPS 180-1 digest algorithm for variable length messages.
> + */
> +static void sha_transform(__u32 digest[5], const char *data, __u32 W[80])
> {
> - __u32 A, B, C, D, E; /* Local vars */
> + __u32 A, B, C, D, E;
> __u32 TEMP;
> int i;
> -#define W (digest + HASH_BUFFER_SIZE) /* Expanded data array */
>
> + memset(W, 0, sizeof(W));

Hmm..... Parameter decays into pointer, sizeof(W) != 80*4. See:

# cat tt.c
#include <stdio.h>
void f(char w[80]) {
printf("%d\n", sizeof(w));
}
int main() {
f(0);
return 0;
}
# gcc tt.c; ./a.out
4
# gcc --version
gcc (GCC) 3.4.3

In light of this, if your sha1 passed regression tests,
I conclude that we don't need that memset at all.

> + for (i = 0; i < 16; i++)
> + W[i] = be32_to_cpu(((const __u32 *)data)[i]);
> /*
> * Do the preliminary expansion of 16 to 80 words. Doing it
> * out-of-line line this is faster than doing it in-line on
> * register-starved machines like the x86, and not really any
> * slower on real processors.
> */
> - memcpy(W, data, 16*sizeof(__u32));
> for (i = 0; i < 64; i++) {
> TEMP = W[i] ^ W[i+2] ^ W[i+8] ^ W[i+13];
> W[i+16] = rol32(TEMP, 1);
> @@ -768,7 +757,6 @@
> digest[4] += E;
>
> /* W is wiped by the caller */
> -#undef W
> }
>
> #undef f1
> @@ -780,6 +768,20 @@
> #undef K3
> #undef K4
>
> +/*
> + * sha_init: initialize the vectors for a SHA1 digest
> + *
> + * @buf: vector to initialize
> + */
> +static void sha_init(__u32 *buf)
> +{
> + buf[0] = 0x67452301;
> + buf[1] = 0xefcdab89;
> + buf[2] = 0x98badcfe;
> + buf[3] = 0x10325476;
> + buf[4] = 0xc3d2e1f0;
> +}
> +
> /*********************************************************************
> *
> * Entropy extraction routines
> @@ -870,13 +872,7 @@
> int i, x;
> __u32 data[16], buf[85];
>
> - /* Hash the pool to get the output */
> - buf[0] = 0x67452301;
> - buf[1] = 0xefcdab89;
> - buf[2] = 0x98badcfe;
> - buf[3] = 0x10325476;
> - buf[4] = 0xc3d2e1f0;
> -
> + sha_init(buf);
> /*
> * As we hash the pool, we mix intermediate values of
> * the hash back into the pool. This eliminates
> @@ -886,7 +882,7 @@
> * function can be inverted.
> */
> for (i = 0, x = 0; i < r->poolinfo->poolwords; i += 16, x+=2) {
> - sha_transform(buf, r->pool+i);
> + sha_transform(buf, (__u8 *)r->pool+i, buf + 5);
> add_entropy_words(r, &buf[x % 5], 1);
> }
>
> @@ -896,7 +892,7 @@
> * final time.
> */
> __add_entropy_words(r, &buf[x % 5], 1, data);
> - sha_transform(buf, data);
> + sha_transform(buf, (__u8 *)data, buf + 5);
>
> /*
> * In case the hash function has some recognizable
> @@ -1771,7 +1767,7 @@
> tmp[0]=saddr;
> tmp[1]=daddr;
> tmp[2]=(sport << 16) + dport;
> - sha_transform(tmp+16, tmp);
> + sha_transform(tmp+16, (__u8 *)tmp, tmp + 16 + 5);
> seq = tmp[17] + sseq + (count << COOKIEBITS);
>
> memcpy(tmp + 3, syncookie_secret[1], sizeof(syncookie_secret[1]));
> @@ -1779,7 +1775,7 @@
> tmp[1]=daddr;
> tmp[2]=(sport << 16) + dport;
> tmp[3] = count; /* minute counter */
> - sha_transform(tmp + 16, tmp);
> + sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
>
> /* Add in the second hash and the data */
> return seq + ((tmp[17] + data) & COOKIEMASK);
> @@ -1808,7 +1804,7 @@
> tmp[0]=saddr;
> tmp[1]=daddr;
> tmp[2]=(sport << 16) + dport;
> - sha_transform(tmp + 16, tmp);
> + sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
> cookie -= tmp[17] + sseq;
> /* Cookie is now reduced to (count * 2^24) ^ (hash % 2^24) */
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2005-01-25 21:12:24

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH 4/12] random pt4: Cleanup SHA interface

On Tue, Jan 25, 2005 at 10:49:01PM +0200, Denis Vlasenko wrote:
> On Friday 21 January 2005 23:41, Matt Mackall wrote:
> > +static void sha_transform(__u32 digest[5], const char *data, __u32 W[80])
> > {
> > - __u32 A, B, C, D, E; /* Local vars */
> > + __u32 A, B, C, D, E;
> > __u32 TEMP;
> > int i;
> > -#define W (digest + HASH_BUFFER_SIZE) /* Expanded data array */
> >
> > + memset(W, 0, sizeof(W));
>
> Hmm..... Parameter decays into pointer, sizeof(W) != 80*4. See:

Good spotting.

> In light of this, if your sha1 passed regression tests,
> I conclude that we don't need that memset at all.

Indeed, I noticed it was superfluous when I was benchmarking it. The
entire function is replaced a couple patches later.

--
Mathematics is the supreme nostalgia of our time.

2005-01-25 21:15:48

by Denis Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 1/12] random pt4: Create new rol32/ror32 bitops

On Friday 21 January 2005 23:41, Matt Mackall wrote:
> --- rnd2.orig/include/linux/bitops.h 2005-01-19 22:57:54.000000000 -0800
> +++ rnd2/include/linux/bitops.h 2005-01-20 09:20:24.641660815 -0800
> @@ -134,4 +134,26 @@
> return sizeof(w) == 4 ? generic_hweight32(w) : generic_hweight64(w);
> }
>
> +/*
> + * rol32 - rotate a 32-bit value left
> + *
> + * @word: value to rotate
> + * @shift: bits to roll
> + */
> +static inline __u32 rol32(__u32 word, int shift)
> +{
> + return (word << shift) | (word >> (32 - shift));
> +}
> +
> +/*
> + * ror32 - rotate a 32-bit value right
> + *
> + * @word: value to rotate
> + * @shift: bits to roll
> + */
> +static inline __u32 ror32(__u32 word, int shift)
> +{
> + return (word >> shift) | (word << (32 - shift));
> +}
> +
> #endif

gcc generates surprisingly good assembly for this,
I coudn't beat it for 32bit rotations. Cool!

So you are absolutely right here with not trying
to asm optimize it.

OTOH 64bit gcc rol is worse.
--
vda

2005-01-25 21:19:39

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH 6/12] random pt4: Replace SHA with faster version

On Tue, Jan 25, 2005 at 11:07:21PM +0200, Denis Vlasenko wrote:
> On Friday 21 January 2005 23:41, Matt Mackall wrote:
> > - * @W: 80 words of workspace
> > + * @W: 80 words of workspace, caller should clear
>
> Why?

Are you asking why should the caller clear or why should it be cleared at all?

For the first question, having the caller do the clear avoids
redundant clears on repeated calls.

For the second question, forward security. If an attacker breaks into
the box shortly after a secret is generated, he may be able to recover
the secret from such state.

--
Mathematics is the supreme nostalgia of our time.

2005-01-25 21:24:04

by Denis Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 6/12] random pt4: Replace SHA with faster version

On Friday 21 January 2005 23:41, Matt Mackall wrote:
> - * @W: 80 words of workspace
> + * @W: 80 words of workspace, caller should clear

Why?

> *
> * This function generates a SHA1 digest for a single. Be warned, it
> * does not handle padding and message digest, do not confuse it with
> * the full FIPS 180-1 digest algorithm for variable length messages.
> */
> -void sha_transform(__u32 *digest, const char *data, __u32 *W)
> +void sha_transform(__u32 *digest, const char *in, __u32 *W)
> {
> - __u32 A, B, C, D, E;
> - __u32 TEMP;
> - int i;
> + __u32 a, b, c, d, e, t, i;
>
> - memset(W, 0, sizeof(W));
> for (i = 0; i < 16; i++)
> - W[i] = be32_to_cpu(((const __u32 *)data)[i]);
> - /*
> - * Do the preliminary expansion of 16 to 80 words. Doing it
> - * out-of-line line this is faster than doing it in-line on
> - * register-starved machines like the x86, and not really any
> - * slower on real processors.
> - */
> - for (i = 0; i < 64; i++) {
> - TEMP = W[i] ^ W[i+2] ^ W[i+8] ^ W[i+13];
> - W[i+16] = rol32(TEMP, 1);
> + W[i] = be32_to_cpu(((const __u32 *)in)[i]);
> +
> + for (i = 0; i < 64; i++)
> + W[i+16] = rol32(W[i+13] ^ W[i+8] ^ W[i+2] ^ W[i], 1);
--
vda

2005-01-25 21:35:52

by Denis Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 6/12] random pt4: Replace SHA with faster version

On Tuesday 25 January 2005 23:14, Matt Mackall wrote:
> On Tue, Jan 25, 2005 at 11:07:21PM +0200, Denis Vlasenko wrote:
> > On Friday 21 January 2005 23:41, Matt Mackall wrote:
> > > - * @W: 80 words of workspace
> > > + * @W: 80 words of workspace, caller should clear
> >
> > Why?
>
> Are you asking why should the caller clear or why should it be cleared at all?
>
> For the first question, having the caller do the clear avoids
> redundant clears on repeated calls.
>
> For the second question, forward security. If an attacker breaks into
> the box shortly after a secret is generated, he may be able to recover
> the secret from such state.

Whoops. I understood a comment as 'caller should clear prior to use'
and this seems to be untrue (code overwrites entire W[80] vector
without using prior contents).

Now I think that you most probably meant 'caller should clear AFTER use'.
If so, comment should be amended.
--
vda

2005-01-25 21:53:46

by Matt Mackall

[permalink] [raw]
Subject: [PATCH] SHA1 clarify kerneldoc

On Tue, Jan 25, 2005 at 11:31:19PM +0200, Denis Vlasenko wrote:
> On Tuesday 25 January 2005 23:14, Matt Mackall wrote:
> > On Tue, Jan 25, 2005 at 11:07:21PM +0200, Denis Vlasenko wrote:
> > > On Friday 21 January 2005 23:41, Matt Mackall wrote:
> > > > - * @W: 80 words of workspace
> > > > + * @W: 80 words of workspace, caller should clear
> > >
> > > Why?
> >
> > Are you asking why should the caller clear or why should it be cleared at all?
> >
> > For the first question, having the caller do the clear avoids
> > redundant clears on repeated calls.
> >
> > For the second question, forward security. If an attacker breaks into
> > the box shortly after a secret is generated, he may be able to recover
> > the secret from such state.
>
> Whoops. I understood a comment as 'caller should clear prior to use'
> and this seems to be untrue (code overwrites entire W[80] vector
> without using prior contents).
>
> Now I think that you most probably meant 'caller should clear AFTER use'.
> If so, comment should be amended.

Indeed.

Index: rc2mm1/lib/sha1.c
===================================================================
--- rc2mm1.orig/lib/sha1.c 2005-01-25 09:31:59.000000000 -0800
+++ rc2mm1/lib/sha1.c 2005-01-25 13:48:34.000000000 -0800
@@ -25,11 +25,15 @@
*
* @digest: 160 bit digest to update
* @data: 512 bits of data to hash
- * @W: 80 words of workspace, caller should clear
+ * @W: 80 words of workspace (see note)
*
* This function generates a SHA1 digest for a single. Be warned, it
* does not handle padding and message digest, do not confuse it with
* the full FIPS 180-1 digest algorithm for variable length messages.
+ *
+ * Note: If the hash is security sensitive, the caller should be sure
+ * to clear the workspace. This is left to the caller to avoid
+ * unnecessary clears between chained hashing operations.
*/
void sha_transform(__u32 *digest, const char *in, __u32 *W)
{


--
Mathematics is the supreme nostalgia of our time.

2005-01-26 01:33:16

by Lee Revell

[permalink] [raw]
Subject: Re: [PATCH 7/12] random pt4: Update cryptolib to use SHA fro lib

On Fri, 2005-01-21 at 15:41 -0600, Matt Mackall wrote:
> * Copyright (c) Alan Smithee.
> * Copyright (c) Andrew McDonald <[email protected]>

Alan Smithee?

2005-01-26 01:42:47

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH 7/12] random pt4: Update cryptolib to use SHA fro lib

On Tue, Jan 25, 2005 at 08:33:11PM -0500, Lee Revell wrote:
> On Fri, 2005-01-21 at 15:41 -0600, Matt Mackall wrote:
> > * Copyright (c) Alan Smithee.
> > * Copyright (c) Andrew McDonald <[email protected]>
>
> Alan Smithee?

Aka anonymous contributor.

--
Mathematics is the supreme nostalgia of our time.

2005-01-27 18:21:59

by Bill Davidsen

[permalink] [raw]
Subject: Re: [PATCH] SHA1 clarify kerneldoc

Matt Mackall wrote:
> On Tue, Jan 25, 2005 at 11:31:19PM +0200, Denis Vlasenko wrote:
>
>>On Tuesday 25 January 2005 23:14, Matt Mackall wrote:
>>
>>>On Tue, Jan 25, 2005 at 11:07:21PM +0200, Denis Vlasenko wrote:
>>>
>>>>On Friday 21 January 2005 23:41, Matt Mackall wrote:
>>>>
>>>>>- * @W: 80 words of workspace
>>>>>+ * @W: 80 words of workspace, caller should clear
>>>>
>>>>Why?
>>>
>>>Are you asking why should the caller clear or why should it be cleared at all?
>>>
>>>For the first question, having the caller do the clear avoids
>>>redundant clears on repeated calls.
>>>
>>>For the second question, forward security. If an attacker breaks into
>>>the box shortly after a secret is generated, he may be able to recover
>>>the secret from such state.
>>
>>Whoops. I understood a comment as 'caller should clear prior to use'
>>and this seems to be untrue (code overwrites entire W[80] vector
>>without using prior contents).
>>
>>Now I think that you most probably meant 'caller should clear AFTER use'.
>>If so, comment should be amended.
>
>
> Indeed.
>
> Index: rc2mm1/lib/sha1.c
> ===================================================================
> --- rc2mm1.orig/lib/sha1.c 2005-01-25 09:31:59.000000000 -0800
> +++ rc2mm1/lib/sha1.c 2005-01-25 13:48:34.000000000 -0800
> @@ -25,11 +25,15 @@
> *
> * @digest: 160 bit digest to update
> * @data: 512 bits of data to hash
> - * @W: 80 words of workspace, caller should clear
> + * @W: 80 words of workspace (see note)
> *
> * This function generates a SHA1 digest for a single. Be warned, it
^^^^^^
Is this a term I don't know, "single" as a noun, or should "512 bit
block" follow, as it does in crypto/sha1 from rc1-bk1 which is what I
have handy?

> * does not handle padding and message digest, do not confuse it with
> * the full FIPS 180-1 digest algorithm for variable length messages.
> + *
> + * Note: If the hash is security sensitive, the caller should be sure
> + * to clear the workspace. This is left to the caller to avoid
> + * unnecessary clears between chained hashing operations.
> */
> void sha_transform(__u32 *digest, const char *in, __u32 *W)
> {
>
>


--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2005-01-27 19:28:41

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH] SHA1 clarify kerneldoc

On Thu, Jan 27, 2005 at 01:22:28PM -0500, Bill Davidsen wrote:
> > *
> > * This function generates a SHA1 digest for a single. Be warned, it
> ^^^^^^
> Is this a term I don't know, "single" as a noun, or should "512 bit
> block" follow, as it does in crypto/sha1 from rc1-bk1 which is what I
> have handy?

That'll teach me to add comments.

Signed-off-by: Matt Mackall <[email protected]>

Index: rc2mm1/lib/sha1.c
===================================================================
--- rc2mm1.orig/lib/sha1.c 2005-01-27 11:24:23.000000000 -0800
+++ rc2mm1/lib/sha1.c 2005-01-27 11:25:19.000000000 -0800
@@ -27,9 +27,10 @@
* @data: 512 bits of data to hash
* @W: 80 words of workspace (see note)
*
- * This function generates a SHA1 digest for a single. Be warned, it
- * does not handle padding and message digest, do not confuse it with
- * the full FIPS 180-1 digest algorithm for variable length messages.
+ * This function generates a SHA1 digest for a single 512-bit block.
+ * Be warned, it does not handle padding and message digest, do not
+ * confuse it with the full FIPS 180-1 digest algorithm for variable
+ * length messages.
*
* Note: If the hash is security sensitive, the caller should be sure
* to clear the workspace. This is left to the caller to avoid


--
Mathematics is the supreme nostalgia of our time.