From: Alexey Dobriyan
Subject: Re: sha512: make it work, undo percpu message schedule
Date: Sat, 14 Jan 2012 21:20:24 +0300
Message-ID: <20120114182024.GA4207@p183.telecom.by>
References: <20120111000040.GA3801@p183.telecom.by>
 <20120111003611.GA12257@gondor.apana.org.au>
 <20120112235514.GA5065@p183.telecom.by>
 <20120113070813.GA20068@gondor.apana.org.au>
 <1326450942.2272.20.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
 <1326451301.2272.23.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
 <1326452246.2272.31.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
 <1326458053.3826.4.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
In-Reply-To: <1326458053.3826.4.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
To: Eric Dumazet
Cc: Herbert Xu, linux-crypto@vger.kernel.org, netdev@vger.kernel.org,
 ken@codelabs.ch, Steffen Klassert

On Fri, Jan 13, 2012 at 01:34:13PM +0100, Eric Dumazet wrote:
> On Friday, 13 January 2012 at 13:33 +0200, Alexey Dobriyan wrote:
> > On 1/13/12, Eric Dumazet wrote:
> >
> > > +        static u64 msg_schedule[80];
> > > +        static DEFINE_SPINLOCK(msg_schedule_lock);
> >
> > No guys, no.
> >
> > SHA-512 only needs a u64[16] running window for message scheduling.
> >
> > I'm sending a whitespace-mangled patch, tested only as far as the
> > crypto self-test passing, so that you won't apply something complex.
> >
> > Stack space usage drops like this:
> >
> > -139:   48 81 ec c8 02 00 00    sub    $0x2c8,%rsp
> > +136:   48 81 ec 18 01 00 00    sub    $0x118,%rsp
> >
> > --- a/crypto/sha512_generic.c
> > +++ b/crypto/sha512_generic.c
> > @@ -21,8 +21,6 @@
> >  #include
> >  #include
> >
> > -static DEFINE_PER_CPU(u64[80], msg_schedule);
> > -
> >  static inline u64 Ch(u64 x, u64 y, u64 z)
> >  {
> >          return z ^ (x & (y ^ z));
> > @@ -80,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input)
> >
> >  static inline void BLEND_OP(int I, u64 *W)
> >  {
> > -        W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
> > +        W[I%16] = s1(W[(I-2)%16]) + W[(I-7)%16] + s0(W[(I-15)%16]) + W[I%16];
> >  }
> >
> >  static void
> > @@ -89,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input)
> >          u64 a, b, c, d, e, f, g, h, t1, t2;
> >
> >          int i;
> > -        u64 *W = get_cpu_var(msg_schedule);
> > +        u64 W[16];
> >
> >          /* load the input */
> >          for (i = 0; i < 16; i++)
> >                  LOAD_OP(i, W, input);
> >
> > -        for (i = 16; i < 80; i++) {
> > -                BLEND_OP(i, W);
> > -        }
> > -
> >          /* load the state into our registers */
> >          a=state[0]; b=state[1]; c=state[2]; d=state[3];
> >          e=state[4]; f=state[5]; g=state[6]; h=state[7];
> >
> > -        /* now iterate */
> > -        for (i=0; i<80; i+=8) {
> > -                t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i ] + W[i ];
> > -                t2 = e0(a) + Maj(a,b,c); d+=t1; h=t1+t2;
> > -                t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
> > -                t2 = e0(h) + Maj(h,a,b); c+=t1; g=t1+t2;
> > -                t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
> > -                t2 = e0(g) + Maj(g,h,a); b+=t1; f=t1+t2;
> > -                t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
> > -                t2 = e0(f) + Maj(f,g,h); a+=t1; e=t1+t2;
> > -                t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
> > -                t2 = e0(e) + Maj(e,f,g); h+=t1; d=t1+t2;
> > -                t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
> > -                t2 = e0(d) + Maj(d,e,f); g+=t1; c=t1+t2;
> > -                t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
> > -                t2 = e0(c) + Maj(c,d,e); f+=t1; b=t1+t2;
> > -                t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[i+7];
> > -                t2 = e0(b) + Maj(b,c,d); e+=t1; a=t1+t2;
> > +#define SHA512_0_15(i, a, b, c, d, e, f, g, h)                   \
> > +        t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];       \
> > +        t2 = e0(a) + Maj(a,b,c);                                 \
> > +        d += t1;                                                 \
> > +        h = t1 + t2
> > +
> > +#define SHA512_16_79(i, a, b, c, d, e, f, g, h)                  \
> > +        BLEND_OP(i, W);                                          \
> > +        t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)%16];  \
> > +        t2 = e0(a) + Maj(a,b,c);                                 \
> > +        d += t1;                                                 \
> > +        h = t1 + t2
> > +
> > +        for (i = 0; i < 16; i += 8) {
> > +                SHA512_0_15(i, a, b, c, d, e, f, g, h);
> > +                SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
> > +                SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
> > +                SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
> > +                SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
> > +                SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
> > +                SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
> > +                SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
> > +        }
> > +        for (i = 16; i < 80; i += 8) {
> > +                SHA512_16_79(i, a, b, c, d, e, f, g, h);
> > +                SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
> > +                SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
> > +                SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
> > +                SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
> > +                SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
> > +                SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
> > +                SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
> >          }
> >
> >          state[0] += a; state[1] += b; state[2] += c; state[3] += d;
> > @@ -128,8 +136,6 @@ sha512_transform(u64 *state, const u8 *input)
> >
> >          /* erase our data */
> >          a = b = c = d = e = f = g = h = t1 = t2 = 0;
> > -        memset(W, 0, sizeof(__get_cpu_var(msg_schedule)));
> > -        put_cpu_var(msg_schedule);
> >  }
> >
> >  static int
>
>
> Even if it's true, it's not stable material.
>
> stable teams want obvious patches.

I understand that.

But it _is_ obvious once you see what the macros do, compare them with the
original code, and read FIPS 180-2 section 6.3.2, "SHA-512 Hash Computation":
W[i] depends on W[i-16] at the oldest and on nothing earlier, so a 16-word
rolling window is enough.

This stack-reduction patch completely fixes the original bug and doesn't show
any AH errors.

Given the nature of crypto code, where a one-bit mistake ruins everything,
it should be enough.
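
To make that dependency argument concrete, here is a small standalone
userspace sketch (my illustration, not part of the patch; ror64, s0 and s1
are local helpers written from FIPS 180-2 section 4.1.3, not the kernel's).
It computes the message schedule both with the full 80-word array and with a
16-word window indexed mod 16, exactly as the new BLEND_OP does, and checks
that the two agree:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Rotate a 64-bit word right by n bits (0 < n < 64). */
static uint64_t ror64(uint64_t x, unsigned int n)
{
        return (x >> n) | (x << (64 - n));
}

/* Small-sigma functions of SHA-512, FIPS 180-2 section 4.1.3. */
static uint64_t s0(uint64_t x) { return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7); }
static uint64_t s1(uint64_t x) { return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6); }

int main(void)
{
        uint64_t full[80];      /* reference: full 80-word schedule */
        uint64_t win[16];       /* rolling 16-word window, indexed mod 16 */
        int t;

        /* Arbitrary 16 input words standing in for one message block. */
        for (t = 0; t < 16; t++)
                full[t] = win[t] = 0x0123456789abcdefULL * (t + 1);

        for (t = 16; t < 80; t++) {
                /* W[t] = s1(W[t-2]) + W[t-7] + s0(W[t-15]) + W[t-16] */
                full[t] = s1(full[t-2]) + full[t-7] + s0(full[t-15]) + full[t-16];
                /* Same recurrence with every index reduced mod 16: all four
                 * source slots still hold the right values, because nothing
                 * older than W[t-16] is ever read. */
                win[t % 16] = s1(win[(t-2) % 16]) + win[(t-7) % 16] +
                              s0(win[(t-15) % 16]) + win[(t-16) % 16];
                assert(win[t % 16] == full[t]);
        }
        printf("16-word window matches the 80-word schedule\n");
        return 0;
}

Built with something like "gcc -O2 -o sched_check sched_check.c", the assert
never fires for this test block, which is the property the on-stack W[16]
window in the patch relies on.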