From: Eric Dumazet
Subject: Re: sha512: make it work, undo percpu message schedule
Date: Fri, 13 Jan 2012 11:41:41 +0100
Message-ID: <1326451301.2272.23.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
References: <20120111000040.GA3801@p183.telecom.by> <20120111003611.GA12257@gondor.apana.org.au> <20120112235514.GA5065@p183.telecom.by> <20120113070813.GA20068@gondor.apana.org.au> <1326450942.2272.20.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Alexey Dobriyan , linux-crypto@vger.kernel.org, netdev@vger.kernel.org, ken@codelabs.ch, Steffen Klassert
To: Herbert Xu
Return-path:
Received: from mail-wi0-f174.google.com ([209.85.212.174]:42736 "EHLO mail-wi0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752828Ab2AMKlo (ORCPT ); Fri, 13 Jan 2012 05:41:44 -0500
In-Reply-To: <1326450942.2272.20.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
Sender: linux-crypto-owner@vger.kernel.org
List-ID:

On Friday 13 January 2012 at 11:35 +0100, Eric Dumazet wrote:
> I wonder ...
>
> With 4096 cpus, do we really want to reserve 5242880 bytes of memory for
> this function ?
>
> What about following patch instead ?
>
> (Trying a dynamic memory allocation, and fallback on a single
> pre-allocated bloc of memory, shared by all cpus, protected by a
> spinlock)
>
> diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
> index 9ed9f60..5c80a76 100644
> --- a/crypto/sha512_generic.c
> +++ b/crypto/sha512_generic.c
> @@ -21,7 +21,6 @@
>  #include
>  #include
>  
> -static DEFINE_PER_CPU(u64[80], msg_schedule);
>  
>  static inline u64 Ch(u64 x, u64 y, u64 z)
>  {
> @@ -87,10 +86,16 @@ static void
>  sha512_transform(u64 *state, const u8 *input)
>  {
>  	u64 a, b, c, d, e, f, g, h, t1, t2;
> -
> +	static u64 msg_schedule[80];
> +	static DEFINE_SPINLOCK(msg_schedule_lock);
>  	int i;
> -	u64 *W = get_cpu_var(msg_schedule);
> +	u64 *W = kzalloc(sizeof(msg_schedule), GFP_ATOMIC | __GFP_NOWARN);
>  

And a plain kmalloc() is enough, since we fully initialize the array a
bit later.

	for (i = 0; i < 16; i++)
		LOAD_OP(i, W, input);
	for (i = 16; i < 80; i++) {
		BLEND_OP(i, W);
	}
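
For readers who want to try the idea outside the kernel, here is a rough
userspace sketch of the "allocate, else fall back to one shared buffer
under a lock" pattern described above. It is not the patch itself: malloc()
stands in for kmalloc(), a C11 atomic_flag stands in for the spinlock, and
the names schedule_get(), schedule_put(), fill_and_sum() and the force_fail
parameter are hypothetical, introduced only for illustration. The fill loops
are placeholders for LOAD_OP/BLEND_OP and do not compute the real SHA-512
message schedule.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define SCHEDULE_WORDS 80

/* Single pre-allocated buffer, shared by all callers, used only on failure. */
static uint64_t fallback_schedule[SCHEDULE_WORDS];
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

/*
 * Hypothetical helper: try a dynamic allocation first; if it fails
 * (or force_fail simulates a failure), take the lock and hand out the
 * shared buffer instead. The caller must pair this with schedule_put().
 */
static uint64_t *schedule_get(int force_fail)
{
	uint64_t *w = force_fail ? NULL : malloc(sizeof(fallback_schedule));

	if (w)
		return w;

	/* Spin until we own the shared buffer. */
	while (atomic_flag_test_and_set_explicit(&fallback_lock,
						 memory_order_acquire))
		;
	return fallback_schedule;
}

/* Release whichever buffer schedule_get() handed out. */
static void schedule_put(uint64_t *w)
{
	if (w == fallback_schedule)
		atomic_flag_clear_explicit(&fallback_lock,
					   memory_order_release);
	else
		free(w);
}

/*
 * Every word is written before it is read, so the buffer does not need
 * to be zeroed on allocation (the point made about kmalloc vs kzalloc).
 */
static uint64_t fill_and_sum(int force_fail)
{
	uint64_t *w = schedule_get(force_fail);
	uint64_t sum = 0;
	int i;

	for (i = 0; i < 16; i++)		/* placeholder for LOAD_OP */
		w[i] = (uint64_t)i;
	for (i = 16; i < SCHEDULE_WORDS; i++)	/* placeholder for BLEND_OP */
		w[i] = w[i - 16] + w[i - 7];
	for (i = 0; i < SCHEDULE_WORDS; i++)
		sum += w[i];

	schedule_put(w);
	return sum;
}
```

Both paths should give the same result, e.g. fill_and_sum(0) and
fill_and_sum(1) return equal values. The cost of the fallback is that
callers serialize on the lock whenever allocation fails.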