Subject: Re: [v3 PATCH] crypto: chacha - Add DEFINE_CHACHA_STATE macro
From: Martin Willi
To: Ard Biesheuvel, Herbert Xu
Cc: Eric Biggers, Linux Crypto Mailing List
Date: Wed, 08 Jul 2020 08:54:27 +0200
Message-ID: <29a9c7669048dfcf6e6b52f55bd70fa4d9c29523.camel@strongswan.org>

> > Also, I wonder if we shouldn't simply change the chacha code to use
> > unaligned loads for the state array, as it likely makes very little
> > difference in practice (the state is not accessed from inside the
> > round processing loop)
>
> I am seeing a 0.25% slowdown on 1k blocks in the SSE3 code with the
> change below:
>
> [...]
>
> AVX2 and AVX512 uses vbroadcasti128 with memory operands to load the
> state, so they don't require any changes afaik.

I agree. Moving SSE to use unaligned loads is certainly acceptable
these days.

Some AVX functions use vpbroadcastd with u32 load granularity anyway.
Some use vbroadcasti128, which theoretically could (?) suffer somewhat
when operating on unaligned data, but I guess that won't justify all
that alignment cruft.

Regards,
Martin
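To make the trade-off concrete, here is a minimal, portable C sketch (not kernel code). The names DEFINE_ALIGNED_STATE, align_ptr, chacha_init_state and load_state_unaligned are hypothetical and are NOT the macro or helpers from the v3 patch. Approach 1 is the "alignment cruft": over-allocate a stack buffer and round the pointer up so that 16-byte aligned loads (e.g. movdqa, which faults on a misaligned address) are legal. Approach 2 drops the requirement and reads the state with unaligned-safe loads, which is what switching the SSE code to unaligned loads amounts to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STATE_WORDS 16 /* ChaCha state: 16 x 32-bit words */
#define STATE_ALIGN 16 /* alignment required by aligned SSE loads (movdqa) */

/* Hypothetical helper: round p up to the next multiple of 'align'. */
static uint32_t *align_ptr(uint32_t *p, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
	return (uint32_t *)(((uintptr_t)p + align - 1) & ~(align - 1));
}

/* Approach 1 (hypothetical): over-allocate on the stack and round the
 * pointer up so that 16-byte aligned loads through 'name' are legal.
 * A u32 array is already 4-byte aligned, so at most 3 words are skipped;
 * 4 spare words are reserved here for simplicity. */
#define DEFINE_ALIGNED_STATE(name)                                     \
	uint32_t name##_buf[STATE_WORDS + STATE_ALIGN / sizeof(uint32_t)]; \
	uint32_t *name = align_ptr(name##_buf, STATE_ALIGN)

/* Illustrative ChaCha state setup: 4 constant words ("expand 32-byte k"),
 * 8 key words, then 4 words of block counter + nonce. */
static void chacha_init_state(uint32_t state[STATE_WORDS],
			      const uint32_t key[8], const uint32_t iv[4])
{
	state[0] = 0x61707865; /* "expa" */
	state[1] = 0x3320646e; /* "nd 3" */
	state[2] = 0x79622d32; /* "2-by" */
	state[3] = 0x6b206574; /* "te k" */
	memcpy(&state[4], key, 8 * sizeof(uint32_t));
	memcpy(&state[12], iv, 4 * sizeof(uint32_t));
}

/* Approach 2: no alignment requirement. Read the state one 128-bit row
 * at a time from 'src', whatever its alignment. memcpy is the portable
 * C spelling of an unaligned load; on x86 an optimizing compiler may
 * lower each 16-byte copy to an unaligned move such as movdqu. */
static void load_state_unaligned(uint32_t dst[STATE_WORDS], const void *src)
{
	const unsigned char *p = src;

	for (int row = 0; row < 4; row++)
		memcpy(&dst[4 * row], p + 16 * row, 16);
}
```

With approach 2 the caller can keep a plain u32 state[16] array with no padding or pointer rounding; the price is whatever the unaligned load costs on the target, which the measurement quoted above puts at about 0.25% on 1k blocks for the SSE3 code.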