From: Mathias Krause
Subject: Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64
Date: Sun, 14 Aug 2011 21:06:43 +0200
To: "Locktyukhin, Maxim"
Cc: Herbert Xu, "David S. Miller", "linux-crypto@vger.kernel.org", "linux-kernel@vger.kernel.org", Andrew Lutomirski
References: <1311529994-7924-1-git-send-email-minipli@googlemail.com> <1311529994-7924-3-git-send-email-minipli@googlemail.com> <20110804064436.GA16247@gondor.apana.org.au> <54B2EB610B7F1340BB6A0D4CA04A4F10013EFF76B1@orsmsx505.amr.corp.intel.com>
In-Reply-To: <54B2EB610B7F1340BB6A0D4CA04A4F10013EFF76B1@orsmsx505.amr.corp.intel.com>

Hi Max,

2011/8/8 Locktyukhin, Maxim:
> I'd like to note that at Intel we very much appreciate Mathias' effort
> to port/integrate this implementation into Linux kernel!
>
> $0.02 re tcrypt perf numbers below: I believe something must be
> terribly broken with the tcrypt measurements
>
> 20 (and more) cycles per byte shown below are not reasonable numbers
> for SHA-1 - ~6 c/b (as can be seen in some of the results for Core2) is
> the expected result ... so, while relative improvement seen is sort of
> consistent, the absolute performance numbers are very much off (and yes
> Sandy Bridge on AVX code is expected to be faster than Core2/SSSE3 -
> ~5.2 c/b vs. ~5.8 c/b on the level of the sha1_update() call to be more
> precise)
>
> this does not affect the proposed patch in any way, it looks like
> tcrypt's timing problem to me - I'd even venture a guess that it may be
> due to the use of RDTSC (that gets affected significantly by
> Turbo/EIST, TSC is isotropic in time but not with the core clock
> domain, i.e. RDTSC cannot be used to measure core cycles without at
> least disabling EIST and Turbo, or doing runtime adjustment of actual
> bus/core clock ratio vs. the standard ratio always used by TSC - I
> could elaborate more if someone is interested)

I found the Sandy Bridge numbers odd, too, but suspected it might be
because of the laptop platform. The SSSE3 numbers on this platform were
slightly lower than the AVX numbers, and both were still way off the
ones for the Core2 system. But your explanation fits well, too. It
might be EIST or Turbo mode that tampered with the numbers. Another,
maybe more likely, point might be the overhead Andy mentioned.

> thanks again,
> -Max
>

Mathias
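
For illustration, the "runtime adjustment of actual bus/core clock ratio" that Max mentions could look roughly like the sketch below. This is not code from tcrypt or from the patch. The helper names and the frequencies in the usage note are hypothetical, and the sketch assumes that both the nominal TSC frequency and the average core frequency during the measurement are already known from some other source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: scale a TSC tick count to core clock cycles.
 *
 * Assumption: the TSC counts at a fixed nominal rate (tsc_khz) while the
 * core ran at an average of core_khz during the measured interval. Both
 * values are inputs here; obtaining them is outside this sketch.
 */
double tsc_to_core_cycles(uint64_t tsc_ticks, uint64_t tsc_khz,
                          uint64_t core_khz)
{
    assert(tsc_khz != 0);
    return (double)tsc_ticks * (double)core_khz / (double)tsc_khz;
}

/* Hypothetical helper: cycles per byte for a hashed buffer. */
double cycles_per_byte(double cycles, uint64_t bytes)
{
    assert(bytes != 0);
    return cycles / (double)bytes;
}
```

With made-up numbers: 20000 TSC ticks for 1000 bytes reads as 20 ticks per byte, but if the TSC ran at 2.4 GHz while the core averaged 800 MHz, cycles_per_byte(tsc_to_core_cycles(20000, 2400000, 800000), 1000) gives about 6.7 core cycles per byte. Whether anything like this explains the numbers in this thread is not established here.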