From: Dave Hansen
Date: Fri, 28 Apr 2023 08:54:51 -0700
Subject: Re: [PATCH 3/3] crypto: LEA block cipher AVX2 optimization
To: Dongsoo Lee, linux-crypto@vger.kernel.org
Cc: Herbert Xu, "David S. Miller", Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin",
    linux-kernel@vger.kernel.org
Miller" , Dongsoo Lee References: <20230428110058.1516119-1-letrhee@nsr.re.kr> <20230428110058.1516119-4-letrhee@nsr.re.kr> From: Dave Hansen In-Reply-To: <20230428110058.1516119-4-letrhee@nsr.re.kr> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-6.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_PASS,SPF_NONE,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > +config CRYPTO_LEA_AVX2 > + tristate "Ciphers: LEA with modes: ECB, CBC, CTR, XTS (SSE2/MOVBE/AVX2)" > + select CRYPTO_LEA > + imply CRYPTO_XTS > + imply CRYPTO_CTR > + help > + LEA cipher algorithm (KS X 3246, ISO/IEC 29192-2:2019) > + > + LEA is one of the standard cryptographic alorithms of > + the Republic of Korea. It consists of four 32bit word. The "four 32bit word" thing is probably not a detail end users care about enough to see in Kconfig text. > + See: > + https://seed.kisa.or.kr/kisa/algorithm/EgovLeaInfo.do > + > + Architecture: x86_64 using: > + - SSE2 (Streaming SIMD Extensions 2) > + - MOVBE (Move Data After Swapping Bytes) > + - AVX2 (Advanced Vector Extensions) What about i386? If this is truly 64-bit-only for some reason, it's not reflected anywhere that I can see, like having a: depends on X86_64 I'm also a _bit_ confused why this has one config option called "_AVX2" but that also includes the SSE2 implementation. > + Processes 4(SSE2), 8(AVX2) blocks in parallel. > + In CTR mode, the MOVBE instruction is utilized for improved performance. > + > config CRYPTO_CHACHA20_X86_64 > tristate "Ciphers: ChaCha20, XChaCha20, XChaCha12 (SSSE3/AVX2/AVX-512VL)" > depends on X86 && 64BIT > diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile > index 9aa46093c91b..de23293b88df 100644 > --- a/arch/x86/crypto/Makefile > +++ b/arch/x86/crypto/Makefile > @@ -109,6 +109,9 @@ aria-aesni-avx2-x86_64-y := aria-aesni-avx2-asm_64.o aria_aesni_avx2_glue.o > obj-$(CONFIG_CRYPTO_ARIA_GFNI_AVX512_X86_64) += aria-gfni-avx512-x86_64.o > aria-gfni-avx512-x86_64-y := aria-gfni-avx512-asm_64.o aria_gfni_avx512_glue.o > > +obj-$(CONFIG_CRYPTO_LEA_AVX2) += lea-avx2-x86_64.o > +lea-avx2-x86_64-y := lea_avx2_x86_64-asm.o lea_avx2_glue.o > + > quiet_cmd_perlasm = PERLASM $@ > cmd_perlasm = $(PERL) $< > $@ > $(obj)/%.S: $(src)/%.pl FORCE > diff --git a/arch/x86/crypto/lea_avx2_glue.c b/arch/x86/crypto/lea_avx2_glue.c > new file mode 100644 > index 000000000000..532958d3caa5 > --- /dev/null > +++ b/arch/x86/crypto/lea_avx2_glue.c > @@ -0,0 +1,1112 @@ > +// SPDX-License-Identifier: GPL-2.0-or-later > +/* > + * Glue Code for the SSE2/MOVBE/AVX2 assembler instructions for the LEA Cipher > + * > + * Copyright (c) 2023 National Security Research. > + * Author: Dongsoo Lee > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include "ecb_cbc_helpers.h" > + > +#define SIMD_KEY_ALIGN 16 > +#define SIMD_ALIGN_ATTR __aligned(SIMD_KEY_ALIGN) > + > +struct lea_xts_ctx { > + u8 raw_crypt_ctx[sizeof(struct crypto_lea_ctx)] SIMD_ALIGN_ATTR; > + u8 raw_tweak_ctx[sizeof(struct crypto_lea_ctx)] SIMD_ALIGN_ATTR; > +}; The typing here is a bit goofy. 
> +struct _lea_u128 {
> +	u64 v0, v1;
> +};
> +
> +static inline void xor_1blk(u8 *out, const u8 *in1, const u8 *in2)
> +{
> +	const struct _lea_u128 *_in1 = (const struct _lea_u128 *)in1;
> +	const struct _lea_u128 *_in2 = (const struct _lea_u128 *)in2;
> +	struct _lea_u128 *_out = (struct _lea_u128 *)out;
> +
> +	_out->v0 = _in1->v0 ^ _in2->v0;
> +	_out->v1 = _in1->v1 ^ _in2->v1;
> +}
> +
> +static inline void xts_next_tweak(u8 *out, const u8 *in)
> +{
> +	const u64 *_in = (const u64 *)in;
> +	u64 *_out = (u64 *)out;
> +	u64 v0 = _in[0];
> +	u64 v1 = _in[1];
> +	u64 carry = (u64)(((s64)v1) >> 63);
> +
> +	v1 = (v1 << 1) ^ (v0 >> 63);
> +	v0 = (v0 << 1) ^ ((u64)carry & 0x87);
> +
> +	_out[0] = v0;
> +	_out[1] = v1;
> +}

I don't really care either way, but it's interesting that these two
adjacent functions both deal with a pair of adjacent 64-bit values:
one defines a structure with two u64's for them, while the other
treats them as an array.

> +static int xts_encrypt_8way(struct skcipher_request *req)
> +{

...

It's kind of a shame that there isn't more code shared here between,
for instance, the 4-way and 8-way functions.  But I guess this crypto
code tends to be merged and then very rarely fixed up afterward.

> +static int xts_lea_set_key(struct crypto_skcipher *tfm, const u8 *key,
> +			   u32 keylen)
> +{
> +	struct crypto_tfm *tfm_ctx = crypto_skcipher_ctx(tfm);
> +	struct lea_xts_ctx *ctx = crypto_tfm_ctx(tfm_ctx);
> +
> +	struct crypto_lea_ctx *crypt_key =
> +		(struct crypto_lea_ctx *)(ctx->raw_crypt_ctx);
> +	struct crypto_lea_ctx *tweak_key =
> +		(struct crypto_lea_ctx *)(ctx->raw_tweak_ctx);

These were those goofy casts that can go away if the typing is a bit
more careful ...

> +static struct simd_skcipher_alg *lea_simd_algs[ARRAY_SIZE(lea_simd_avx2_algs)];
> +
> +static int __init crypto_lea_avx2_init(void)
> +{
> +	const char *feature_name;
> +
> +	if (!boot_cpu_has(X86_FEATURE_XMM2)) {
> +		pr_info("SSE2 instructions are not detected.\n");
> +		return -ENODEV;
> +	}
> +
> +	if (!boot_cpu_has(X86_FEATURE_MOVBE)) {
> +		pr_info("MOVBE instructions are not detected.\n");
> +		return -ENODEV;
> +	}
> +
> +	if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_AVX)) {
> +		pr_info("AVX2 instructions are not detected.\n");
> +		return -ENODEV;
> +	}
> +
> +	if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
> +			       &feature_name)) {
> +		pr_info("CPU feature '%s' is not supported.\n", feature_name);
> +		return -ENODEV;
> +	}

This looks suspect.  It requires *ALL* of XMM2, MOVBE, AVX, AVX2 and
XSAVE support for *ANY* of these to be used.

In other cipher code that I've seen, the AVX/YMM-accelerated functions
are separated out from the pure SSE2/XMM ones so that CPUs with only
SSE2 support can still benefit.

Either this is wrong, or there is something subtle going on that I'm
missing.
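If the SSE2 paths really do stand on their own, the usual shape would
be something like the sketch below.  The lea_sse2_algs[]/lea_avx2_algs[]
split and the matching simd_skcipher_alg shadow arrays are hypothetical
(this patch only has the combined lea_simd_avx2_algs[]), but the
helpers are the existing crypto/simd.c ones:

static int __init crypto_lea_x86_init(void)
{
	const char *feature_name;
	int ret;

	/* The 4-way/SSE2 paths need only SSE2 and XMM state: */
	if (!boot_cpu_has(X86_FEATURE_XMM2) ||
	    !cpu_has_xfeatures(XFEATURE_MASK_SSE, &feature_name)) {
		pr_info("SSE2 support is not detected.\n");
		return -ENODEV;
	}

	ret = simd_register_skciphers_compat(lea_sse2_algs,
					     ARRAY_SIZE(lea_sse2_algs),
					     lea_simd_sse2_algs);
	if (ret)
		return ret;

	/* Only the 8-way paths need MOVBE, AVX2 and YMM state: */
	if (!boot_cpu_has(X86_FEATURE_MOVBE) ||
	    !boot_cpu_has(X86_FEATURE_AVX) ||
	    !boot_cpu_has(X86_FEATURE_AVX2) ||
	    !cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
			       &feature_name))
		return 0;

	ret = simd_register_skciphers_compat(lea_avx2_algs,
					     ARRAY_SIZE(lea_avx2_algs),
					     lea_simd_avx2_algs);
	if (ret)
		simd_unregister_skciphers(lea_sse2_algs,
					  ARRAY_SIZE(lea_sse2_algs),
					  lea_simd_sse2_algs);

	return ret;
}

That way a CPU with SSE2 but no AVX2 still gets the 4-way
acceleration instead of -ENODEV.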