Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp40101pxb; Tue, 23 Feb 2021 17:21:25 -0800 (PST) X-Google-Smtp-Source: ABdhPJwKKN1VmvD1jFhmq1FvehOSwd2vK+XOVHcYKnUBszPKsNQovBS+1anPqmtiSr1aTzl2sdpV X-Received: by 2002:a17:906:2816:: with SMTP id r22mr28309409ejc.2.1614129685144; Tue, 23 Feb 2021 17:21:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614129685; cv=none; d=google.com; s=arc-20160816; b=o+vejXOvgTtrQ8MKL22booVq+gOaJK0BGwo7UzJWQziEdyzvOjMe0mdc8J9bWokXFx HVJ1QBPH23GrdIgBXgh0Wp783Tbd+YRJ4BEtW1WpJJ1uiKYt/KJiLS601FQaLKPO1Bx8 Vc9659YrkfDnwd4PIG2zC/mFLJfJKId4vXiPDpyHt1RdTfoTfCJrW5sxso5tsDt5mfdk 7d0WvQT4hmeIB6VgTbo5reGiGc0vGu7e8NgeVUUipaPaTTipnjsm2Gj7cepzTS3EC086 zmxqCwvf1lBMr3W8AYJJMp7k2uen1PeCZa74L8ipoRJz6z2os8l6wUE8rWYbJsYcUtwN eUMw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-language:content-transfer-encoding :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:ironport-sdr:ironport-sdr; bh=ztV2jGnd+ln++1JCdCzV+K0fa3o5glJpFVtgMqvRjMg=; b=mEiFbfrigDmScUDl2o194WB5e/Pn6NvAPUOMv0YoYw8pe+sgJlXh0BPEJXKXv+Bsnt aCGj+4CBPaZa9LuZFkylmRtwlTvi8EpP0gfmYZLZyZjX942i5etVpWerTRrMzXNVCcTg PVqFMzybHqi30dBEf3CfGxyhzDDuRDm5EJe5n4/pwTX3J2OCDNzrWoin1UBjOYICzlkl fCPHACOmMOoy+f9HkPeuoV4IGRpWjp8km7HcWc+UFQVJD4LFxLzmiHxXDfK3uzDkWgSe Awsw/Q84105hkgj5ZlglszLwYOeMMhnEHCarNa6GWg8/X9HiBHS9U1jgeN0yjG7+ce2c DePA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id t8si119232edy.573.2021.02.23.17.20.50; Tue, 23 Feb 2021 17:21:25 -0800 (PST) Received-SPF: pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232608AbhBXBUM (ORCPT + 99 others); Tue, 23 Feb 2021 20:20:12 -0500 Received: from mga03.intel.com ([134.134.136.65]:36027 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233855AbhBXAzQ (ORCPT ); Tue, 23 Feb 2021 19:55:16 -0500 IronPort-SDR: ESR6VY9aawAZNg4Rcii4yAIgoAIudOqEqozCl2o9TWq8p8uQlNP/BzLZgK3lq+lWI8Q4Cylqgi EboSKthiGTGQ== X-IronPort-AV: E=McAfee;i="6000,8403,9904"; a="185055576" X-IronPort-AV: E=Sophos;i="5.81,201,1610438400"; d="scan'208";a="185055576" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Feb 2021 16:54:39 -0800 IronPort-SDR: YCu6Y4GV//GLZPXt6iFPHIg4O+XVjjQOgR4ESuY6aBK0isXLej+G34VoWvhTGWUFlj/BxmcGeH chujS+CRxwNA== X-IronPort-AV: E=Sophos;i="5.81,201,1610438400"; d="scan'208";a="433011917" Received: from meghadey-mobl1.amr.corp.intel.com (HELO [10.212.20.123]) ([10.212.20.123]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Feb 2021 16:54:37 -0800 Subject: Re: [RFC V2 0/5] Introduce AVX512 optimized crypto algorithms To: Andy Lutomirski , Tony Luck , Asit K Mallick , "H. Peter Anvin" Cc: Linux Crypto Mailing List , Herbert Xu , "David S. Miller" , "Ravi V. Shankar" , "Chen, Tim C" , "Kleen, Andi" , Dave Hansen , greg.b.tucker@intel.com, "Kasten, Robert A" , rajendrakumar.chinnaiyan@intel.com, tomasz.kantecki@intel.com, ryan.d.saffores@intel.com, ilya.albrekht@intel.com, Kyung Min Park , Weiny Ira , Eric Biggers , Ard Biesheuvel , X86 ML References: <1611386920-28579-1-git-send-email-megha.dey@intel.com> From: "Dey, Megha" Message-ID: <3878af8d-ac1e-522a-7c9f-fda4a1f5b967@intel.com> Date: Tue, 23 Feb 2021 16:54:31 -0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Hi Andy, On 1/24/2021 8:23 AM, Andy Lutomirski wrote: > On Fri, Jan 22, 2021 at 11:29 PM Megha Dey wrote: >> Optimize crypto algorithms using AVX512 instructions - VAES and VPCLMULQDQ >> (first implemented on Intel's Icelake client and Xeon CPUs). >> >> These algorithms take advantage of the AVX512 registers to keep the CPU >> busy and increase memory bandwidth utilization. They provide substantial >> (2-10x) improvements over existing crypto algorithms when update data size >> is greater than 128 bytes and do not have any significant impact when used >> on small amounts of data. >> >> However, these algorithms may also incur a frequency penalty and cause >> collateral damage to other workloads running on the same core(co-scheduled >> threads). These frequency drops are also known as bin drops where 1 bin >> drop is around 100MHz. With the SpecCPU and ffmpeg benchmark, a 0-1 bin >> drop(0-100MHz) is observed on Icelake desktop and 0-2 bin drops (0-200Mhz) >> are observed on the Icelake server. >> >> The AVX512 optimization are disabled by default to avoid impact on other >> workloads. In order to use these optimized algorithms: >> 1. At compile time: >> a. User must enable CONFIG_CRYPTO_AVX512 option >> b. Toolchain(assembler) must support VPCLMULQDQ and VAES instructions >> 2. At run time: >> a. User must set module parameter use_avx512 at boot time >> b. Platform must support VPCLMULQDQ and VAES features >> >> N.B. It is unclear whether these coarse grain controls(global module >> parameter) would meet all user needs. Perhaps some per-thread control might >> be useful? Looking for guidance here. > > I've just been looking at some performance issues with in-kernel AVX, > and I have a whole pile of questions that I think should be answered > first: > > What is the impact of using an AVX-512 instruction on the logical > thread, its siblings, and other cores on the package? > > Does the impact depend on whether it’s a 512-bit insn or a shorter EVEX insn? > > What is the impact on subsequent shorter EVEX, VEX, and legacy > SSE(2,3, etc) insns? > > How does VZEROUPPER figure in? I can find an enormous amount of > misinformation online, but nothing authoritative. > > What is the effect of the AVX-512 states (5-7) being “in use”? As far > as I can tell, the only operations that clear XINUSE[5-7] are XRSTOR > and its variants. Is this correct? > > On AVX-512 capable CPUs, do we ever get a penalty for executing a > non-VEX insn followed by a large-width EVEX insn without an > intervening VZEROUPPER? The docs suggest no, since Broadwell and > before don’t support EVEX, but I’d like to know for sure. > > > My current opinion is that we should not enable AVX-512 in-kernel > except on CPUs that we determine have good AVX-512 support. Based on > some reading, that seems to mean Ice Lake Client and not anything > before it. I also think a bunch of the above questions should be > answered before we do any of this. Right now we have a regression of > unknown impact in regular AVX support in-kernel, we will have > performance issues in-kernel depending on what user code has done > recently, and I'm still trying to figure out what to do about it. > Throwing AVX-512 into the mix without real information is not going to > improve the situation. We are currently working on providing you with answers on the questions you have raised regarding AVX. Thanks, Megha