References:
 <1611386920-28579-1-git-send-email-megha.dey@intel.com>
In-Reply-To: <1611386920-28579-1-git-send-email-megha.dey@intel.com>
From: Andy Lutomirski
Date: Sun, 24 Jan 2021 08:23:57 -0800
Subject: Re: [RFC V2 0/5] Introduce AVX512 optimized crypto algorithms
To: Megha Dey, Tony Luck, Asit K Mallick, "H. Peter Anvin"
Cc: Linux Crypto Mailing List, Herbert Xu, "David S. Miller",
 "Ravi V. Shankar", "Chen, Tim C", "Kleen, Andi", Dave Hansen,
 greg.b.tucker@intel.com, "Kasten, Robert A",
 rajendrakumar.chinnaiyan@intel.com, tomasz.kantecki@intel.com,
 ryan.d.saffores@intel.com, ilya.albrekht@intel.com, Kyung Min Park,
 Weiny Ira, Eric Biggers, Ard Biesheuvel, X86 ML
X-Mailing-List: linux-crypto@vger.kernel.org

On Fri, Jan 22, 2021 at 11:29 PM Megha Dey wrote:
>
> Optimize crypto algorithms using AVX512 instructions - VAES and VPCLMULQDQ
> (first implemented on Intel's Icelake client and Xeon CPUs).
>
> These algorithms take advantage of the AVX512 registers to keep the CPU
> busy and increase memory bandwidth utilization. They provide substantial
> (2-10x) improvements over existing crypto algorithms when update data size
> is greater than 128 bytes and do not have any significant impact when used
> on small amounts of data.
>
> However, these algorithms may also incur a frequency penalty and cause
> collateral damage to other workloads running on the same core(co-scheduled
> threads). These frequency drops are also known as bin drops where 1 bin
> drop is around 100MHz. With the SpecCPU and ffmpeg benchmark, a 0-1 bin
> drop(0-100MHz) is observed on Icelake desktop and 0-2 bin drops (0-200Mhz)
> are observed on the Icelake server.
>
> The AVX512 optimization are disabled by default to avoid impact on other
> workloads. In order to use these optimized algorithms:
> 1. At compile time:
>    a. User must enable CONFIG_CRYPTO_AVX512 option
>    b. Toolchain(assembler) must support VPCLMULQDQ and VAES instructions
> 2. At run time:
>    a. User must set module parameter use_avx512 at boot time
>    b. Platform must support VPCLMULQDQ and VAES features
>
> N.B. It is unclear whether these coarse grain controls(global module
> parameter) would meet all user needs. Perhaps some per-thread control might
> be useful? Looking for guidance here.

I've just been looking at some performance issues with in-kernel AVX,
and I have a whole pile of questions that I think should be answered
first:

What is the impact of using an AVX-512 instruction on the logical
thread, its siblings, and other cores on the package?

Does the impact depend on whether it’s a 512-bit insn or a shorter
EVEX insn?

What is the impact on subsequent shorter EVEX, VEX, and legacy
SSE(2,3, etc) insns?  How does VZEROUPPER figure in?  I can find an
enormous amount of misinformation online, but nothing authoritative.

What is the effect of the AVX-512 states (5-7) being “in use”?  As far
as I can tell, the only operations that clear XINUSE[5-7] are XRSTOR
and its variants.  Is this correct?

On AVX-512 capable CPUs, do we ever get a penalty for executing a
non-VEX insn followed by a large-width EVEX insn without an
intervening VZEROUPPER?  The docs suggest no, since Broadwell and
before don’t support EVEX, but I’d like to know for sure.

My current opinion is that we should not enable AVX-512 in-kernel
except on CPUs that we determine have good AVX-512 support.  Based on
some reading, that seems to mean Ice Lake Client and not anything
before it.  I also think a bunch of the above questions should be
answered before we do any of this.  Right now we have a regression of
unknown impact in regular AVX support in-kernel, we will have
performance issues in-kernel depending on what user code has done
recently, and I'm still trying to figure out what to do about it.
Throwing AVX-512 into the mix without real information is not going to
improve the situation.
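
For readers who want the bit positions behind "XINUSE[5-7]" and the
VAES/VPCLMULQDQ feature checks spelled out, here is a minimal sketch in
plain C.  It is not code from the patch series.  The macro and function
names are hypothetical, the helpers take already-read register values
as arguments rather than executing XGETBV or CPUID themselves, and
requiring AVX512F in addition to the two features listed in the cover
letter is my assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* XSAVE state components 5-7 together hold the AVX-512 state. */
#define SKETCH_XFEATURE_OPMASK    (1ULL << 5)  /* opmask registers k0-k7 */
#define SKETCH_XFEATURE_ZMM_HI256 (1ULL << 6)  /* upper 256 bits of ZMM0-ZMM15 */
#define SKETCH_XFEATURE_HI16_ZMM  (1ULL << 7)  /* ZMM16-ZMM31 */
#define SKETCH_XFEATURE_AVX512 \
	(SKETCH_XFEATURE_OPMASK | SKETCH_XFEATURE_ZMM_HI256 | \
	 SKETCH_XFEATURE_HI16_ZMM)

/* Feature bits reported by CPUID.(EAX=7,ECX=0). */
#define SKETCH_CPUID7_EBX_AVX512F    (1u << 16)
#define SKETCH_CPUID7_ECX_VAES       (1u << 9)
#define SKETCH_CPUID7_ECX_VPCLMULQDQ (1u << 10)

/*
 * True if any of state components 5-7 is marked in use.
 * 'xinuse' is a caller-supplied bitmap, for example the value
 * returned by XGETBV with ECX=1 on CPUs that support it.
 */
static inline bool sketch_avx512_state_in_use(uint64_t xinuse)
{
	return (xinuse & SKETCH_XFEATURE_AVX512) != 0;
}

/*
 * Mirrors the gating described in the quoted cover letter:
 * a build-time option, a boot-time module parameter, and CPU
 * feature bits.  The AVX512F check is an assumption added here.
 */
static inline bool sketch_avx512_crypto_eligible(bool config_enabled,
						 bool use_avx512,
						 uint32_t cpuid7_ebx,
						 uint32_t cpuid7_ecx)
{
	if (!config_enabled || !use_avx512)
		return false;

	return (cpuid7_ebx & SKETCH_CPUID7_EBX_AVX512F) &&
	       (cpuid7_ecx & SKETCH_CPUID7_ECX_VAES) &&
	       (cpuid7_ecx & SKETCH_CPUID7_ECX_VPCLMULQDQ);
}
```

Note that none of this answers the questions above.  It only shows
which bits are being discussed; whether having those bits set in XINUSE
costs anything is exactly what still needs an authoritative answer.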