From: Andy Lutomirski
Date: Wed, 24 Feb 2021 09:42:21 -0800
Subject: Re: [RFC V2 0/5] Introduce AVX512 optimized crypto algorithms
To: "Dey, Megha"
Cc: Andy Lutomirski, Tony Luck, Asit K Mallick, "H. Peter Anvin",
    Linux Crypto Mailing List, Herbert Xu, "David S. Miller",
    "Ravi V. Shankar", "Chen, Tim C", "Kleen, Andi", Dave Hansen,
    greg.b.tucker@intel.com, "Kasten, Robert A",
    rajendrakumar.chinnaiyan@intel.com, tomasz.kantecki@intel.com,
    ryan.d.saffores@intel.com, ilya.albrekht@intel.com, Kyung Min Park,
    Weiny Ira, Eric Biggers, Ard Biesheuvel, X86 ML
References: <1611386920-28579-1-git-send-email-megha.dey@intel.com>
    <3878af8d-ac1e-522a-7c9f-fda4a1f5b967@intel.com>
In-Reply-To: <3878af8d-ac1e-522a-7c9f-fda4a1f5b967@intel.com>
X-Mailing-List: linux-crypto@vger.kernel.org

On Tue, Feb 23, 2021 at 4:54 PM Dey, Megha wrote:
>
> Hi Andy,
>
> On 1/24/2021 8:23 AM, Andy Lutomirski wrote:
> > On Fri, Jan 22, 2021 at 11:29 PM Megha Dey wrote:
> >> Optimize crypto algorithms using AVX512 instructions - VAES and VPCLMULQDQ
> >> (first implemented on Intel's Icelake client and Xeon CPUs).
> >>
> >> These algorithms take advantage of the AVX512 registers to keep the CPU
> >> busy and increase memory bandwidth utilization. They provide substantial
> >> (2-10x) improvements over existing crypto algorithms when update data size
> >> is greater than 128 bytes and do not have any significant impact when used
> >> on small amounts of data.
> >>
> >> However, these algorithms may also incur a frequency penalty and cause
> >> collateral damage to other workloads running on the same core (co-scheduled
> >> threads). These frequency drops are also known as bin drops where 1 bin
> >> drop is around 100MHz.
> >> With the SpecCPU and ffmpeg benchmark, a 0-1 bin
> >> drop (0-100MHz) is observed on Icelake desktop and 0-2 bin drops (0-200MHz)
> >> are observed on the Icelake server.
> >>
> >> The AVX512 optimizations are disabled by default to avoid impact on other
> >> workloads. In order to use these optimized algorithms:
> >> 1. At compile time:
> >>    a. User must enable CONFIG_CRYPTO_AVX512 option
> >>    b. Toolchain (assembler) must support VPCLMULQDQ and VAES instructions
> >> 2. At run time:
> >>    a. User must set module parameter use_avx512 at boot time
> >>    b. Platform must support VPCLMULQDQ and VAES features
> >>
> >> N.B. It is unclear whether these coarse-grained controls (global module
> >> parameter) would meet all user needs. Perhaps some per-thread control might
> >> be useful? Looking for guidance here.
> >
> > I've just been looking at some performance issues with in-kernel AVX,
> > and I have a whole pile of questions that I think should be answered
> > first:
> >
> > What is the impact of using an AVX-512 instruction on the logical
> > thread, its siblings, and other cores on the package?
> >
> > Does the impact depend on whether it’s a 512-bit insn or a shorter EVEX insn?
> >
> > What is the impact on subsequent shorter EVEX, VEX, and legacy
> > SSE (2, 3, etc.) insns?
> >
> > How does VZEROUPPER figure in? I can find an enormous amount of
> > misinformation online, but nothing authoritative.
> >
> > What is the effect of the AVX-512 states (5-7) being “in use”? As far
> > as I can tell, the only operations that clear XINUSE[5-7] are XRSTOR
> > and its variants. Is this correct?
> >
> > On AVX-512 capable CPUs, do we ever get a penalty for executing a
> > non-VEX insn followed by a large-width EVEX insn without an
> > intervening VZEROUPPER? The docs suggest no, since Broadwell and
> > before don’t support EVEX, but I’d like to know for sure.
> >
> >
> > My current opinion is that we should not enable AVX-512 in-kernel
> > except on CPUs that we determine have good AVX-512 support. Based on
> > some reading, that seems to mean Ice Lake Client and not anything
> > before it. I also think a bunch of the above questions should be
> > answered before we do any of this. Right now we have a regression of
> > unknown impact in regular AVX support in-kernel, we will have
> > performance issues in-kernel depending on what user code has done
> > recently, and I'm still trying to figure out what to do about it.
> > Throwing AVX-512 into the mix without real information is not going to
> > improve the situation.
>
> We are currently working on providing you with answers to the questions
> you have raised regarding AVX.

Thanks!