From: Andy Lutomirski
Date: Thu, 15 Apr 2021 09:24:35 -0700
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
To: Len Brown
Cc: Andy Lutomirski, Willy Tarreau, Florian Weimer, "Bae, Chang Seok", Dave Hansen, X86 ML, LKML, linux-abi@vger.kernel.org, "libc-alpha@sourceware.org", Rich Felker, Kyle Huey, Keno Fischer
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 14, 2021 at 2:48 PM Len Brown wrote:
>
> >
> > Then I take the transition penalty into and out of AMX code (I'll
> > believe there is no penalty when I see it -- we've had a penalty with
> > VEX and with AVX-512) and my program runs *slower*.
>
> If you have a clear definition of what "transition penalty" is, please share it.

Given the generally awful state of Intel's documentation about these
issues, it's quite hard to tell for real.  But here are some examples.

VEX: Figures 11-1 ("AVX-SSE Transitions in the Broadwell, and Prior
Generation Microarchitectures") and 11-2 ("AVX-SSE Transitions in the
Skylake Microarchitecture").  We *still* have a performance regression
in the upstream kernel because, despite all common sense, the CPUs
consider LDMXCSR to be an SSE instruction and VLDMXCSR to be an AVX
instruction despite the fact that neither one of them touches the XMM
or YMM state at all.

AVX-512:

https://lore.kernel.org/linux-crypto/CALCETrU06cuvUF5NDSm8--dy3dOkxYQ88cGWaakOQUE4Vkz88w@mail.gmail.com/

https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html

>
> Lacking one, I'll assume you are referring to the
> impact on turbo frequency of using AMX hardware?
>
> Again...
>
> On the hardware that supports AMX, there is zero impact on frequency
> due to the presence of AMX state, whether modified or unmodified.
>
> We resolved on another thread that Linux will never allow entry
> into idle with modified AMX state, and so AMX will have zero impact
> on the ability of the process to enter deep power-saving C-states.
>
> It is true that AMX activity is considered when determining max turbo.
> (as it must be)
> However, the *release* of the turbo credits consumed by AMX is
> "several orders of magnitude" faster on this generation
> than it was for AVX-512 on pre-AMX hardware.

What is the actual impact of a trivial function that initializes the
tile config, does one tiny math op, and then does TILERELEASE?

> Yes, the proposal, and the working patch set on the list, context
> switches XFD -- which is exactly what that hardware was designed to do.
> If the old and new tasks have the same value of XFD, the MSR write is skipped.
>
> I'm not aware of any serious proposal to context-switch XCR0,
> as it would break the current programming model, where XCR0
> advertises what the OS supports.  It would also impact performance,
> as every write to XCR0 necessarily provokes a VMEXIT.

You're arguing against a nonsensical straw man.

In the patches, *as submitted*, if you trip the XFD #NM *once* and you
are the only thread on the system to do so, you will eat the cost of a
WRMSR on every subsequent context switch.  This is not free.  If we
use XCR0 (I'm not saying we will -- I'm just mentioning it as a
possibility), then the penalty is presumably worse due to the VMX
issue.
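
To make the XFD cost argument above concrete, here is a toy model of the
rule quoted from the patch description ("if the old and new tasks have the
same value of XFD, the MSR write is skipped").  It is only a sketch under
stated assumptions, not kernel code: Task, count_xfd_writes and the
schedule lists are hypothetical names made up for illustration, and the
model assumes a task's XFD value is cleared for good after its first AMX
use.  It counts simulated WRMSRs and says nothing about what a real WRMSR
costs.

```python
from dataclasses import dataclass

# Assumption: XFD bit 18 guards the AMX tile data component. While the bit
# is set ("armed"), the first AMX use by the task faults with #NM.
XFD_AMX_ARMED = 1 << 18
XFD_AMX_CLEARED = 0  # the task has tripped #NM once and may now use AMX


@dataclass
class Task:
    """Hypothetical stand-in for a thread and its saved XFD value."""
    name: str
    xfd: int = XFD_AMX_ARMED


def count_xfd_writes(schedule, initial_xfd=XFD_AMX_ARMED):
    """Count simulated XFD MSR writes over a sequence of context switches.

    `schedule` lists the tasks in the order they are switched in. A write
    is counted only when the incoming task's XFD value differs from the
    value currently in the (simulated) MSR.
    """
    hw_xfd = initial_xfd
    writes = 0
    for task in schedule:
        if task.xfd != hw_xfd:
            hw_xfd = task.xfd  # this is where the WRMSR would happen
            writes += 1
    return writes


if __name__ == "__main__":
    amx_user = Task("amx_user", xfd=XFD_AMX_CLEARED)
    other = Task("other")
    another = Task("another")

    # One thread tripped #NM once and alternates with a thread that did not.
    print("mixed XFD values:", count_xfd_writes([amx_user, other] * 4))
    # No thread has ever touched AMX, so every switch skips the write.
    print("all armed:", count_xfd_writes([other, another] * 4))
```

In this model the write is skipped only between tasks whose XFD values
match, so a single thread that has tripped #NM forces a write on every
switch into it and every switch back out to a thread that has not.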