From: Andy Lutomirski
Date: Mon, 29 Mar 2021 22:08:59 -0700
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
To: Len Brown
Cc: Greg KH, Andy Lutomirski, "Bae, Chang Seok", Dave Hansen, X86 ML, LKML, libc-alpha, Florian Weimer, Rich Felker, Kyle Huey, Keno Fischer, Linux API

On Mon, Mar 29, 2021 at 3:38 PM Len Brown wrote:
>
> On Mon, Mar 29, 2021 at 2:16 PM Andy Lutomirski wrote:
> >
>
> Hi Andy,
>
> Can you provide a concise definition of the exact problem(s) this thread
> is attempting to address?

The AVX-512 state, all by itself, is more than 2048 bytes. Quoting the POSIX sigaltstack page (man 3p sigaltstack):

       The value SIGSTKSZ is a system default specifying the number of bytes
       that would be used to cover the usual case when manually allocating an
       alternate stack area. The value MINSIGSTKSZ is defined to be the minimum
       stack size for a signal handler. In computing an alternate stack size, a
       program should add that amount to its stack requirements to allow for
       the system implementation overhead. The constants SS_ONSTACK,
       SS_DISABLE, SIGSTKSZ, and MINSIGSTKSZ are defined in .

    arch/x86/include/uapi/asm/signal.h:#define MINSIGSTKSZ 2048
    arch/x86/include/uapi/asm/signal.h:#define SIGSTKSZ 8192

Regrettably, the Linux signal frame uses the uncompacted XSAVE format. Also regrettably, the uncompacted format has the nasty property that its layout depends on XCR0, not on the set of registers that are actually used or wanted. So, with the current ABI, the signal frame is stuck being quite large for all programs on a machine that supports AVX-512 and has it enabled by the kernel.
The signal frame is even larger with AMX, and it then exceeds SIGSTKSZ as well as MINSIGSTKSZ. There are apparently real programs that break as a result.

We need to find a way to handle new, large extended states without breaking the user ABI. We should also find a way to handle them without consuming silly amounts of stack space for programs that don't use them.

Sadly, if the solution we settle on involves context-switching XCR0, performance on first-generation hardware will suffer, because VMX does not have any way to allow guests to write XCR0 without exiting. I don't consider this to be a showstopper: if we end up having this problem, fixing it in subsequent CPUs is straightforward.

>
> Thanks ahead of time for excluding "blow up power consumption",
> since that paranoia is not grounded in fact.
>

I will gladly exclude power consumption from this discussion, since that's a separate issue that has nothing to do with the user<->kernel ABI.