From: Thiago Macieira
To: "Chang S. Bae"
Subject: Re: [PATCH v9 14/26] x86/arch_prctl: Create ARCH_SET_STATE_ENABLE/ARCH_GET_STATE_ENABLE
Date: Fri, 6 Aug 2021 09:46:22 -0700
Organization: Intel Corporation
X-Mailing-List: linux-kernel@vger.kernel.org

On Friday, 30 July 2021 07:59:45 PDT Chang S. Bae wrote:
> +	for_each_thread(tsk, t) {
> +		t->thread.fpu.dynamic_state_perm |= req_dynstate_perm;
> +		nr_threads++;
> +	}
> +
> +	if (nr_threads != tsk->signal->nr_threads) {
> +		for_each_thread(tsk, t)
> +			t->thread.fpu.dynamic_state_perm = old_dynstate_perm;
> +		pr_err("x86/fpu: ARCH_XSTATE_PERM failed as thread number mismatched.\n");
> +		return -EBUSY;
> +	}
> +	return 0;
> +}

Hello all,

While writing the matching userspace code, I found two problems with the solution above.

First, the simpler one: that -EBUSY. It must go, and you can get rid of it with a lock. Library code cannot ensure that it is running single-threaded, or that no other threads start or exit while it makes the system call, so there is nothing the library can do if it gets an -EBUSY. Should it try again? What if that fails too? And what is the state of the dynamically permitted states after an -EBUSY? It's probably inconsistent.

Moreover, there's an ABA problem: what happens if one thread starts and another exits while this system call is running? The count can then match even though the threads that received the new permission are not the threads that exist. And what happens if two threads make this system call at the same time?

(Also, shouldn't tsk->signal->nr_threads be read atomically?)
The second and bigger problem is the consequence of not issuing the ARCH_SET_STATE_ENABLE call: a SIGILL. Until now this has never happened, so I expect it to surprise people in the worst possible way. The Intel Software Developer's Manual and every tutorial out there say that the sequence of actions is:

 1) check that OSXSAVE is enabled
 2) check with CPUID that the AVX, AVX-512 or AMX instructions are supported
 3) execute XGETBV with ECX=0 to read XCR0
 4) disable any instructions whose matching state is not enabled by the OS

This is what software developers will write for AMX and any future state, until they learn better. It is also all that other OSes require. Moreover, until developers can actually run their software on AMX-capable CPUs, they will not notice a missing system call (the Software Development Emulator executes the instructions whether or not you've issued the syscall). As a consequence, there's a high chance that a test escape like this will make software start crashing on AMX-capable CPUs once those show up and get enabled in public clouds.

So I have to insist that the result of XGETBV match exactly what is permitted to run. That means we either enable AMX unconditionally, with no system call needed (with or without XFD trapping to allocate the extra state dynamically), or leave the AMX bits clear in XCR0 by default until the system call is issued.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering