Date: Tue, 20 Jun 2023 09:21:38 -0700
Subject: Re: [PATCH v11 04/20] x86/cpu: Detect TDX partial write machine check erratum
To: David Hildenbrand, Kai Huang, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, kirill.shutemov@linux.intel.com, tony.luck@intel.com, peterz@infradead.org, tglx@linutronix.de, seanjc@google.com, pbonzini@redhat.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com, ying.huang@intel.com, reinette.chatre@intel.com, len.brown@intel.com, ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com, "Raj, Ashok"
References: <86f2a8814240f4bbe850f6a09fc9d0b934979d1b.1685887183.git.kai.huang@intel.com> <723dd9da-ebd5-edb0-e9e5-2d8c14aaffe2@redhat.com> <216753fd-c659-711e-12d0-d12e34110efc@redhat.com>
From: Dave Hansen
In-Reply-To: <216753fd-c659-711e-12d0-d12e34110efc@redhat.com>
X-Mailing-List:
linux-kernel@vger.kernel.org

On 6/20/23 09:03, David Hildenbrand wrote:
> On 20.06.23 17:39, Dave Hansen wrote:
>> On 6/19/23 05:21, David Hildenbrand wrote:
>>> So, ordinary writes to TD private memory are not a problem? I thought
>>> one motivation for the unmapped-guest-memory discussion was to prevent
>>> host (userspace) writes to such memory because it would trigger a MC and
>>> eventually crash the host.
>>
>> Those are two different problems.
>>
>> Problem #1 (this patch): The host encounters poison when going about its
>> normal business accessing normal memory.  This happens when something in
>> the host accidentally clobbers some TDX memory and *then* reads it.
>> Only occurs with partial writes.
>>
>> Problem #2 (addressed with unmapping): Host *userspace* intentionally
>> and maliciously clobbers some TDX memory and then the TDX module or a
>> TDX guest can't run because the memory integrity checks (checksum or TD
>> bit) fail.  This can also take the system down because #MC's are nasty.
>>
>> Host userspace unmapping doesn't prevent problem #1 because it's the
>> kernel who screwed up with the _kernel_ mapping.
>
> Ahh, thanks for verifying. I was hoping that problem #2 would get fixed
> in HW as well (and treated like a BUG).

No, it's really working as designed.  #1 _can_ be fixed because the
hardware can just choose to let the host run merrily along corrupting TDX
data and blissfully unaware of the carnage until TDX stumbles on the mess.

Blissful ignorance really is a useful feature here.  It means, for
instance, that if the kernel screws up, it can still blissfully kexec(),
reboot, boot a new kernel, or dump to the console without fear of #MC.

#2 is much harder because the TDX data is destroyed and yet the TDX side
still wants to run.  The SEV folks chose page faults on write to stop
SEV from running and the TDX folks chose #MC on reads as the mechanism.
All of the nastiness on the TDX side is (IMNHO) really a consequence of
that decision to use machine checks.

(Aside: I'm not specifically crapping on the TDX CPU designers here.  I
don't particularly like the SEV approach either.  But this mess is a
result of the TDX design choices.  There are other messes in other patch
series from SEV.)

> Because problem #2 also sounds like something that directly violates the
> first paragraph of this patch description "violations of
> this integrity protection are supposed to only affect TDX operations and
> are never supposed to affect the host kernel itself."
>
> So I would expect the TDX guest to fail hard, but not other TDX guests
> (or the host kernel).

This is more fallout from the #MC design choice.  Let's use page faults
as an example since our SEV friends are using them.  *ANY* instruction
that reads memory can page fault, have the kernel fix up the fault, and
continue merrily along its way.

#MC is fundamentally different.  The exceptions can be declared to be
unrecoverable.  The CPU says, "whoopsie, I managed to deliver this #MC,
but it would be too hard for me so I can't continue."  These "too hard"
scenarios are shrinking over time, but they do exist.  They're fatal.