From: "Kirill A. Shutemov"
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org
Cc: "Rafael J. Wysocki", Peter Zijlstra, Adrian Hunter, Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima, Rick Edgecombe, Tom Lendacky, "Kalra, Ashish", Sean Christopherson, "Huang, Kai", Ard Biesheuvel, Baoquan He, "H. Peter Anvin", "Kirill A. Shutemov", "K. Y. Srinivasan", Haiyang Zhang, kexec@lists.infradead.org, linux-hyperv@vger.kernel.org, linux-acpi@vger.kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, Tao Liu
Subject: [PATCHv11 11/19] x86/tdx: Convert shared memory back to private on kexec
Date: Tue, 28 May 2024 12:55:14 +0300
Message-ID: <20240528095522.509667-12-kirill.shutemov@linux.intel.com>
In-Reply-To: <20240528095522.509667-1-kirill.shutemov@linux.intel.com>
References: <20240528095522.509667-1-kirill.shutemov@linux.intel.com>

TDX guests allocate shared buffers to perform I/O. This is done by allocating pages normally from the buddy allocator and converting them to shared with set_memory_decrypted().

The second, kexec-ed kernel has no idea which memory was converted this way. It only sees E820_TYPE_RAM.

Accessing shared memory via a private mapping is fatal: it leads to an unrecoverable TD exit.

On kexec, walk the direct mapping and convert all shared memory back to private. This makes all RAM private again, so the second kernel can use it normally.

The conversion occurs in two steps: stopping new conversions and unsharing all memory.

For a normal kexec, new conversions are stopped while scheduling is still functioning, which allows waiting until any in-flight conversions have finished. A crash kexec reaches this point with interrupts disabled and cannot wait; if it races with a conversion, it only reports the failure and proceeds.

The second step runs once all CPUs except one are inactive and interrupts are disabled, so no other code can access shared memory while it is being converted.

Signed-off-by: Kirill A. Shutemov
Reviewed-by: Rick Edgecombe
Reviewed-by: Kai Huang
Tested-by: Tao Liu
---
 arch/x86/coco/tdx/tdx.c           | 69 +++++++++++++++++++++++++++++++
 arch/x86/include/asm/pgtable.h    |  5 +++
 arch/x86/include/asm/set_memory.h |  3 ++
 arch/x86/mm/pat/set_memory.c      | 41 ++++++++++++++++--
 4 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 979891e97d83..c0a651fa8963 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -14,6 +15,7 @@
 #include
 #include
 #include
+#include
 
 /* MMIO direction */
 #define EPT_READ	0
@@ -831,6 +833,70 @@ static int tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
 	return 0;
 }
 
+/* Stop new private<->shared conversions */
+static void tdx_kexec_begin(bool crash)
+{
+	/*
+	 * Crash kernel reaches here with interrupts disabled: can't wait for
+	 * conversions to finish.
+	 *
+	 * If race happened, just report and proceed.
+	 */
+	if (!set_memory_enc_stop_conversion(!crash))
+		pr_warn("Failed to stop shared<->private conversions\n");
+}
+
+/* Walk direct mapping and convert all shared memory back to private */
+static void tdx_kexec_finish(void)
+{
+	unsigned long addr, end;
+	long found = 0, shared;
+
+	lockdep_assert_irqs_disabled();
+
+	addr = PAGE_OFFSET;
+	end = PAGE_OFFSET + get_max_mapped();
+
+	while (addr < end) {
+		unsigned long size;
+		unsigned int level;
+		pte_t *pte;
+
+		pte = lookup_address(addr, &level);
+		size = page_level_size(level);
+
+		if (pte && pte_decrypted(*pte)) {
+			int pages = size / PAGE_SIZE;
+
+			/*
+			 * Touching memory with shared bit set triggers implicit
+			 * conversion to shared.
+			 *
+			 * Make sure nobody touches the shared range from
+			 * now on.
+			 */
+			set_pte(pte, __pte(0));
+
+			if (!tdx_enc_status_changed(addr, pages, true)) {
+				pr_err("Failed to unshare range %#lx-%#lx\n",
+				       addr, addr + size);
+			}
+
+			found += pages;
+		}
+
+		addr += size;
+	}
+
+	__flush_tlb_all();
+
+	shared = atomic_long_read(&nr_shared);
+	if (shared != found) {
+		pr_err("shared page accounting is off\n");
+		pr_err("nr_shared = %ld, nr_found = %ld\n", shared, found);
+	}
+}
+
 void __init tdx_early_init(void)
 {
 	struct tdx_module_args args = {
@@ -890,6 +956,9 @@ void __init tdx_early_init(void)
 	x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
 	x86_platform.guest.enc_tlb_flush_required = tdx_tlb_flush_required;
 
+	x86_platform.guest.enc_kexec_begin = tdx_kexec_begin;
+	x86_platform.guest.enc_kexec_finish = tdx_kexec_finish;
+
 	/*
 	 * TDX intercepts the RDMSR to read the X2APIC ID in the parallel
 	 * bringup low level code. That raises #VE which cannot be handled
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 65b8e5bb902c..e39311a89bf4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -140,6 +140,11 @@ static inline int pte_young(pte_t pte)
 	return pte_flags(pte) & _PAGE_ACCESSED;
 }
 
+static inline bool pte_decrypted(pte_t pte)
+{
+	return cc_mkdec(pte_val(pte)) == pte_val(pte);
+}
+
 #define pmd_dirty pmd_dirty
 static inline bool pmd_dirty(pmd_t pmd)
 {
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 9aee31862b4a..d490db38db9e 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -49,8 +49,11 @@ int set_memory_wb(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
 int set_memory_p(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
+
+bool set_memory_enc_stop_conversion(bool wait);
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
+
 int set_memory_np_noalias(unsigned long addr, int numpages);
 int set_memory_nonglobal(unsigned long addr, int numpages);
 int set_memory_global(unsigned long addr, int numpages);
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index a7a7a6c6a3fb..2a548b65ef5f 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2227,12 +2227,47 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
 	return ret;
 }
 
+/*
+ * The lock serializes conversions between private and shared memory.
+ *
+ * It is taken for read on conversion. A write lock guarantees that no
+ * concurrent conversions are in progress.
+ */
+static DECLARE_RWSEM(mem_enc_lock);
+
+/*
+ * Stop new private<->shared conversions.
+ *
+ * Taking the exclusive mem_enc_lock waits for in-flight conversions to complete.
+ * The lock is not released to prevent new conversions from being started.
+ *
+ * If sleep is not allowed, as in a crash scenario, try to take the lock.
+ * Failure indicates that there is a race with the conversion.
+ */
+bool set_memory_enc_stop_conversion(bool wait)
+{
+	if (!wait)
+		return down_write_trylock(&mem_enc_lock);
+
+	down_write(&mem_enc_lock);
+
+	return true;
+}
+
 static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 {
-	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
-		return __set_memory_enc_pgtable(addr, numpages, enc);
+	int ret = 0;
 
-	return 0;
+	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
+		if (!down_read_trylock(&mem_enc_lock))
+			return -EBUSY;
+
+		ret = __set_memory_enc_pgtable(addr, numpages, enc);
+
+		up_read(&mem_enc_lock);
+	}
+
+	return ret;
 }
 
 int set_memory_encrypted(unsigned long addr, int numpages)
-- 
2.43.0
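tdx_kexec_finish() and pte_decrypted() combine two ideas: an entry counts as shared when applying the "make decrypted" transform leaves it unchanged, and the number of such entries found by the walk is cross-checked against the running nr_shared counter. The sketch below is a user-space toy model of that logic, not kernel code. Assumptions: the bit position TOY_SHARED_BIT and every toy_* name are hypothetical; each array slot stands for one 4K page (the real walk advances by page_level_size() so it also covers large pages); and the actual hand-back to private through tdx_enc_status_changed() is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model, not kernel code: TOY_SHARED_BIT and the toy_* helpers are
 * hypothetical. Each array slot stands for one 4K direct-map PTE.
 */
#define TOY_SHARED_BIT (UINT64_C(1) << 51)
#define TOY_NR_PAGES   16

/* Plays the role of cc_mkdec() on TDX: "decrypted" means the shared bit is set. */
static uint64_t toy_mkdec(uint64_t pte)
{
	return pte | TOY_SHARED_BIT;
}

/* Same test as pte_decrypted(): mkdec is a no-op on an already decrypted entry. */
static bool toy_pte_decrypted(uint64_t pte)
{
	return toy_mkdec(pte) == pte;
}

static uint64_t toy_direct_map[TOY_NR_PAGES];
static long toy_nr_shared;

/* Stand-in for set_memory_decrypted() on one page: set the bit, account it. */
static void toy_share_page(size_t i)
{
	toy_direct_map[i] = toy_mkdec(toy_direct_map[i]);
	toy_nr_shared++;
}

/*
 * Shape of tdx_kexec_finish(): find every shared entry, zap it so nothing
 * can touch the page through the shared alias, and count it. The real code
 * additionally converts each range back to private before counting it.
 */
static long toy_kexec_finish(void)
{
	long found = 0;

	for (size_t i = 0; i < TOY_NR_PAGES; i++) {
		if (toy_pte_decrypted(toy_direct_map[i])) {
			toy_direct_map[i] = 0;	/* like set_pte(pte, __pte(0)) */
			found++;
		}
	}
	return found;
}
```

Marking pages 3 and 7 with toy_share_page() and then calling toy_kexec_finish() returns 2, equal to toy_nr_shared, and leaves those two slots zeroed. A mismatch between the two counts corresponds to the "shared page accounting is off" message in the patch.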