From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Christoph Hellwig,
    Dan Williams, Jane Chu, Tony Luck, Sasha Levin
Subject: [PATCH 5.17 536/772] mce: fix set_mce_nospec to always unmap the whole page
Date: Tue, 7 Jun 2022 19:02:08 +0200
Message-Id: <20220607165004.761120430@linuxfoundation.org>
In-Reply-To: <20220607164948.980838585@linuxfoundation.org>
References: <20220607164948.980838585@linuxfoundation.org>

From: Jane Chu

[ Upstream commit 5898b43af954b83c4a4ee4ab85c4dbafa395822a ]

The set_memory_uc() approach doesn't work well in all cases.
As Dan pointed out when "The VMM unmapped the bad page from
guest physical space and passed the machine check to the guest."
"The guest gets virtual #MC on an access to that page. When
the guest tries to do set_memory_uc() and instructs cpa_flush()
to do clean caches that results in taking another fault / exception
perhaps because the VMM unmapped the page from the guest."

Since the driver has special knowledge to handle NP or UC,
mark the poisoned page with NP and let driver handle it when
it comes down to repair.

Please refer to discussions here for more details.
https://lore.kernel.org/all/CAPcyv4hrXPb1tASBZUg-GgdVs0OOFKXMXLiHmktg_kFi7YBMyQ@mail.gmail.com/

Now since poisoned page is marked as not-present, in order to
avoid writing to a not-present page and trigger kernel Oops,
also fix pmem_do_write().

Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Reviewed-by: Christoph Hellwig
Reviewed-by: Dan Williams
Signed-off-by: Jane Chu
Acked-by: Tony Luck
Link: https://lore.kernel.org/r/165272615484.103830.2563950688772226611.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams
Signed-off-by: Sasha Levin
---
 arch/x86/kernel/cpu/mce/core.c |  6 +++---
 arch/x86/mm/pat/set_memory.c   | 23 +++++++++++------------
 drivers/nvdimm/pmem.c          | 30 +++++++-----------------------
 include/linux/set_memory.h     |  4 ++--
 4 files changed, 23 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2d719e0d2e40..9ed616e7e1cf 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -613,7 +613,7 @@ static int uc_decode_notifier(struct notifier_block *nb, unsigned long val,
 
 	pfn = mce->addr >> PAGE_SHIFT;
 	if (!memory_failure(pfn, 0)) {
-		set_mce_nospec(pfn, whole_page(mce));
+		set_mce_nospec(pfn);
 		mce->kflags |= MCE_HANDLED_UC;
 	}
 
@@ -1350,7 +1350,7 @@ static void kill_me_maybe(struct callback_head *cb)
 
 	ret = memory_failure(p->mce_addr >> PAGE_SHIFT, flags);
 	if (!ret) {
-		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
+		set_mce_nospec(p->mce_addr >> PAGE_SHIFT);
 		sync_core();
 		return;
 	}
@@ -1374,7 +1374,7 @@ static void kill_me_never(struct callback_head *cb)
 	p->mce_count = 0;
 	pr_err("Kernel accessed poison in user space at %llx\n", p->mce_addr);
 	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, 0))
-		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
+		set_mce_nospec(p->mce_addr >> PAGE_SHIFT);
 }
 
 static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callback_head *))
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index b143972885eb..1f1018104488 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1925,14 +1925,9 @@ int set_memory_wb(unsigned long addr, int numpages)
 }
 EXPORT_SYMBOL(set_memory_wb);
 
-/*
- * Prevent speculative access to the page by either unmapping
- * it (if we do not require access to any part of the page) or
- * marking it uncacheable (if we want to try to retrieve data
- * from non-poisoned lines in the page).
- */
+/* Prevent speculative access to a page by marking it not-present */
 #ifdef CONFIG_X86_64
-int set_mce_nospec(unsigned long pfn, bool unmap)
+int set_mce_nospec(unsigned long pfn)
 {
 	unsigned long decoy_addr;
 	int rc;
@@ -1954,19 +1949,23 @@ int set_mce_nospec(unsigned long pfn, bool unmap)
 	 */
 	decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
 
-	if (unmap)
-		rc = set_memory_np(decoy_addr, 1);
-	else
-		rc = set_memory_uc(decoy_addr, 1);
+	rc = set_memory_np(decoy_addr, 1);
 	if (rc)
 		pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
 	return rc;
 }
 
+static int set_memory_present(unsigned long *addr, int numpages)
+{
+	return change_page_attr_set(addr, numpages, __pgprot(_PAGE_PRESENT), 0);
+}
+
 /* Restore full speculative operation to the pfn. */
 int clear_mce_nospec(unsigned long pfn)
 {
-	return set_memory_wb((unsigned long) pfn_to_kaddr(pfn), 1);
+	unsigned long addr = (unsigned long) pfn_to_kaddr(pfn);
+
+	return set_memory_present(&addr, 1);
 }
 EXPORT_SYMBOL_GPL(clear_mce_nospec);
 #endif /* CONFIG_X86_64 */
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 58d95242a836..4aa17132a557 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -158,36 +158,20 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
 			struct page *page, unsigned int page_off,
 			sector_t sector, unsigned int len)
 {
-	blk_status_t rc = BLK_STS_OK;
-	bool bad_pmem = false;
 	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
 	void *pmem_addr = pmem->virt_addr + pmem_off;
 
-	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
-		bad_pmem = true;
+	if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) {
+		blk_status_t rc = pmem_clear_poison(pmem, pmem_off, len);
+
+		if (rc != BLK_STS_OK)
+			return rc;
+	}
 
-	/*
-	 * Note that we write the data both before and after
-	 * clearing poison. The write before clear poison
-	 * handles situations where the latest written data is
-	 * preserved and the clear poison operation simply marks
-	 * the address range as valid without changing the data.
-	 * In this case application software can assume that an
-	 * interrupted write will either return the new good
-	 * data or an error.
-	 *
-	 * However, if pmem_clear_poison() leaves the data in an
-	 * indeterminate state we need to perform the write
-	 * after clear poison.
-	 */
 	flush_dcache_page(page);
 	write_pmem(pmem_addr, page, page_off, len);
-	if (unlikely(bad_pmem)) {
-		rc = pmem_clear_poison(pmem, pmem_off, len);
-		write_pmem(pmem_addr, page, page_off, len);
-	}
 
-	return rc;
+	return BLK_STS_OK;
 }
 
 static void pmem_submit_bio(struct bio *bio)
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index 683a6c3f7179..369769ce7399 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -43,10 +43,10 @@ static inline bool can_set_direct_map(void)
 #endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
 
 #ifdef CONFIG_X86_64
-int set_mce_nospec(unsigned long pfn, bool unmap);
+int set_mce_nospec(unsigned long pfn);
 int clear_mce_nospec(unsigned long pfn);
 #else
-static inline int set_mce_nospec(unsigned long pfn, bool unmap)
+static inline int set_mce_nospec(unsigned long pfn)
 {
 	return 0;
 }
-- 
2.35.1
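
Note on the pmem_do_write() change: the new ordering (clear poison first, bail out if that fails, and only then write) can be modelled in user space. The sketch below is not kernel code and only illustrates the control flow. struct sim_pmem, sim_clear_poison(), sim_write() and sim_do_write() are hypothetical stand-ins for the pmem device, pmem_clear_poison(), write_pmem() and pmem_do_write(). The faults counter is likewise an assumption, marking where a store to a not-present mapping would have happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins; none of these names exist in the kernel. */
enum sim_status { SIM_OK = 0, SIM_IOERR = 1 };

struct sim_pmem {
	unsigned char data[4096];
	bool poisoned;		/* models a badblocks hit on a not-present page */
	bool clear_works;	/* whether clearing poison will succeed */
	int faults;		/* stores that touched still-poisoned media */
};

/* Stand-in for pmem_clear_poison(): may fail and leave the range poisoned. */
static enum sim_status sim_clear_poison(struct sim_pmem *p)
{
	if (!p->clear_works)
		return SIM_IOERR;
	p->poisoned = false;
	return SIM_OK;
}

/* Stand-in for write_pmem(): records a "fault" if the range is still poisoned. */
static void sim_write(struct sim_pmem *p, size_t off, const void *src, size_t len)
{
	if (p->poisoned)
		p->faults++;
	memcpy(p->data + off, src, len);
}

/* Mirrors the patched ordering: clear first, return the error, write last. */
static enum sim_status sim_do_write(struct sim_pmem *p, size_t off,
				    const void *src, size_t len)
{
	assert(off <= sizeof(p->data) && len <= sizeof(p->data) - off);

	if (p->poisoned) {
		enum sim_status rc = sim_clear_poison(p);

		if (rc != SIM_OK)
			return rc;
	}

	sim_write(p, off, src, len);
	return SIM_OK;
}
```

With this ordering a write never reaches a range that is still poisoned. The removed code wrote once before clearing poison, which in this model would increment faults on every bad-block write.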