Received: by 2002:ab2:b82:0:b0:1f3:401:3cfb with SMTP id 2csp553939lqh; Thu, 28 Mar 2024 09:18:06 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWEf0Ohwcn/u7udk9hn7jG78u9ZrdB8N5634UBVj4H9f5SYL1YKIfRHHfXviCQuVlO5GIjRUqa325U/FHcYhwSQZ16BOrU9MpNUxTT18w== X-Google-Smtp-Source: AGHT+IGpTb/elJVK69+su9Tef0zZ/asbXEEr9Z5KewdaVaVftXLmgEN08mXfijbGrwzz3q33H+Yw X-Received: by 2002:a17:902:76c5:b0:1e0:b687:c5d1 with SMTP id j5-20020a17090276c500b001e0b687c5d1mr3314480plt.64.1711642686375; Thu, 28 Mar 2024 09:18:06 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1711642686; cv=pass; d=google.com; s=arc-20160816; b=TA1KqQsY6lh7GvfHQnYuSUm6ABCcVWG1P8cHRAo6Fq0OvfV1ljpVjkGEE/iocKKXe/ Xl96zaQFWS23RDQFp/e/1KCoJFKh+nRT7d+vZr3f0ZkPQLugrClon/zOsTpVhDqVyTmb NRg+nKLGTQZsSvC8+P1EBSx8IdsCyvcAhjpEsePufjv7PjhGvIzFlDd+GiJP92SfALIV 3vZNVrKMfIz4JBMbD3p0mrvpIdflhj7ltZ90cPgF81WwvbAfKccd9YMxI1lbcA03fZEz lRVcdz/ZHMQ/QotD+TAfwvVSgcdq7AiRSy3OxWZ8anQgaGVIurWxHMk6jrMGwFDRa/i0 sRDQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=mime-version:list-unsubscribe:list-subscribe:list-id:precedence :in-reply-to:content-disposition:references:message-id:subject:cc:to :from:date:dkim-signature; bh=O1mCM99rDKh1WnktnKGuwlFcn5k0KykkoCp2lndeuMg=; fh=2N8lvZbxmS2LnNkLTdbr80C29NfQf4yvSc15I62KDwg=; b=tRxCbSZqnuW/OM+9Vz+PA3VnqRuNuLDfdKvVDhPW33YsY3H1ce3x5ywRlDWo8hdeG1 XH7AXHvvsw4j/l8i0wCvDHTWnTJkkaInx8lJcO/LkhEg39eyA38fzJYFm63QgzyObdeg G5eOxTw3hj6a7lSopDfKVBde3xUygfasEd1imww1vzXm2MLv+tujtqXafRFU0ffjKVZR B/gEAJCtsFWSTrJg5lXF/KZ5/fC6mq8Ij6lwEBdlJ6DoFp7lYuP78yH99TPalNTikgAm CPJNyQWcdpDmH5wijKxjheAndukozmHzB6tHOIwE+9YdsstKtyxE+Ik4LguLUtrqhT+F KSQA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@hpe.com header.s=pps0720 header.b=OSCiqtkn; arc=pass (i=1 spf=pass spfdomain=hpe.com dkim=pass dkdomain=hpe.com dmarc=pass fromdomain=hpe.com); spf=pass (google.com: domain of linux-kernel+bounces-123227-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-123227-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=hpe.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id h14-20020a170902f54e00b001e11b2a16ebsi1776123plf.87.2024.03.28.09.18.06 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Mar 2024 09:18:06 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-123227-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@hpe.com header.s=pps0720 header.b=OSCiqtkn; arc=pass (i=1 spf=pass spfdomain=hpe.com dkim=pass dkdomain=hpe.com dmarc=pass fromdomain=hpe.com); spf=pass (google.com: domain of linux-kernel+bounces-123227-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-123227-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=hpe.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 0E88629786C for ; Thu, 28 Mar 2024 16:18:06 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id D9FE0130E50; Thu, 28 Mar 2024 16:17:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=hpe.com header.i=@hpe.com header.b="OSCiqtkn" Received: from mx0a-002e3701.pphosted.com (mx0a-002e3701.pphosted.com [148.163.147.86]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EAF2B130A4D; Thu, 28 Mar 2024 16:17:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.147.86 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711642676; cv=none; b=X2BlFPVTpWNqDhcrDeKaCanz4d8qleED+DU3p/+ribJhYRxaF+OcWR9D2vpXctnv0PLkZjqc6VZmhc3dxA++1DIC9lbnb6lfg/x2ueuAi7i3mFgkIdisdHhho2cyYmFyV1KGI8TnHyb1WvM903qpK5aK3v1CS0nPgbaY7Q1+qgA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711642676; c=relaxed/simple; bh=XXzHklSlwZUBc70mb+IrCrGHDHHicLnr6FwTlMm8dZ0=; h=Date:From:To:Cc:Subject:Message-ID:References:Content-Type: Content-Disposition:In-Reply-To:MIME-Version; b=lSimGF5uZI734DqLJXU1D+bCk3QXwp896bkCqtNl5bZpRwrydlDziAK3atuUQf3SFHFvjHPCQAKtJm60y2RyWJaWZukbAnLN+l7TCZiEN0O2VMh8hOHyw8sEVFjz/az//0YZL+Uhj0Bh76sJwHkHC+TorGOiJIKjV0+xWTKAWcU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=hpe.com; spf=pass smtp.mailfrom=hpe.com; dkim=pass (2048-bit key) header.d=hpe.com header.i=@hpe.com header.b=OSCiqtkn; arc=none smtp.client-ip=148.163.147.86 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=hpe.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=hpe.com Received: from pps.filterd (m0150241.ppops.net [127.0.0.1]) by mx0a-002e3701.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 42SGGknu029299; Thu, 28 Mar 2024 16:17:10 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hpe.com; h=date : from : to : cc : subject : message-id : references : content-type : in-reply-to : mime-version; s=pps0720; bh=O1mCM99rDKh1WnktnKGuwlFcn5k0KykkoCp2lndeuMg=; b=OSCiqtkn8xZcj7dr0jQBJM5mHq+Izb5Ir7I+Xzk/K8ZeLGHC/r7pLTDegj/BubOKPyol oCR8CBxkG0QMHMhRPGH8PmMVI09Tf0NAWaIVYuolQ4VLPi2VTd4tlIPiLEI1Up3iO+0t u5+N2X0xnNezU9MslN+WKpxNmaSdfWOEzMgBADr1KisT+gFKxq9DAWDFwYu60CYUefnF QOS+0/fMql4uW0HaT714ZHrzQNj8avHslcd3l9NUzAB990JQk/j4aRkSY7qWAXU10O1W WlMDmE8PXQf2oBAu+c4EjXRvrNVBudb3DILoXPyjTwC/c1B77T0DL6/pSM18lqfvcXEW Zg== Received: from p1lg14879.it.hpe.com ([16.230.97.200]) by mx0a-002e3701.pphosted.com (PPS) with ESMTPS id 3x51q7da27-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Mar 2024 16:17:10 +0000 Received: from p1lg14885.dc01.its.hpecorp.net (unknown [10.119.18.236]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by p1lg14879.it.hpe.com (Postfix) with ESMTPS id 86D3313045; Thu, 28 Mar 2024 16:17:09 +0000 (UTC) Received: from swahl-home.5wahls.com (unknown [16.231.227.39]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (Client did not present a certificate) by p1lg14885.dc01.its.hpecorp.net (Postfix) with ESMTPS id 90E168014FE; Thu, 28 Mar 2024 16:17:05 +0000 (UTC) Date: Thu, 28 Mar 2024 11:17:03 -0500 From: Steve Wahl To: Steve Wahl , Dave Hansen , Andy Lutomirski , Peter Zijlstra , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" , linux-kernel@vger.kernel.org, Linux regressions mailing list , Pavin Joseph , stable@vger.kernel.org, Eric Hagberg Cc: Simon Horman , Eric Biederman , Dave Young , Sarah Brofeldt , Russ Anderson , Dimitri Sivanich , Hou Wenlong , Andrew Morton , Baoquan He , Yuntao Wang , Bjorn Helgaas Subject: Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped. Message-ID: References: <20240328160614.1838496-1-steve.wahl@hpe.com> Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240328160614.1838496-1-steve.wahl@hpe.com> X-Proofpoint-GUID: yktVL9b0sBOCXIFUqc1t4HYfWi__5IaH X-Proofpoint-ORIG-GUID: yktVL9b0sBOCXIFUqc1t4HYfWi__5IaH X-Proofpoint-UnRewURL: 0 URL was un-rewritten Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-HPE-SCL: -1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-03-28_15,2024-03-28_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 priorityscore=1501 malwarescore=0 suspectscore=0 bulkscore=0 mlxlogscore=999 mlxscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2403210000 definitions=main-2403280111 Note: I cc:'d stable in the email headers by mistake. NO CC: stable tag, I don't want this to go into stable. Thanks, --> Steve On Thu, Mar 28, 2024 at 11:06:14AM -0500, Steve Wahl wrote: > When ident_pud_init() uses only gbpages to create identity maps, large > ranges of addresses not actually requested can be included in the > resulting table; a 4K request will map a full GB. On UV systems, this > ends up including regions that will cause hardware to halt the system > if accessed (these are marked "reserved" by BIOS). Even processor > speculation into these regions is enough to trigger the system halt. > And MTRRs cannot be used to restrict this speculation, there are not > enough MTRRs to cover all the reserved regions. > > The fix for that would be to only use gbpages when map creation > requests include the full GB page of space, and falling back to using > smaller 2M pages when only portions of a GB page are included in the > request. > > But on some other systems, possibly due to buggy bios, that solution > leaves some areas out of the identity map that are needed for kexec to > succeed. It is believed that these areas are not marked properly for > map_acpi_tables() in arch/x86/kernel/machine_kexec_64.c to catch and > map them. The nogbpages kernel command line option also causes these > systems to fail even without these changes. > > So, create kexec identity maps using full GB pages on all platforms > but UV; on UV, use narrower 2MB pages in the identity map where a full > GB page would include areas outside the region requested. > > No attempt is made to coalesce mapping requests. If a request requires > a map entry at the 2M (pmd) level, subsequent mapping requests within > the same 1G region will also be at the pmd level, even if adjacent or > overlapping such requests could have been combined to map a full > gbpage. Existing usage starts with larger regions and then adds > smaller regions, so this should not have any great consequence. > > Signed-off-by: Steve Wahl > > Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.") > Reported-by: Pavin Joseph > Closes: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/ > Link: https://lore.kernel.org/all/20240322162135.3984233-1-steve.wahl@hpe.com/ > Tested-by: Pavin Joseph > Tested-by: Eric Hagberg > Tested-by: Sarah Brofeldt > --- > > v4: Incorporate fix for regression on systems relying on gbpages > mapping more than the ranges actually requested for successful > kexec, by limiting the effects of the change to UV systems. > This patch based on tip/x86/urgent. > > v3: per Dave Hansen review, re-arrange changelog info, > refactor code to use bool variable and split out conditions. > > v2: per Dave Hansen review: Additional changelog info, > moved pud_large() check earlier in the code, and > improved the comment describing the conditions > that restrict gbpage usage. > > > arch/x86/include/asm/init.h | 1 + > arch/x86/kernel/machine_kexec_64.c | 10 ++++++++++ > arch/x86/mm/ident_map.c | 24 +++++++++++++++++++----- > 3 files changed, 30 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h > index cc9ccf61b6bd..371d9faea8bc 100644 > --- a/arch/x86/include/asm/init.h > +++ b/arch/x86/include/asm/init.h > @@ -10,6 +10,7 @@ struct x86_mapping_info { > unsigned long page_flag; /* page flag for PMD or PUD entry */ > unsigned long offset; /* ident mapping offset */ > bool direct_gbpages; /* PUD level 1GB page support */ > + bool direct_gbpages_only; /* use 1GB pages exclusively */ > unsigned long kernpg_flag; /* kernel pagetable flag override */ > }; > > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c > index b180d8e497c3..3a2f5d291a88 100644 > --- a/arch/x86/kernel/machine_kexec_64.c > +++ b/arch/x86/kernel/machine_kexec_64.c > @@ -28,6 +28,7 @@ > #include > #include > #include > +#include > > #ifdef CONFIG_ACPI > /* > @@ -212,6 +213,15 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable) > > if (direct_gbpages) > info.direct_gbpages = true; > + /* > + * UV systems need restrained use of gbpages in the identity > + * maps to avoid system halts. But some other systems rely on > + * using gbpages to expand mappings outside the regions > + * actually listed, to include areas required for kexec but > + * not explicitly named by the bios. > + */ > + if (!is_uv_system()) > + info.direct_gbpages_only = true; > > for (i = 0; i < nr_pfn_mapped; i++) { > mstart = pfn_mapped[i].start << PAGE_SHIFT; > diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c > index 968d7005f4a7..a538a54aba5d 100644 > --- a/arch/x86/mm/ident_map.c > +++ b/arch/x86/mm/ident_map.c > @@ -26,18 +26,32 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page, > for (; addr < end; addr = next) { > pud_t *pud = pud_page + pud_index(addr); > pmd_t *pmd; > + bool use_gbpage; > > next = (addr & PUD_MASK) + PUD_SIZE; > if (next > end) > next = end; > > - if (info->direct_gbpages) { > - pud_t pudval; > + /* if this is already a gbpage, this portion is already mapped */ > + if (pud_leaf(*pud)) > + continue; > + > + /* Is using a gbpage allowed? */ > + use_gbpage = info->direct_gbpages; > > - if (pud_present(*pud)) > - continue; > + if (!info->direct_gbpages_only) { > + /* Don't use gbpage if it maps more than the requested region. */ > + /* at the beginning: */ > + use_gbpage &= ((addr & ~PUD_MASK) == 0); > + /* ... or at the end: */ > + use_gbpage &= ((next & ~PUD_MASK) == 0); > + } > + /* Never overwrite existing mappings */ > + use_gbpage &= !pud_present(*pud); > + > + if (use_gbpage) { > + pud_t pudval; > > - addr &= PUD_MASK; > pudval = __pud((addr - info->offset) | info->page_flag); > set_pud(pud, pudval); > continue; > > base-commit: b6540de9b5c867b4c8bc31225db181cc017d8cc7 > -- > 2.26.2 > -- Steve Wahl, Hewlett Packard Enterprise