Date: Mon, 1 Apr 2024 10:15:57 -0500
From: Steve Wahl
To: "Eric W. Biederman"
Cc: Steve Wahl, Russ Anderson, Ingo Molnar, Dave Hansen, Andy Lutomirski,
    Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    x86@kernel.org, "H. Peter Anvin", linux-kernel@vger.kernel.org,
    Linux regressions mailing list, Pavin Joseph, stable@vger.kernel.org,
    Eric Hagberg, Simon Horman, Dave Young, Sarah Brofeldt, Dimitri Sivanich
Subject: Re: [PATCH] x86/mm/ident_map: Use full gbpages in identity maps except on UV platform.
In-Reply-To: <87msqf12sy.fsf@email.froward.int.ebiederm.org>

On Sat, Mar 30, 2024 at 10:46:21PM -0500, Eric W. Biederman wrote:
> Steve Wahl writes:
>
> > On Thu, Mar 28, 2024 at 12:05:02AM -0500, Eric W. Biederman wrote:
> >>
> >> From my perspective the entire reason for wanting to be fine grained and
> >> precise in the kernel memory map is because the UV systems don't have
> >> enough MTRRs. So you have to depend upon the cache-ability attributes
> >> for specific addresses of memory coming from the page tables instead of
> >> from the MTRRs.
> >
> > It would be more accurate to say we depend upon the addresses not
> > being listed in the page tables at all. We'd be OK with mapped but
> > not accessed, if it weren't for processor speculation.
> > There's no "no access" setting within the existing MTRR definitions,
> > though there may be a setting that would rein in processor
> > speculation enough to make do.
>
> The uncached setting and the write-combining settings that are used for
> I/O are required to disable speculation for any regions so marked. Any
> reads or writes to a memory mapped I/O region can result in the
> hardware processing it as a command. Which as I understand it is exactly
> the problem with UV systems.
>
> Frankly not mapping an I/O region (in an identity mapped page table)
> instead of properly mapping it as it would need to be mapped for
> performing I/O seems like a bit of a bug.
>
> >> If you had enough MTRRs, defining the page tables to be precisely
> >> what is necessary would be simply an exercise in reducing kernel
> >> performance, because it is more efficient in both page table size, and
> >> in TLB usage to use 1GB pages instead of whatever smaller pages you have
> >> to use for oddball regions.
> >>
> >> For systems without enough MTRRs the small performance hit in paging
> >> performance is the necessary trade off.
> >>
> >> At least that is my perspective. Does that make sense?
> >
> > I think I'm beginning to get your perspective. From your point of
> > view, is kexec failing with "nogbpages" set a bug? My point of view
> > is it likely is. I think your view would say it isn't?
>
> I would say it is a bug.
>
> Part of the bug is someone yet again taking something simple that
> kexec is doing and reworking it to use generic code, then changing
> the generic code to do something different from what kexec needs
> and then being surprised that kexec stops working.
>
> The interface kexec wants to provide to whatever is being loaded is not
> having to think about page tables until that software is up far enough
> to enable their own page tables.
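
For a rough sense of scale on the page-table-size and TLB point quoted
earlier, here is a back-of-the-envelope sketch. It is not kernel code.
It assumes x86-64 4-level paging (512 entries per 4 KiB table), a range
that starts at physical address 0, fits under a single top-level table,
and has no holes. The helper names are hypothetical, chosen only for
this illustration.

```python
# Back-of-the-envelope sketch, NOT kernel code.
# Assumption: x86-64 4-level paging, identity map of [0, nbytes), no holes.
SZ_2M = 1 << 21      # range covered by one 2 MiB (PMD-level) leaf entry
SZ_1G = 1 << 30      # range covered by one 1 GiB (PUD-level) leaf entry
SZ_512G = 1 << 39    # range covered by one full PUD table (512 x 1 GiB)
TABLE_BYTES = 4096   # each page-table page is 4 KiB


def div_round_up(a, b):
    """Integer ceiling division."""
    return (a + b - 1) // b


def ident_map_tables(nbytes, use_gbpages):
    """Hypothetical helper: count the 4 KiB page-table pages needed."""
    pgd = 1                                   # one top-level table
    pud = div_round_up(nbytes, SZ_512G)       # one PUD table per 512 GiB
    # With 1 GiB pages the PUD entries are the leaves, so no PMD tables.
    pmd = 0 if use_gbpages else div_round_up(nbytes, SZ_1G)
    return pgd + pud + pmd


def leaf_entries(nbytes, use_gbpages):
    """Hypothetical helper: count leaf mappings (each a distinct TLB entry)."""
    return div_round_up(nbytes, SZ_1G if use_gbpages else SZ_2M)


if __name__ == "__main__":
    nbytes = 64 * SZ_1G  # example size only: 64 GiB
    for gb in (True, False):
        tables = ident_map_tables(nbytes, gb)
        print(f"gbpages={gb}: {tables} tables "
              f"({tables * TABLE_BYTES // 1024} KiB), "
              f"{leaf_entries(nbytes, gb)} leaf entries")
```

Under those assumptions, a 64 GiB identity map needs 2 table pages and
64 leaf entries with 1 GiB pages, versus 66 table pages and 32768 leaf
entries with 2 MiB pages. Punching holes for restricted regions would
add further tables for each region that has to be split into smaller
pages.
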
>
> People being clever and enabling just enough pages in the page tables
> to work based upon the results of some buggy (they are always buggy some
> are just less so than others) boot up firmware is where I get concerned.
>
> Said another way the point is to build an identity mapped page table.
> Skipping some parts of the physical<->virtual identity because we seem
> to think no one will use it is likely a bug.

Hmm. I would think what's needed for kexec is to create, as nearly as
possible, identical conditions to what the BIOS / bootloader provides
when jumping to the kernel entry point. Whatever agreements are set
on entry to the kernel, kexec needs to match.

And I think you want a completely identity mapped table to match those
entry point requirements; that's why, on other platforms, the condition
is MMU turned off.

From that point of view, it does make sense to special case UV systems
for this. The restricted areas we're talking about are not in the map
when the bootloader is started on the UV platform.

> I really don't see any point in putting holes in such a page table for
> any address below the highest address that is good for something. Given
> that on some systems the MTRRs are insufficient to do their job it
> definitely makes sense to not enable caching on areas that we don't
> think are memory.

Well, on the UV platform, these addresses are *not* good for
something, at least from any processor's point of view, nor any IO
device (they are not allowed to appear in any DMA or PCI bus master
transaction, either).

A hardware ASIC is using this portion of local RAM to hold some tables
that are too large to put directly on the ASIC. Things turn ugly if
anyone else tries to access these addresses.

In another message, Pavin thanked you for your work on kexec. I'd like
to express my appreciation also. In my current job, I'm mostly
focused on its use for kdump kernels.
I've been dealing with kernel crash dumps since running Unix on i386
machines, and always had to deal with "OK, but what if the kernel
state gets corrupt enough that the disk driver won't work, or network
if you're trying to do a remote dump." The use of kexec to start a
fresh instance of the kernel is an excellent way to solve that problem,
in my opinion.

And a couple of jobs ago we were able to use it to restart a SAN
switch after software upgrade, without needing to stop forwarding
traffic, which wouldn't have been possible without kexec.

Thanks,

--> Steve Wahl

-- 
Steve Wahl, Hewlett Packard Enterprise