Date: Wed, 7 Feb 2024 12:11:25 +0000
From: Will Deacon
To: Matthew Wilcox
Cc: Nanyong Sun, Catalin Marinas, mike.kravetz@oracle.com, muchun.song@linux.dev, akpm@linux-foundation.org, anshuman.khandual@arm.com, wangkefeng.wang@huawei.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
Message-ID: <20240207121125.GA22234@willie-the-truck>
References: <20240113094436.2506396-1-sunnanyong@huawei.com> <20240207111252.GA22167@willie-the-truck>

On Wed, Feb 07, 2024 at 11:21:17AM +0000, Matthew Wilcox wrote:
> On Wed, Feb 07, 2024 at 11:12:52AM +0000, Will Deacon wrote:
> > On Sat, Jan 27, 2024 at 01:04:15PM +0800, Nanyong Sun wrote:
> > >
> > > On 2024/1/26 2:06, Catalin Marinas wrote:
> > > > On Sat, Jan 13, 2024 at 05:44:33PM +0800, Nanyong Sun wrote:
> > > > > HVO was previously disabled on arm64 [1] due to the lack of the
> > > > > necessary BBM (break-before-make) logic when changing page tables.
> > > > > This set of patches fixes that by adding the necessary BBM sequence
> > > > > when changing page tables, and by supporting vmemmap page-fault
> > > > > handling to fix up kernel address translation faults if the vmemmap
> > > > > is accessed concurrently.
> > > > I'm not keen on this approach. I'm not even sure it's safe. In the
> > > > second patch, you take the init_mm.page_table_lock on the fault path,
> > > > but are we sure this is unlocked when the fault was taken?
> > > I think this situation is impossible. In the implementation of the
> > > second patch, when the page table is being corrupted (the time window
> > > when a page fault may occur), vmemmap_update_pte() already holds the
> > > init_mm.page_table_lock, and does not unlock it until the page-table
> > > update is done. Another thread could not hold the init_mm.page_table_lock
> > > and also trigger a page fault at the same time. If I have missed any
> > > points in my thinking, please correct me. Thank you.
> >
> > It still strikes me as incredibly fragile to handle the fault, and
> > trying to reason about all the users of 'struct page' is impossible.
> > For example, can the fault happen from irq context?
>
> The pte lock cannot be taken in irq context (which I think is what
> you're asking?). While it is not possible to reason about all users of
> struct page, we are somewhat relieved of that work by noting that this
> is only for hugetlbfs, so we don't need to reason about slab, page
> tables, netmem or zsmalloc.

My concern is that an interrupt handler tries to access a 'struct page'
which faults due to another core splitting a pmd mapping for the vmemmap.
In this case, I think we'll end up trying to resolve the fault from irq
context, which will try to take the spinlock. Avoiding the fault would
make this considerably more robust, and the architecture has introduced
features to avoid break-before-make in some circumstances (see FEAT_BBM
and its levels), so making this optimisation conditional on that would
seem to be a better approach in my opinion.

Will