Date: Wed, 7 Feb 2024 11:21:17 +0000
From: Matthew Wilcox
To: Will Deacon
Cc: Nanyong Sun, Catalin Marinas, mike.kravetz@oracle.com,
    muchun.song@linux.dev, akpm@linux-foundation.org,
    anshuman.khandual@arm.com, wangkefeng.wang@huawei.com,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
References: <20240113094436.2506396-1-sunnanyong@huawei.com>
    <20240207111252.GA22167@willie-the-truck>
In-Reply-To: <20240207111252.GA22167@willie-the-truck>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Feb 07, 2024 at 11:12:52AM +0000, Will Deacon wrote:
> On Sat, Jan 27, 2024 at 01:04:15PM +0800, Nanyong Sun wrote:
> >
> > On 2024/1/26 2:06, Catalin Marinas wrote:
> > > On Sat, Jan 13, 2024 at 05:44:33PM +0800, Nanyong Sun wrote:
> > > > HVO was previously disabled on arm64 [1] due to the lack of necessary
> > > > BBM(break-before-make) logic when changing page tables.
> > > > This set of patches fix this by adding necessary BBM sequence when
> > > > changing page table, and supporting vmemmap page fault handling to
> > > > fixup kernel address translation fault if vmemmap is concurrently accessed.
> > > I'm not keen on this approach. I'm not even sure it's safe. In the
> > > second patch, you take the init_mm.page_table_lock on the fault path but
> > > are we sure this is unlocked when the fault was taken?
> > I think this situation is impossible. In the implementation of the second
> > patch, when the page table is being corrupted
> > (the time window when a page fault may occur), vmemmap_update_pte() already
> > holds the init_mm.page_table_lock,
> > and unlock it until page table update is done.Another thread could not hold
> > the init_mm.page_table_lock and
> > also trigger a page fault at the same time.
> > If I have missed any points in my thinking, please correct me. Thank you.
>
> It still strikes me as incredibly fragile to handle the fault and trying
> to reason about all the users of 'struct page' is impossible. For example,
> can the fault happen from irq context?

The pte lock cannot be taken in irq context (which I think is what
you're asking?)  While it is not possible to reason about all users of
struct page, we are somewhat relieved of that work by noting that this is
only for hugetlbfs, so we don't need to reason about slab, page tables,
netmem or zsmalloc.

> If we want to optimise the vmemmap mapping for arm64, I think we need to
> consider approaches which avoid the possibility of the fault altogether.
> It's more complicated to implement, but I think it would be a lot more
> robust.
>
> Andrew -- please can you drop these from -next?
>
> Thanks,
>
> Will
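
For readers following the locking question in this thread (is init_mm.page_table_lock guaranteed to be free when the fault is taken?), here is a minimal user-space sketch of the hazard being discussed. It is an analogy only, not kernel code and not taken from the patch set: table_lock, model_fault_handler() and fault_while_locked() are hypothetical names, and a pthread mutex stands in for the kernel spinlock. It uses pthread_mutex_trylock() so that the case where a real spin_lock() would spin forever shows up as EBUSY instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for init_mm.page_table_lock (a pthread mutex,
 * not the kernel's spinlock). */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models a fault handler that needs the lock to repair a mapping.
 * Returns 0 if the lock could be taken, or EBUSY in the case where a
 * blocking, non-recursive lock would never be acquired. */
static int model_fault_handler(void)
{
    int ret = pthread_mutex_trylock(&table_lock);

    if (ret == 0)
        pthread_mutex_unlock(&table_lock);
    return ret;
}

/* Models a context that already holds the lock and then takes the
 * fault itself, so the handler runs on top of the lock holder. */
static int fault_while_locked(void)
{
    int rc = pthread_mutex_lock(&table_lock);
    int ret;

    assert(rc == 0);
    (void)rc;
    ret = model_fault_handler();
    pthread_mutex_unlock(&table_lock);
    return ret;
}
```

Calling model_fault_handler() with the lock free succeeds, while fault_while_locked() reports EBUSY, which is the self-deadlock case in this model. Whether any real path can reach that state (for example from irq context) is exactly what the thread is debating, and the sketch does not answer it.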