Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp8867201rwr; Thu, 11 May 2023 07:12:28 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ66Z15j1ct3Ajvd4A/zrPCbbb0/Jnh9zysF4YoYfF3EbNb4AnCvEbMiCFZ6TTqe8EzUYr8E X-Received: by 2002:a05:6a20:938e:b0:ea:fa7f:f879 with SMTP id x14-20020a056a20938e00b000eafa7ff879mr30604119pzh.42.1683814348660; Thu, 11 May 2023 07:12:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683814348; cv=none; d=google.com; s=arc-20160816; b=IPbob3MaJga51U+JS0+yukA+csxkJTtWhuICSZcJsQnuDcXzPt1H1FV5ERpThBMtJZ SvtXbQOPwIbB1JfM/tgztjUQVPd4UaJ7ine+gO3rU+W9LmLYZr7Rj15A0Ny8zHPK3H56 BY5Jl0HRxb/aymD66kh+D+whyAviyYgEpGHIw/rto26miJOxSKxg5M3wgii4eraJeNrm 2msPUfUWaU4OFr5g+BxL9lCSMj7BAB0W8TpToSpBMUG13vgYxdJwmRMUynlYmNp9URvH bL010kPh65jH6s49sa+TKj9h61eSx229Td3YT+DGD4ON/SSnBm6XN3lKFPqth25CzhD9 EnyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=agTIVDV40QlNuRDmtaMpYQzLmx8ylyt8K8o0YwfCE44=; b=kYYoj3AnXXpy5RpJTXyvyh3kuUV2ueD83D3jR43+BnmgbVTjx35kJhWv8khu+LYime qtTFNxjZGXffu7BWKHi9QMDJ8eTe0NhxZxlsZUkYgmenTeixAVmq+IyAwjvzj74BhoBp XYpeFiywA27OiQRkiaT13MeA1xKHi3zwiFZo252J6i1YNy90Kl/kMwqc3qo+asentj7k doI9vsHU7i/3OhGu1ICb2DaK/UM1ozK0dDLPjI3FmvFTgz1toGxyTq4Ng2QJjH94rC/H xJYlccprhb2Vwzoy+aZNTZpGpY7kAnyP445i2V6t7r6j4acvlvkVGb402UCy46PalE76 f65g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=lL2e25Q7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f64-20020a625143000000b0064638542cc1si7742325pfb.61.2023.05.11.07.12.12; Thu, 11 May 2023 07:12:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=lL2e25Q7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238237AbjEKOEY (ORCPT + 99 others); Thu, 11 May 2023 10:04:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51214 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238236AbjEKOEW (ORCPT ); Thu, 11 May 2023 10:04:22 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A353F10FF; Thu, 11 May 2023 07:04:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=agTIVDV40QlNuRDmtaMpYQzLmx8ylyt8K8o0YwfCE44=; b=lL2e25Q7Ym9Z2f1ilo2kWwHtLS 4LEtDqoDAi/1aC4kjoeLMlHtLt+a+FS4EW3ttxbaQDhTXZfyi02WOEzNc3/vCsr0owbyF5PEsIwHc 4EGA9zShanxJYpMWANvO4AfYvUXSJigLrvPdNGyjPeVrrNrqH+tCpFoKNI6P6bYEGIKsYYtBzAtzu 8Ajl8g5ng54yFG0pf0AiaE1GeB3gEJ+vkOMxu+NkdXV1EBgexS5qxecZBXe6DpvnyQd7D8QuN/WHz qIZBX6djd2pg1T+ptsLEa/faZBnf92py7b8vnk1aHVuruD1rplz5XIZSxcsHdZBQZO/oPVfH/8958 /oI/ASsA==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1px6sh-00HGyK-LW; Thu, 11 May 2023 14:02:55 +0000 Date: Thu, 11 May 2023 15:02:55 +0100 From: Matthew Wilcox To: Hugh Dickins Cc: Andrew Morton , Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Russell King , Catalin Marinas , Will Deacon , Geert Uytterhoeven , Greg Ungerer , Michal Simek , Thomas Bogendoerfer , Helge Deller , John David Anglin , "Aneesh Kumar K.V" , Michael Ellerman , Alexandre Ghiti , Palmer Dabbelt , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , John Paul Adrian Glaubitz , "David S. Miller" , Chris Zankel , Max Filippov , Peter Zijlstra , x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Michel Lespinasse Subject: Re: [PATCH 00/23] arch: allow pte_offset_map[_lock]() to fail Message-ID: References: <77a5d8c-406b-7068-4f17-23b7ac53bc83@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 10, 2023 at 09:35:44PM -0700, Hugh Dickins wrote: > On Wed, 10 May 2023, Matthew Wilcox wrote: > > On Tue, May 09, 2023 at 09:39:13PM -0700, Hugh Dickins wrote: > > > Two: pte_offset_map() will need to do an rcu_read_lock(), with the > > > corresponding rcu_read_unlock() in pte_unmap(). But most architectures > > > never supported CONFIG_HIGHPTE, so some don't always call pte_unmap() > > > after pte_offset_map(), or have used userspace pte_offset_map() where > > > pte_offset_kernel() is more correct. No problem in the current tree, > > > but a problem once an rcu_read_unlock() will be needed to keep balance. > > > > Hi Hugh, > > > > I shall have to spend some time looking at these patches, but at LSFMM > > just a few hours ago, I proposed and nobody objected to removing > > CONFIG_HIGHPTE. I don't intend to take action on that consensus > > immediately, so I can certainly wait until your patches are applied, but > > if this information simplifies what you're doing, feel free to act on it. > > Thanks a lot, Matthew: very considerate, as usual. > > Yes, I did see your "Whither Highmem?" (wither highmem!) proposal on the I'm glad somebody noticed the pun ;-) > list, and it did make me think, better get these patches and preview out > soon, before you get to vanish pte_unmap() altogether. HIGHMEM or not, > HIGHPTE or not, I think pte_offset_map() and pte_unmap() still have an > important role to play. > > I don't really understand why you're going down a remove-CONFIG_HIGHPTE > route: I thought you were motivated by the awkardness of kmap on large > folios; but I don't see how removing HIGHPTE helps with that at all > (unless you have a "large page tables" effort in mind, but I doubt it). Quite right, my primary concern is filesystem metadata; primarily directories as I don't think anybody has ever supported symlinks or superblocks larger than 4kB. I was thinking that removing CONFIG_HIGHPTE might simplify the page fault handling path a little, but now I've looked at it some more, and I'm not sure there's any simplification to be had. It should probably use kmap_local instead of kmap_atomic(), though. > But I've no investment in CONFIG_HIGHPTE if people think now is the > time to remove it: I disagree, but wouldn't miss it myself - so long > as you leave pte_offset_map() and pte_unmap() (under whatever names). > > I don't think removing CONFIG_HIGHPTE will simplify what I'm doing. > For a moment it looked like it would: the PAE case is nasty (and our > data centres have not been on PAE for a long time, so it wasn't a > problem I had to face before); and knowing pmd_high must be 0 for a > page table looked like it would help, but now I'm not so sure of that > (hmm, I'm changing my mind again as I write). > > Peter's pmdp_get_lockless() does rely for complete correctness on > interrupts being disabled, and I suspect that I may be forced in the > PAE case to do so briefly; but detest that notion. For now I'm just > deferring it, hoping for a better idea before third series finalized. > > I mention this (and Cc Peter) in passing: don't want this arch thread > to go down into that rabbit hole: we can start a fresh thread on it if > you wish, but right now my priority is commit messages for the second > series, rather than solving (or even detailing) the PAE problem. I infer that what you need is a pte_access_start() and a pte_access_end() which look like they can be plausibly rcu_read_lock() and rcu_read_unlock(), but might need to be local_irq_save() and local_irq_restore() in some configurations? We also talked about moving x86 to always RCU-free page tables in order to make accessing /proc/$pid/smaps lockless. I believe Michel is going to take a swing at this project.