Date: Tue, 1 Feb 2022 02:20:39 +0000
From: Matthew Wilcox
To: Andrew Morton
Cc: Michel Lespinasse, Linux-MM, linux-kernel@vger.kernel.org,
    kernel-team@fb.com, Laurent Dufour, Jerome Glisse, Peter Zijlstra,
    Michal Hocko, Vlastimil Babka, Davidlohr Bueso, Liam Howlett,
    Rik van Riel, Paul McKenney, Song Liu, Suren Baghdasaryan,
    Minchan Kim, Joel Fernandes, David Rientjes, Axel Rasmussen,
    Andy Lutomirski, Sebastian Andrzej Siewior
Subject: Re: [PATCH v2 00/35] Speculative page faults

On Mon, Jan 31, 2022 at 05:14:34PM -0800, Andrew Morton wrote:
> On Fri, 28 Jan 2022 05:09:31 -0800 Michel Lespinasse wrote:
> > The first step of a speculative page fault is to look up the vma and
> > read its contents (currently by making a copy of the vma, though in
> > principle it would be sufficient to only read the vma attributes that
> > are used in page faults). The mmap sequence count is used to verify
> > that there were no mmap writers concurrent to the lookup and copy steps.
> > Note that walking rbtrees while there may potentially be concurrent
> > writers is not an entirely new idea in linux, as latched rbtrees
> > are already doing this. This is safe as long as the lookup is
> > followed by a sequence check to verify that concurrency did not
> > actually occur (and abort the speculative fault if it did).
>
> I'm surprised that descending the rbtree locklessly doesn't flat-out
> oops the kernel. How are we assured that every pointer which is
> encountered actually points at the right thing? Against things
> which tear that tree down?

It doesn't necessarily point at the _right_ thing.
You may get entirely the wrong node in the tree if you race with a
modification, but, as Michel says, you check the seqcount before you
even look at the VMA (and if the seqcount indicates a modification,
you throw away the result and fall back to the locked version). The
rbtree always points to other rbtree nodes, so you aren't going to
walk into some completely wrong data structure.

> > The next step is to walk down the existing page table tree to find the
> > current pte entry. This is done with interrupts disabled to avoid
> > races with munmap().
>
> Sebastian, could you please comment on this from the CONFIG_PREEMPT_RT
> point of view?

I am not a fan of this approach. For other reasons, I think we want
to switch to RCU-freed page tables, and then we can walk the page
tables with the RCU lock held. Some architectures already RCU-free
the page tables, so I think it's just a matter of converting the rest.
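
To make the seqcount argument for the VMA lookup concrete, here is a minimal userspace sketch of the read/validate pattern, not the code from the patch series. All names (`mm_sketch`, `vma_attrs`, `spec_read_begin`, `spec_read_retry`, `speculative_lookup`) are hypothetical, a single vma stands in for the rbtree, and C11 sequentially consistent atomics stand in for the kernel's seqcount primitives and barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of the vma attributes a fault handler would read. */
struct vma_attrs {
    unsigned long start;
    unsigned long end;
};

/* Hypothetical stand-in for the mm: one sequence count guarding one vma. */
struct mm_sketch {
    atomic_uint seq;          /* even: stable, odd: writer active */
    atomic_ulong vma_start;
    atomic_ulong vma_end;
};

static void mm_init(struct mm_sketch *mm, unsigned long start, unsigned long end)
{
    atomic_init(&mm->seq, 0);
    atomic_init(&mm->vma_start, start);
    atomic_init(&mm->vma_end, end);
}

/* Writers bump the count before and after modifying the protected data. */
static void writer_begin(struct mm_sketch *mm) { atomic_fetch_add(&mm->seq, 1); }
static void writer_end(struct mm_sketch *mm)   { atomic_fetch_add(&mm->seq, 1); }

static void writer_update(struct mm_sketch *mm, unsigned long start, unsigned long end)
{
    writer_begin(mm);
    atomic_store(&mm->vma_start, start);
    atomic_store(&mm->vma_end, end);
    writer_end(mm);
}

/* Sample the count before the speculative read. */
static unsigned int spec_read_begin(struct mm_sketch *mm)
{
    return atomic_load(&mm->seq);
}

/* True if data read since 'seq' was sampled cannot be trusted. */
static bool spec_read_retry(struct mm_sketch *mm, unsigned int seq)
{
    return (seq & 1u) || atomic_load(&mm->seq) != seq;
}

/*
 * Copy the attributes without taking a lock, then validate the copy.
 * Returns false when the caller should fall back to the locked path.
 */
static bool speculative_lookup(struct mm_sketch *mm, unsigned long addr,
                               struct vma_attrs *out)
{
    unsigned int seq = spec_read_begin(mm);
    struct vma_attrs copy;

    /* These values may be stale or mismatched if a writer raced with us. */
    copy.start = atomic_load(&mm->vma_start);
    copy.end = atomic_load(&mm->vma_end);

    /* Validate before acting on anything in the copy. */
    if (spec_read_retry(mm, seq))
        return false;
    if (addr < copy.start || addr >= copy.end)
        return false;

    *out = copy;
    return true;
}
```

The point of the sketch is the ordering: the copy is only inspected after the sequence check has passed, so a racing writer costs a fallback to the locked path rather than a decision based on bad data.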