Date: Wed, 28 Apr 2021 17:02:25 -0700
From: Michel Lespinasse
To: "Paul E. McKenney", Andy Lutomirski
Cc: Michel Lespinasse, Linux-MM, Laurent Dufour, Peter Zijlstra,
    Michal Hocko, Matthew Wilcox, Rik van Riel, Andrew Morton,
    Suren Baghdasaryan, Joel Fernandes, Rom Lemarchand, Linux-Kernel
Subject: Re: [RFC PATCH 13/37] mm: implement speculative handling in __handle_mm_fault().
Message-ID: <20210429000225.GC10973@lespinasse.org>
References: <20210407014502.24091-1-michel@lespinasse.org>
    <20210407014502.24091-14-michel@lespinasse.org>
    <20210428145823.GA856@lespinasse.org>
    <20210428161108.GP975577@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20210428161108.GP975577@paulmck-ThinkPad-P17-Gen-1>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 28, 2021 at 09:11:08AM -0700, Paul E. McKenney wrote:
> On Wed, Apr 28, 2021 at 08:13:53AM -0700, Andy Lutomirski wrote:
> > On Wed, Apr 28, 2021 at 8:05 AM Michel Lespinasse wrote:
> > >
> > > On Wed, Apr 07, 2021 at 08:36:01AM -0700, Andy Lutomirski wrote:
> > > > On 4/6/21 6:44 PM, Michel Lespinasse wrote:
> > > > > The page table tree is walked with local irqs disabled, which prevents
> > > > > page table reclamation (similarly to what fast GUP does). The logic is
> > > > > otherwise similar to the non-speculative path, but with additional
> > > > > restrictions: in the speculative path, we do not handle huge pages or
> > > > > wiring new page tables.
> > > >
> > > > Not on most architectures.
> > > > Quoting the actual comment in mm/gup.c:
> > > >
> > > > > * Before activating this code, please be aware that the following assumptions
> > > > > * are currently made:
> > > > > *
> > > > > *  *) Either MMU_GATHER_RCU_TABLE_FREE is enabled, and tlb_remove_table() is used to
> > > > > *  free pages containing page tables or TLB flushing requires IPI broadcast.
> > > >
> > > > On MMU_GATHER_RCU_TABLE_FREE architectures, you cannot make the
> > > > assumption that it is safe to dereference a pointer in a page table just
> > > > because irqs are off.  You need RCU protection, too.
> > > >
> > > > You have the same error in the cover letter.
> > >
> > > Hi Andy,
> > >
> > > Thanks for your comment. At first I thought it did not matter, because we
> > > only enable ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT on selected
> > > architectures, and I thought MMU_GATHER_RCU_TABLE_FREE is not set on
> > > these. But I was wrong - MMU_GATHER_RCU_TABLE_FREE is enabled on X86
> > > with paravirt. So I took another look at fast GUP to make sure I
> > > actually understand it.
> > >
> > > This brings up a question about lockless_pages_from_mm() - I see it
> > > disabling interrupts, which it explains is necessary for blocking THP
> > > splitting IPIs, but I do not see it taking an RCU read lock as would
> > > be necessary for preventing page table freeing on
> > > MMU_GATHER_RCU_TABLE_FREE configs. I figure local_irq_save()
> > > indirectly takes an rcu read lock somehow? I think this is something
> > > I should also mention in my explanation, and I have not seen a good
> > > description of this on the fast GUP side...
> >
> > Sounds like a bug!  That being said, based on my extremely limited
> > understanding of how the common RCU modes work, local_irq_save()
> > probably implies an RCU lock in at least some cases.  Hi Paul!
>
> In modern kernels, local_irq_save() does have RCU reader semantics,
> meaning that synchronize_rcu() will wait for pre-existing irq-disabled
> regions.
> It will also wait for pre-existing bh-disable, preempt-disable,
> and of course rcu_read_lock() sections of code.

Thanks Paul for confirming / clarifying this. BTW, it would be good to
add this to the rcu header files, just so people have something to
refer to when they depend on such behavior (like fast GUP currently
does).

Going back to my patch: I don't need to protect against THP splitting
here, as I'm only handling the small page case. So when
MMU_GATHER_RCU_TABLE_FREE is enabled, I *think* I could get away with
using only an rcu read lock, instead of disabling interrupts (which
implicitly creates the rcu read lock). I'm not sure which way to go -
fast GUP always disables interrupts regardless of the
MMU_GATHER_RCU_TABLE_FREE setting, and I think there is a case to be
made for following the fast GUP steps rather than trying to be smarter.

Andy, do you have any opinion on this? Or anyone else really?

Thanks,

--
Michel "walken" Lespinasse