Date: Mon, 31 Jan 2022 17:14:34 -0800
From: Andrew Morton
To: Michel Lespinasse
Cc: Linux-MM, linux-kernel@vger.kernel.org, kernel-team@fb.com,
 Laurent Dufour, Jerome Glisse, Peter Zijlstra, Michal Hocko,
 Vlastimil Babka, Davidlohr Bueso, Matthew Wilcox, Liam Howlett,
 Rik van Riel, Paul McKenney, Song Liu, Suren Baghdasaryan,
 Minchan Kim, Joel Fernandes, David Rientjes, Axel Rasmussen,
 Andy Lutomirski, Sebastian Andrzej Siewior
Subject: Re: [PATCH v2 00/35] Speculative page faults
Message-Id: <20220131171434.89870a8f1ae294912e7ff19e@linux-foundation.org>
In-Reply-To: <20220128131006.67712-1-michel@lespinasse.org>
References: <20220128131006.67712-1-michel@lespinasse.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 28 Jan 2022 05:09:31 -0800 Michel Lespinasse wrote:

> Patchset summary:
>
> Classical page fault processing takes the mmap read lock in order to
> prevent races with mmap writers. In contrast, speculative fault
> processing does not take the mmap read lock, and instead verifies,
> when the results of the page fault are about to get committed and
> become visible to other threads, that no mmap writers have been
> running concurrently with the page fault. If the check fails,
> speculative updates do not get committed and the fault is retried
> in the usual, non-speculative way (with the mmap read lock held).
>
> The concurrency check is implemented using a per-mm mmap sequence count.
> The counter is incremented at the beginning and end of each mmap write
> operation. If the counter is initially observed to have an even value,
> and has the same value later on, the observer can deduce that no mmap
> writers have been running concurrently with it between those two times.
> This is similar to a seqlock, except that readers never spin on the
> counter value (they would instead revert to taking the mmap read lock),
> and writers are allowed to sleep. One benefit of this approach is that
> it requires no writer side changes, just some hooks in the mmap write
> lock APIs that writers already use.
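
For illustration, the even/odd sequence-count check described in the
paragraph above can be sketched in userspace C11. This is only a sketch
under stated assumptions: struct mm_sketch, write_begin(), write_end(),
spec_begin() and spec_validate() are hypothetical names rather than the
patchset's API, default (sequentially consistent) atomics stand in for
the kernel's memory-ordering primitives, and the sleeping mmap lock that
serializes writers is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mm sequence count (not the patchset's type). */
struct mm_sketch {
    atomic_ulong mmap_seq;   /* even: no writer active, odd: writer active */
};

/* Writer-side hooks: bump the counter on entry to and exit from a write section. */
void write_begin(struct mm_sketch *mm) { atomic_fetch_add(&mm->mmap_seq, 1); }
void write_end(struct mm_sketch *mm)   { atomic_fetch_add(&mm->mmap_seq, 1); }

/*
 * Reader entry: sample the counter once. If it is odd a writer is active,
 * so the reader gives up on speculation instead of spinning.
 */
bool spec_begin(struct mm_sketch *mm, unsigned long *seq)
{
    *seq = atomic_load(&mm->mmap_seq);
    return (*seq & 1) == 0;
}

/* Commit-time check: true only if no writer started since spec_begin(). */
bool spec_validate(struct mm_sketch *mm, unsigned long seq)
{
    return atomic_load(&mm->mmap_seq) == seq;
}

/* Single-threaded walk through the three interesting cases. */
void seq_demo(void)
{
    struct mm_sketch mm;
    unsigned long seq;

    atomic_init(&mm.mmap_seq, 0);

    /* No writer at all: speculation may commit. */
    assert(spec_begin(&mm, &seq));
    assert(spec_validate(&mm, seq));

    /* A writer ran between begin and validate: speculation must abort. */
    assert(spec_begin(&mm, &seq));
    write_begin(&mm);
    write_end(&mm);
    assert(!spec_validate(&mm, seq));

    /* A writer is active at entry: do not even start speculating. */
    write_begin(&mm);
    assert(!spec_begin(&mm, &seq));
    write_end(&mm);
}
```

A caller for which spec_begin() or spec_validate() returns false would
retry in the usual way with the mmap read lock held, which is why the
reader side never needs to loop on the counter.
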
>
> The first step of a speculative page fault is to look up the vma and
> read its contents (currently by making a copy of the vma, though in
> principle it would be sufficient to only read the vma attributes that
> are used in page faults). The mmap sequence count is used to verify
> that there were no mmap writers concurrent to the lookup and copy steps.
> Note that walking rbtrees while there may potentially be concurrent
> writers is not an entirely new idea in linux, as latched rbtrees
> are already doing this. This is safe as long as the lookup is
> followed by a sequence check to verify that concurrency did not
> actually occur (and abort the speculative fault if it did).

I'm surprised that descending the rbtree locklessly doesn't flat-out
oops the kernel. How are we assured that every pointer which is
encountered actually points at the right thing? Against things which
tear that tree down?

> The next step is to walk down the existing page table tree to find the
> current pte entry. This is done with interrupts disabled to avoid
> races with munmap().

Sebastian, could you please comment on this from the CONFIG_PREEMPT_RT
point of view?

> Again, not an entirely new idea, as this repeats
> a pattern already present in fast GUP. Similar precautions are also
> taken when taking the page table lock.
>
> Breaking COW on an existing mapping may require firing MMU notifiers.
> Some care is required to avoid racing with registering new notifiers.
> This patchset adds a new per-cpu rwsem to handle this situation.
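
To make the "copy the vma attributes, then check the sequence count"
step quoted earlier more concrete, here is a userspace C11 sketch of a
copy-then-validate read. Everything in it is an assumption made for
illustration: struct region, struct vma_attrs and the region_*() and
spec_fault_covers() helpers are hypothetical, the attributes live in one
fixed object so no tree is walked and no pointers are chased (the sketch
therefore says nothing about the pointer-lifetime question raised
above), and writers are assumed to be serialized by a lock that is not
shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified subset of the attributes a fault would need. */
struct vma_attrs {
    unsigned long start, end, flags;
};

struct region {
    atomic_ulong seq;                /* odd while a writer is mid-update */
    atomic_ulong start, end, flags;  /* fields that readers copy speculatively */
};

void region_init(struct region *r)
{
    atomic_init(&r->seq, 0);
    atomic_init(&r->start, 0);
    atomic_init(&r->end, 0);
    atomic_init(&r->flags, 0);
}

/* Writer: callers are assumed to hold an exclusive lock (not shown). */
void region_update(struct region *r, struct vma_attrs v)
{
    unsigned long s = atomic_load_explicit(&r->seq, memory_order_relaxed);

    atomic_store_explicit(&r->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&r->start, v.start, memory_order_relaxed);
    atomic_store_explicit(&r->end, v.end, memory_order_relaxed);
    atomic_store_explicit(&r->flags, v.flags, memory_order_relaxed);
    atomic_store_explicit(&r->seq, s + 2, memory_order_release);
}

/* Reader: copy first, validate afterwards. Returns false if the copy is unusable. */
bool region_snapshot(struct region *r, struct vma_attrs *out)
{
    unsigned long s0 = atomic_load_explicit(&r->seq, memory_order_acquire);

    if (s0 & 1)
        return false;                /* writer active: do not spin, just bail */
    out->start = atomic_load_explicit(&r->start, memory_order_relaxed);
    out->end = atomic_load_explicit(&r->end, memory_order_relaxed);
    out->flags = atomic_load_explicit(&r->flags, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&r->seq, memory_order_relaxed) == s0;
}

/*
 * Speculative path: returns true if the answer in *covered can be trusted,
 * false if the caller has to retry under the read lock.
 */
bool spec_fault_covers(struct region *r, unsigned long addr, bool *covered)
{
    struct vma_attrs v;

    if (!region_snapshot(r, &v))
        return false;
    *covered = addr >= v.start && addr < v.end;
    return true;
}

void snapshot_demo(void)
{
    struct region r;
    struct vma_attrs in = { 0x1000, 0x3000, 0x7 }, out;
    bool covered = false;

    region_init(&r);
    region_update(&r, in);
    assert(region_snapshot(&r, &out) && out.end == 0x3000);
    assert(spec_fault_covers(&r, 0x2000, &covered) && covered);

    atomic_fetch_add(&r.seq, 1);     /* simulate a writer caught mid-update */
    assert(!spec_fault_covers(&r, 0x2000, &covered));
}
```

The copied values are only acted upon after the second read of the
counter matches the first, so a torn or stale copy is discarded rather
than used.
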