Date: Fri, 15 Jan 2021 12:27:21 +0100
From: Jan Kara
To: John Hubbard
Cc: Linus Torvalds, Andrea Arcangeli, Linux-MM, Linux Kernel Mailing List,
    Yu Zhao, Andy Lutomirski, Peter Xu, Pavel Emelyanov, Mike Kravetz,
    Mike Rapoport, Minchan Kim, Will Deacon, Peter Zijlstra, Hugh Dickins,
    "Kirill A. Shutemov", Matthew Wilcox, Oleg Nesterov, Jann Horn,
    Kees Cook, Leon Romanovsky, Jason Gunthorpe, Jan Kara, Kirill Tkhai
Subject: Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending
Message-ID: <20210115112721.GF27380@quack2.suse.cz>
References: <20210107200402.31095-1-aarcange@redhat.com>
    <20210107200402.31095-3-aarcange@redhat.com>
    <4100a6f5-ab0b-f7e5-962f-ea1dbcb1e47e@nvidia.com>
In-Reply-To: <4100a6f5-ab0b-f7e5-962f-ea1dbcb1e47e@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 07-01-21 13:53:18, John Hubbard wrote:
> On 1/7/21 1:29 PM, Linus Torvalds wrote:
> > On Thu, Jan 7, 2021 at 12:59 PM Andrea Arcangeli wrote:
> > >
> > > The problem is it's not even possible to detect reliably if there's
> > > really a long term GUP pin because of speculative pagecache lookups.
> >
> > So none of the normal code _needs_ that any more these days, which is
> > what I think is so nice. Any pinning will do the COW, and then we have
> > the logic to make sure it stays writable, and that keeps everything
> > nicely coherent and is all fairly simple.
> >
> > And yes, it does mean that if somebody then explicitly write-protects
> > a page, it may end up being COW'ed after all, but if you first pinned
> > it, and then started playing with the protections of that page, why
> > should you be surprised?
> >
> > So to me, this sounds like a "don't do that then" situation.
> >
> > Anybody who does page pinning and wants coherency should NOT TOUCH THE
> > MAPPING IT PINNED.
> >
> > (And if you do touch it, it's your own fault, and you get to keep both
> > of the broken pieces)
> >
> > Now, I do agree that from a QoI standpoint, it would be really lovely
> > if we actually enforced it.
> > I'm not entirely sure we can, but maybe it
> > would be reasonable to use that
> >
> >   mm->has_pinned && page_maybe_dma_pinned(page)
> >
> > at least as the beginning of a heuristic.
> >
> > In fact, I do think that "page_maybe_dma_pinned()" could possibly be
> > made stronger than it is. Because at *THAT* point, we might say "we
>
> What exactly did you have in mind, to make it stronger? I think the
> answer is in this email but I don't quite see it yet...
>
> Also, now seems to be a good time to mention that I've been thinking about
> a number of pup/gup pinning cases (Direct IO, GPU/NIC, NVMe/storage peer
> to peer with GUP/NIC, and HMM support for atomic operations from a device).
> And it seems like the following approach would help:
>
> * Use pin_user_pages/FOLL_PIN for long-term pins. Long-term here (thanks
>   to Jason for this point) means "user space owns the lifetime". We might
>   even end up deleting either FOLL_PIN or FOLL_LONGTERM, because this would
>   make them mean the same thing. The idea is that there are no "short term"
>   pins of this kind of memory.
>
> * Continue to use FOLL_GET (only) for Direct IO. That's a big change of plans,
>   because several of us had thought that Direct IO needs FOLL_PIN. However, this
>   recent conversation, plus my list of cases above, seems to indicate otherwise.
>   That's because we only have one refcount approach for marking pages in this way,
>   and we should spend it on the long-term pinned pages. Those are both hard to
>   identify otherwise, and actionable once we identify them.

Somewhat late to the game but I disagree here. I think direct IO still needs
FOLL_PIN so that page_maybe_dma_pinned() returns true for it, at least for
shared pages. Filesystems/mm in the writeback path need to detect whether
the page is pinned, because then its contents can change at any time without
notice, the page can be dirtied at random times, etc.
In that case we need to bounce the page during writeback (to avoid checksum
failures), keep the page dirty in internal filesystem bookkeeping (and in MM
as well), etc.

                                                                Honza
-- 
Jan Kara
SUSE Labs, CR
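
Editorial illustration of the point argued above, as a userspace toy model and not kernel code. It assumes the scheme used by kernels of that period for normal (non-compound) pages, where a FOLL_GET reference adds 1 to the page refcount, a FOLL_PIN reference adds GUP_PIN_COUNTING_BIAS (1024), and page_maybe_dma_pinned() compares the refcount against that bias. Everything prefixed toy_ is hypothetical and exists only for this sketch, including the writeback decision helper, which merely restates "bounce and keep dirty if maybe pinned".

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value of the kernel's pin bias for normal pages (1 << 10). */
#define GUP_PIN_COUNTING_BIAS 1024

/* Toy stand-in for struct page: only the refcount matters here. */
struct toy_page {
	int refcount;
};

/* Models a FOLL_GET-style reference: one plain refcount increment. */
static void toy_get_page(struct toy_page *page)
{
	page->refcount += 1;
}

/* Models a FOLL_PIN-style reference: the refcount grows by the bias. */
static void toy_pin_page(struct toy_page *page)
{
	page->refcount += GUP_PIN_COUNTING_BIAS;
}

/* Drops a FOLL_PIN-style reference taken by toy_pin_page(). */
static void toy_unpin_page(struct toy_page *page)
{
	assert(page->refcount >= GUP_PIN_COUNTING_BIAS);
	page->refcount -= GUP_PIN_COUNTING_BIAS;
}

/*
 * Models page_maybe_dma_pinned(): "maybe" because a page holding
 * GUP_PIN_COUNTING_BIAS or more ordinary references is also reported
 * as pinned (a false positive), while a FOLL_GET-only user is not
 * reported at all.
 */
static bool toy_page_maybe_dma_pinned(const struct toy_page *page)
{
	return page->refcount >= GUP_PIN_COUNTING_BIAS;
}

/* Hypothetical writeback decision, following the reasoning in the mail. */
enum toy_wb_action {
	TOY_WB_WRITE_IN_PLACE,       /* contents are stable during IO */
	TOY_WB_BOUNCE_AND_KEEP_DIRTY /* contents may change under us */
};

static enum toy_wb_action toy_writeback_action(const struct toy_page *page)
{
	if (toy_page_maybe_dma_pinned(page))
		return TOY_WB_BOUNCE_AND_KEEP_DIRTY;
	return TOY_WB_WRITE_IN_PLACE;
}
```

In this model a page that starts with refcount 1 and is then referenced through toy_get_page() still yields TOY_WB_WRITE_IN_PLACE, so writeback cannot tell that a direct IO user may be modifying it. After toy_pin_page() the same page yields TOY_WB_BOUNCE_AND_KEEP_DIRTY, which is the behaviour the reply asks for on shared pages.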