Date: Mon, 26 Oct 2020 04:21:58 +0000
From: Matthew Wilcox
To: John Hubbard
Cc: "Kirill A.
Shutemov", Dave Hansen, Andy Lutomirski, Peter Zijlstra, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, David Rientjes, Andrea Arcangeli, Kees Cook,
	Will Drewry, "Edgecombe, Rick P", "Kleen, Andi", Liran Alon,
	Mike Rapoport, x86@kernel.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov"
Subject: Re: [RFCv2 08/16] KVM: Use GUP instead of copy_from/to_user() to access guest memory
Message-ID: <20201026042158.GN20115@casper.infradead.org>
References: <20201020061859.18385-1-kirill.shutemov@linux.intel.com>
 <20201020061859.18385-9-kirill.shutemov@linux.intel.com>
 <20201022114946.GR20115@casper.infradead.org>
 <30ce6691-fd70-76a2-8b61-86d207c88713@nvidia.com>
In-Reply-To: <30ce6691-fd70-76a2-8b61-86d207c88713@nvidia.com>

On Thu, Oct 22, 2020 at 12:58:14PM -0700, John Hubbard wrote:
> On 10/22/20 4:49 AM, Matthew Wilcox wrote:
> > On Tue, Oct 20, 2020 at 01:25:59AM -0700, John Hubbard wrote:
> > > Should copy_to_guest() use pin_user_pages_unlocked() instead of
> > > gup_unlocked? We wrote a "Case 5" in
> > > Documentation/core-api/pin_user_pages.rst, just for this
> > > situation, I think:
> > >
> > > CASE 5: Pinning in order to write to the data within the page
> > > -------------------------------------------------------------
> > > Even though neither DMA nor Direct IO is involved, just a simple
> > > case of "pin, write to a page's data, unpin" can cause a problem.
> > > Case 5 may be considered a superset of Case 1, plus Case 2, plus
> > > anything that invokes that pattern.
> > > In other words, if the code is neither Case 1 nor Case 2, it may
> > > still require FOLL_PIN, for patterns like this:
> > >
> > > Correct (uses FOLL_PIN calls):
> > >     pin_user_pages()
> > >     write to the data within the pages
> > >     unpin_user_pages()
> >
> > Case 5 is crap though.  That bug should have been fixed by getting
> > the locking right.  ie:
> >
> >     get_user_pages_fast();
> >     lock_page();
> >     kmap();
> >     set_bit();
> >     kunmap();
> >     set_page_dirty();
> >     unlock_page();
> >
> > I should have vetoed that patch at the time, but I was busy with
> > other things.
>
> It does seem like lock_page() is better, for now at least, because it
> forces the kind of synchronization with file system writeback that is
> still yet to be implemented for pin_user_pages().
>
> Long term though, Case 5 provides an alternative way to do this
> pattern--without using lock_page(). Also, note that Case 5, *in
> general*, need not be done page-at-a-time, unlike the lock_page()
> approach. Therefore, Case 5 might potentially help at some call
> sites, either for deadlock avoidance or performance improvements.
>
> In other words, once the other half of the pin_user_pages() plan is
> implemented, either of these approaches should work.
>
> Or, are you thinking that there is never a situation in which Case 5
> is valid?

I don't think the page pinning approach is ever valid.  For file
mappings, the infiniband registration should take a lease on the inode,
blocking truncation.  DAX shouldn't be using struct pages at all, so
there shouldn't be anything there to pin.

It's been five years since DAX was merged, and page pinning still
doesn't work.  How much longer before the people who are pushing it
realise that it's fundamentally flawed?
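[Editor's note: for readers following along, the two patterns being
contrasted in this thread can be sketched side by side in kernel-style
C. This is a sketch only, not code from the patch series: the function
names write_via_pin()/write_via_lock(), the single-page simplification,
and the addr parameter are illustrative, and error paths are minimal.
The GUP-family calls themselves (pin_user_pages_unlocked(),
unpin_user_pages_dirty_lock(), get_user_pages_fast()) are real kernel
APIs as of this thread's vintage.]

```c
/*
 * Sketch of the "Case 5" pattern from pin_user_pages.rst:
 * pin with FOLL_PIN, write, then dirty and unpin in one call.
 */
static int write_via_pin(unsigned long addr)
{
	struct page *page;
	int ret;

	ret = pin_user_pages_unlocked(addr, 1, &page, FOLL_WRITE);
	if (ret != 1)
		return ret < 0 ? ret : -EFAULT;

	/* ... write to the data within the page ... */

	/* Mark dirty and drop the pin (FOLL_PIN reference). */
	unpin_user_pages_dirty_lock(&page, 1, true);
	return 0;
}

/*
 * Sketch of the alternative Matthew describes above: plain GUP plus
 * lock_page() to serialize the write against filesystem writeback.
 */
static int write_via_lock(unsigned long addr)
{
	struct page *page;
	void *kaddr;
	int ret;

	ret = get_user_pages_fast(addr, 1, FOLL_WRITE, &page);
	if (ret != 1)
		return ret < 0 ? ret : -EFAULT;

	lock_page(page);
	kaddr = kmap(page);
	/* ... e.g. set_bit() or any other write via kaddr ... */
	kunmap(page);
	set_page_dirty(page);	/* while still holding the page lock */
	unlock_page(page);
	put_page(page);
	return 0;
}
```

The disagreement in the thread is not about whether either sequence is
mechanically correct, but about which one is the right long-term
contract with filesystem writeback: the page lock serializes against
writeback today, whereas FOLL_PIN depends on writeback learning to
honor pinned pages, which John notes is "still yet to be implemented".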