Subject: Re: [RFCv2 08/16] KVM: Use GUP instead of copy_from/to_user() to access guest memory
From: John Hubbard
To: Matthew Wilcox
CC: "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Rientjes, Andrea Arcangeli,
	Kees Cook, Will Drewry, "Edgecombe, Rick P", "Kleen, Andi",
	Liran Alon, Mike Rapoport, "Kirill A. Shutemov"
Date: Sun, 25 Oct 2020 21:44:07 -0700
References: <20201020061859.18385-1-kirill.shutemov@linux.intel.com>
	<20201020061859.18385-9-kirill.shutemov@linux.intel.com>
	<20201022114946.GR20115@casper.infradead.org>
	<30ce6691-fd70-76a2-8b61-86d207c88713@nvidia.com>
	<20201026042158.GN20115@casper.infradead.org>
In-Reply-To: <20201026042158.GN20115@casper.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/25/20 9:21 PM, Matthew Wilcox wrote:
> On Thu, Oct 22, 2020 at 12:58:14PM -0700, John Hubbard wrote:
>> On 10/22/20 4:49 AM, Matthew Wilcox wrote:
>>> On Tue, Oct 20, 2020 at 01:25:59AM -0700, John Hubbard wrote:
>>>> Should copy_to_guest() use pin_user_pages_unlocked() instead of
>>>> gup_unlocked?
>>>> We wrote a "Case 5" in Documentation/core-api/pin_user_pages.rst, just
>>>> for this situation, I think:
>>>>
>>>> CASE 5: Pinning in order to write to the data within the page
>>>> -------------------------------------------------------------
>>>> Even though neither DMA nor Direct IO is involved, just a simple case of
>>>> "pin, write to a page's data, unpin" can cause a problem. Case 5 may be
>>>> considered a superset of Case 1, plus Case 2, plus anything that invokes
>>>> that pattern. In other words, if the code is neither Case 1 nor Case 2,
>>>> it may still require FOLL_PIN, for patterns like this:
>>>>
>>>> Correct (uses FOLL_PIN calls):
>>>>     pin_user_pages()
>>>>     write to the data within the pages
>>>>     unpin_user_pages()
>>>
>>> Case 5 is crap though. That bug should have been fixed by getting
>>> the locking right. ie:
>>>
>>>     get_user_pages_fast();
>>>     lock_page();
>>>     kmap();
>>>     set_bit();
>>>     kunmap();
>>>     set_page_dirty()
>>>     unlock_page();
>>>
>>> I should have vetoed that patch at the time, but I was busy with other
>>> things.
>>
>> It does seem like lock_page() is better, for now at least, because it
>> forces the kind of synchronization with file system writeback that is
>> still yet to be implemented for pin_user_pages().
>>
>> Long term though, Case 5 provides an alternative way to do this
>> pattern--without using lock_page(). Also, note that Case 5, *in
>> general*, need not be done page-at-a-time, unlike the lock_page()
>> approach. Therefore, Case 5 might potentially help at some call sites,
>> either for deadlock avoidance or performance improvements.
>>
>> In other words, once the other half of the pin_user_pages() plan is
>> implemented, either of these approaches should work.
>>
>> Or, are you thinking that there is never a situation in which Case 5 is
>> valid?
>
> I don't think the page pinning approach is ever valid. For file

Could you qualify that?
Surely you don't mean that the entire pin_user_pages story is a waste of
time--I would have expected you to make more noise earlier if you thought
that, yes?

> mappings, the infiniband registration should take a lease on the inode,
> blocking truncation. DAX shouldn't be using struct pages at all, so
> there shouldn't be anything there to pin.
>
> It's been five years since DAX was merged, and page pinning still
> doesn't work. How much longer before the people who are pushing it
> realise that it's fundamentally flawed?
>

Is this a separate rant about *only* DAX, or is general RDMA in your
sights too? :)

thanks,
--
John Hubbard
NVIDIA
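
[Editorial footnote for readers outside the thread: the "Case 5" pattern
under discussion--pin, write to the page's data, unpin--might be sketched
as a kernel-side helper roughly like the one below. The name
copy_to_guest_pinned() and its signature are illustrative assumptions, not
the RFC's actual code; only pin_user_pages_unlocked() and
unpin_user_pages_dirty_lock() are the real APIs referenced in the thread.]

```c
/*
 * Illustrative sketch only -- not the RFC's code. Writes len bytes
 * (assumed not to cross a page boundary) into one guest-backing page
 * using the FOLL_PIN pattern from pin_user_pages.rst Case 5.
 */
static int copy_to_guest_pinned(unsigned long uaddr,
				const void *src, size_t len)
{
	struct page *page;
	void *vaddr;
	long npinned;

	/* Pin (FOLL_PIN), because we are about to write to page data. */
	npinned = pin_user_pages_unlocked(uaddr, 1, &page, FOLL_WRITE);
	if (npinned != 1)
		return npinned < 0 ? (int)npinned : -EFAULT;

	vaddr = kmap(page);
	memcpy(vaddr + offset_in_page(uaddr), src, len);
	kunmap(page);

	/* Marks the page dirty and drops the pin in one call. */
	unpin_user_pages_dirty_lock(&page, 1, true);
	return 0;
}
```

The contrast with Willcox's alternative above is that the pin holds the
page across the write without taking the page lock, which is what allows
the multi-page, non-page-at-a-time variants mentioned in the thread.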