Date: Tue, 12 Mar 2019 08:23:16 -0700
From: Ira Weiny
To: Dave Chinner
Cc: Christopher Lameter, john.hubbard@gmail.com, Andrew Morton,
	linux-mm@kvack.org, Al Viro, Christian Benvenuti, Christoph Hellwig,
	Dan Williams, Dennis Dalessandro, Doug Ledford, Jan Kara,
	Jason Gunthorpe, Jerome Glisse, Matthew Wilcox, Michal Hocko,
	Mike Rapoport, Mike Marciniszyn, Ralph Campbell, Tom Talpey,
	LKML, linux-fsdevel@vger.kernel.org, John Hubbard
Subject: Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions
Message-ID: <20190312152316.GF1119@iweiny-DESK2.sc.intel.com>
References: <20190306235455.26348-1-jhubbard@nvidia.com>
	<010001695b4631cd-f4b8fcbf-a760-4267-afce-fb7969e3ff87-000000@email.amazonses.com>
	<20190310224742.GK26298@dastard>
	<01000169705aecf0-76f2b83d-ac18-4872-9421-b4b6efe19fc7-000000@email.amazonses.com>
	<20190312103932.GD1119@iweiny-DESK2.sc.intel.com>
	<20190312221113.GF23020@dastard>
In-Reply-To: <20190312221113.GF23020@dastard>
List-ID: linux-kernel@vger.kernel.org

On Wed, Mar 13, 2019 at 09:11:13AM +1100, Dave Chinner wrote:
> On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote:
> > IMHO I don't think that copy_file_range() is going to carry us through
> > the next wave of user performance requirements.  RDMA, while the first,
> > is not the only technology which is looking to have direct access to
> > files.  XDP is another. [1]
>
> Sure, all I was doing here was demonstrating that people have been
> trying to get local direct access to file mappings to DMA directly
> into them for a long time.  Direct IO games like these are now largely
> unnecessary because we now have much better APIs to do zero-copy data
> transfer between files (which can do hardware offload if it is
> available!).
>
> It's the long-term pins that RDMA does that are the problem here.
> I'm assuming that for XDP, you're talking about userspace zero copy
> from files to the network hardware and vice versa?  Transmit is
> simple (a read-only mapping), but receive probably requires BPF
> programs to ensure that the data (minus headers) in the incoming
> packet stream is correctly placed into the UMEM region?

Yes, exactly.

> XDP receive seems pretty much like the same problem as RDMA writes
> into the file, i.e. the incoming write DMAs are going to have to
> trigger page faults if the UMEM is a long-term pin so the filesystem
> behaves correctly with this remote data placement.
> I'd suggest that RDMA, XDP, and any other hardware that is going to
> pin file-backed mappings for the long term needs to use the same
> "inform the fs of a write operation into its mapping" mechanisms...

Yes, agreed.

I have a hack patch I'm testing right now which allows the user to take
a LAYOUT lease from user space, and GUP triggers on that, either
allowing or rejecting the pin based on the lease.  I think this is the
first step of what Jan suggested. [1]  There is a lot more detail to
work out around what happens if that lease needs to be broken.

> And if we start talking about wanting to do peer-to-peer DMA from a
> network/GPU device to a storage device without going through a
> file-backed CPU mapping, we still need to have the filesystem
> involved to translate file offsets to the storage locations the
> filesystem has allocated for the data, and to lock them down for as
> long as the peer-to-peer DMA offload is in place.  In effect, this
> is the same problem as RDMA+FS-DAX - the filesystem owns the file
> offset to storage location mapping and manages storage access
> arbitration, not the mm/vma mapping presented to userspace...

I've only daydreamed about peer-to-peer transfers, but yes, I think
this is the direction we need to go.  The details of doing

	GPU -> RDMA -> { network } -> RDMA -> FS DAX

and back again, without CPU/OS involvement, are only a twinkle in my
eye...  if that.

Ira

[1] https://lore.kernel.org/lkml/20190212160707.GA19076@quack2.suse.cz/
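[Editor's sketch, not part of the thread: the LAYOUT lease Ira describes would extend the existing fcntl() lease API (F_SETLEASE / F_GETLEASE) from file data to file layouts, so GUP could consult the lease before granting a long-term pin.  A minimal userspace demonstration of the existing lease mechanism, with `demo_read_lease()` and the temp-file path as purely illustrative names:]

```c
/* Minimal sketch of the existing fcntl() lease API (F_SETLEASE /
 * F_GETLEASE) that a userspace-visible LAYOUT lease would build on.
 * demo_read_lease() is a hypothetical helper, not kernel code. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a temp file we own, take a read lease on it, and return the
 * lease type F_GETLEASE reports (F_RDLCK on success, -1 on error). */
int demo_read_lease(void)
{
	char path[] = "/tmp/lease-demo-XXXXXX";
	int fd = mkstemp(path);          /* owning the file avoids CAP_LEASE */
	if (fd < 0)
		return -1;

	/* A read lease is only granted on a descriptor opened read-only
	 * while no one else holds the file open for writing. */
	int rfd = open(path, O_RDONLY);
	close(fd);
	if (rfd < 0) {
		unlink(path);
		return -1;
	}

	if (fcntl(rfd, F_SETLEASE, F_RDLCK) < 0) {
		close(rfd);
		unlink(path);
		return -1;
	}

	int type = fcntl(rfd, F_GETLEASE);

	fcntl(rfd, F_SETLEASE, F_UNLCK); /* drop the lease again */
	close(rfd);
	unlink(path);
	return type;
}
```

[With a data lease, the kernel notifies the holder before a conflicting open proceeds; the proposal discussed above would have the analogous layout lease arbitrate long-term pins instead.]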