Date: Sun, 17 Feb 2019 02:54:31 +0000
From: Christopher Lameter
To: Ira Weiny
cc: Jason Gunthorpe, Matthew Wilcox, Dave Chinner, Jerome Glisse, Dan Williams, Jan Kara, Doug Ledford, lsf-pc@lists.linux-foundation.org, linux-rdma, Linux MM, Linux Kernel Mailing List, John Hubbard, Michal Hocko
Subject: Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

On Fri, 15 Feb 2019, Ira Weiny wrote:

> > > > for filesystems and processes. The only problems come in for the things
> > > > which bypass the page cache like O_DIRECT and DAX.
> > >
> > > It makes a lot of sense since the filesystems play COW etc games with the
> > > pages and RDMA is very much like O_DIRECT in that the pages are modified
> > > directly under I/O. It also bypasses the page cache in case you have
> > > not noticed yet.
> >
> > It is quite different, O_DIRECT modifies the physical blocks on the
> > storage, bypassing the memory copy.
> >
>
> Really? I thought O_DIRECT allowed the block drivers to write to/from user
> space buffers. But the _storage_ was still under the control of the block
> drivers?

It depends on what you see as the modification target. O_DIRECT uses
memory as a target and source like RDMA. The block device is at the other
end of the handling.

> > RDMA modifies the memory copy.
> >
> > pages are necessary to do RDMA, and those pages have to be flushed to
> > disk.. So I'm not seeing how it can be disconnected from the page
> > cache?
>
> I don't disagree with this.

RDMA does direct access to memory. If that memory is an mmap of a regular
block device then we have a problem (this has not been a standard use case
to my knowledge). The semantics are simply different. RDMA expects memory
to be pinned and to always be able to read from and write to it. The block
device/filesystem expects memory access to be controllable via the page
permissions. In particular, access to the page needs to be able to be
stopped.

This is fundamentally incompatible. RDMA access to such an mmapped section
must preserve the RDMA semantics while the pinning is done and can only
provide the access control after RDMA is finished. Pages in the RDMA range
cannot be handled like normal page cache pages.

This is particularly evident in the DAX case, in which we have direct
pass-through even to the storage medium. And in this case write-through
can replace the page cache.
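A toy C model may make the conflict concrete. It is only a sketch under stated assumptions, not kernel code: `struct toy_page`, `toy_pin()`, `toy_unpin()`, `toy_fs_revoke_write()` and `toy_direct_io()` are hypothetical names, and a single counter stands in for the real pinning machinery. It shows the point made above: while a pin is held, the filesystem cannot stop access through the page permissions, and access control becomes possible again once the pin is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model (not kernel code or a kernel API): one page of a
 * file-backed mapping. The filesystem wants to be able to stop access to it
 * (e.g. write-protect it for writeback or COW); a DMA user such as RDMA
 * holds pins while the device may access the page. */
struct toy_page {
    int  pincount;  /* pins currently held by DMA users */
    bool writable;  /* whether writes to the page are currently allowed */
};

/* DMA user takes a pin; the device expects the page to stay accessible. */
void toy_pin(struct toy_page *p)
{
    p->pincount++;
}

/* DMA user drops its pin once the transfer or registration is finished. */
void toy_unpin(struct toy_page *p)
{
    assert(p->pincount > 0);
    p->pincount--;
}

/* Filesystem tries to stop write access via the page permissions. Device
 * DMA does not go through the CPU page tables, so while a pin is held the
 * revocation could not be enforced; in this model it is refused. */
bool toy_fs_revoke_write(struct toy_page *p)
{
    if (p->pincount > 0)
        return false;
    p->writable = false;
    return true;
}

/* Assumed O_DIRECT-style access: the pin lives only for one I/O, so the
 * filesystem only has to wait for that I/O to complete. */
void toy_direct_io(struct toy_page *p)
{
    toy_pin(p);
    /* ... device transfers data to or from the page here ... */
    toy_unpin(p);
}
```

In the model, a filesystem can wait out the short-lived O_DIRECT pin, whereas an RDMA-style pin can be held until the memory region is deregistered, which is the longterm case the subject line refers to.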