Date: Thu, 7 Feb 2019 09:24:05 -0800
From: Matthew Wilcox
To: Doug
 Ledford
Cc: Dan Williams, Jason Gunthorpe, Dave Chinner, Christopher Lameter,
 Jan Kara, Ira Weiny, lsf-pc@lists.linux-foundation.org, linux-rdma,
 Linux MM, Linux Kernel Mailing List, John Hubbard, Jerome Glisse,
 Michal Hocko
Subject: Re: [LSF/MM TOPIC] Discuss least bad options for resolving
 longterm-GUP usage by RDMA
Message-ID: <20190207172405.GY21860@bombadil.infradead.org>

On Thu, Feb 07, 2019 at 11:25:35AM -0500, Doug Ledford wrote:
> * Really though, as I said in my email to Tom Talpey, this entire
> situation is simply screaming that we are doing DAX networking wrong.
> We shouldn't be writing the networking code once in every single
> application that wants to do this.  If we had a memory segment that we
> shared from server to client(s), and in that memory segment we
> implemented a clustered filesystem, then applications would simply mmap
> local files and be done with it.  If the file needed to move, the kernel
> would update the mmap in the application, done.  If you ask me, it is
> the attempt to do this the wrong way that is resulting in all this
> heartache.
> That said, for today, my recommendation would be to require
> ODP hardware for XFS filesystem with the DAX option, but allow ext2
> filesystems to mount DAX filesystems on non-ODP hardware, and go in and
> modify the ext2 filesystem so that on DAX mounts, it disables hole punch
> and ftruncate any time they would result in the forced removal of an
> established mmap.

I agree that something's wrong, but I think the fundamental problem is
that there's no concept in RDMA of having an STag for storage rather
than for memory.

Imagine if we could associate an STag with a file descriptor on the
server.  The client could then perform an RDMA to that STag.  On the
server, we'd need lots of smarts in the card and in the OS to know how
to treat that packet on arrival -- depending on what the file descriptor
referred to, it might only have to write into the page cache, or it
might set up an NVMe DMA, or it might resolve the underlying physical
address and DMA directly to an NV-DIMM.
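
To make the dispatch idea a little more concrete, here is a toy model of
what the server side might do when a write arrives for an STag that is
bound to a file descriptor.  This is only a sketch under loud
assumptions: nothing like this exists in any RDMA stack or in the
kernel, and the names `Backing`, `StagTable`, `register_fd` and
`handle_rdma_write` are all hypothetical.  The handlers merely report
which path would be taken; they do no I/O.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Backing(Enum):
    """What the file descriptor behind an STag refers to (hypothetical)."""
    PAGE_CACHE = auto()  # ordinary file: the write lands in the page cache
    NVME = auto()        # file on an NVMe device: set up an NVMe DMA
    NVDIMM = auto()      # DAX file: DMA directly to the NV-DIMM


@dataclass
class StagTable:
    """Toy server-side table mapping an STag to (fd, backing type)."""
    entries: dict = field(default_factory=dict)
    next_stag: int = 1

    def register_fd(self, fd: int, backing: Backing) -> int:
        """Associate a new STag with a file descriptor and return it."""
        stag = self.next_stag
        self.next_stag += 1
        self.entries[stag] = (fd, backing)
        return stag

    def handle_rdma_write(self, stag: int, offset: int, length: int) -> str:
        """Decide how an incoming write to `stag` would be treated."""
        fd, backing = self.entries[stag]  # raises KeyError for an unknown STag
        if backing is Backing.PAGE_CACHE:
            return f"fd {fd}: write {length} bytes into page cache at {offset}"
        if backing is Backing.NVME:
            return f"fd {fd}: set up NVMe DMA for {length} bytes at {offset}"
        # Remaining case: resolve the physical address and DMA to the NV-DIMM.
        return f"fd {fd}: DMA {length} bytes directly to NV-DIMM at {offset}"
```

A client would obtain the STag from the server out of band and then
issue ordinary RDMA writes against it; in this model all of the
storage-specific knowledge stays on the server, which is where the
"lots of smarts" would have to live.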