Date: Thu, 7 Feb 2019 09:23:53 -0800
From: Ira Weiny
To: Dan Williams
Cc: Doug Ledford, Jason Gunthorpe, Dave Chinner, Christopher Lameter,
    Matthew Wilcox, Jan Kara, lsf-pc@lists.linux-foundation.org,
    linux-rdma, Linux MM, Linux Kernel Mailing List, John Hubbard,
    Jerome Glisse, Michal Hocko
Subject: Re: [LSF/MM TOPIC] Discuss least bad options for resolving
    longterm-GUP usage by RDMA
Message-ID: <20190207172352.GC29531@iweiny-DESK2.sc.intel.com>
References: <20190206173114.GB12227@ziepe.ca>
    <20190206175233.GN21860@bombadil.infradead.org>
    <47820c4d696aee41225854071ec73373a273fd4a.camel@redhat.com>
    <01000168c43d594c-7979fcf8-b9c1-4bda-b29a-500efe001d66-000000@email.amazonses.com>
    <20190206210356.GZ6173@dastard>
    <20190206220828.GJ12227@ziepe.ca>
    <0c868bc615a60c44d618fb0183fcbe0c418c7c83.camel@redhat.com>
    <658363f418a6585a1ffc0038b86c8e95487e8130.camel@redhat.com>

On Wed, Feb 06, 2019 at 07:13:16PM -0800, Dan Williams wrote:
> On Wed, Feb 6, 2019 at 6:42 PM Doug Ledford wrote:
> >
> > On Wed, 2019-02-06 at 14:44 -0800, Dan Williams wrote:
> > > On Wed, Feb 6, 2019 at 2:25 PM Doug Ledford wrote:
> > > > Can someone give me a real world scenario that someone is *actually*
> > > > asking for with this?
> > >
> > > I'll point to this example. At the 6:35 mark Kodi talks about the
> > > Oracle use case for DAX + RDMA.
> > >
> > > https://youtu.be/ywKPPIE8JfQ?t=395
> >
> > I watched this, and I see that Oracle is all sorts of excited that their
> > storage machines can scale out, and they can access the storage and it
> > has basically no CPU load on the storage server while performing
> > millions of queries. What I didn't hear in there is why DAX has to be
> > in the picture, or why Oracle couldn't do the same thing with a simple
> > memory region exported directly to the RDMA subsystem, or why reflink or
> > any of the other features you talk about are needed. So, while these
> > things may legitimately be needed, this video did not tell me about
> > how/why they are needed, just that RDMA is really, *really* cool for
> > their use case and gets them 0% CPU utilization on their storage
> > servers. I didn't watch the whole thing though. Do they get into that
> > later on? Do they get to that level of technical discussion, or is this
> > all higher level?
>
> They don't.
> The point of sharing that video was illustrating that RDMA
> to persistent memory use case. That 0% cpu utilization is because the
> RDMA target is not page-cache / anonymous on the storage box it's
> directly to a file offset in DAX / persistent memory. A solution to
> truncate lets that use case use more than just Device-DAX or ODP
> capable adapters. That said, I need to let Ira jump in here because
> saying layout leases solves the problem is not true, it's just the
> start of potentially solving the problem. It's not clear to me what
> the long tail of work looks like once the filesystem raises a
> notification to the RDMA target process.

This is exactly the problem which has been touched on by others throughout
this thread.

1) To fully support leases on all hardware we will have to allow for RDMA
   processes to be killed when they don't respond to the lease.

   a) If the process has done something bad (like truncate or hole punch)
      then the idea that "they get what they deserve" may be ok.

   b) However, if this is because of some underlying file system
      maintenance this is, as Jason says, unreasonable.  It would be much
      better to tell the application "you can't do this".

2) To fully respond to a lease revocation involves a number of kernel
   changes in the RDMA stack but, more importantly, modifying every user
   space RDMA application to respond to a message from a channel they may
   not even be listening to.

I think this is where Jason is getting very concerned.  When you combine 1b
and 2 you end up with a "non production" worthy solution.

NOTE: This is somewhat true of ODP hardware as well, since applications
register each individual RDMA memory region as either ODP or not.  So out of
the box not all applications would work automatically.

Ira