Date: Wed, 12 Jun 2019 12:29:17 +0200
From: Jan Kara
To: Ira Weiny
Cc: Jason Gunthorpe, Jan Kara, Dan Williams, Theodore Ts'o, Jeff Layton,
	Dave Chinner, Matthew Wilcox, linux-xfs@vger.kernel.org,
	Andrew Morton, John Hubbard, Jérôme Glisse,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-ext4@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH RFC 00/10] RDMA/FS DAX truncate proposal
Message-ID: <20190612102917.GB14578@quack2.suse.cz>
References: <20190606014544.8339-1-ira.weiny@intel.com>
	<20190606104203.GF7433@quack2.suse.cz>
	<20190606195114.GA30714@ziepe.ca>
	<20190606222228.GB11698@iweiny-DESK2.sc.intel.com>
	<20190607103636.GA12765@quack2.suse.cz>
	<20190607121729.GA14802@ziepe.ca>
	<20190607145213.GB14559@iweiny-DESK2.sc.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190607145213.GB14559@iweiny-DESK2.sc.intel.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

On Fri 07-06-19 07:52:13, Ira Weiny wrote:
> On Fri, Jun 07, 2019 at 09:17:29AM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 07, 2019 at 12:36:36PM +0200, Jan Kara wrote:
> > >
> > > Because the pins would be invisible to sysadmin from that point on.
> >
> > It is not invisible, it just shows up in a rdma specific kernel
> > interface. You have to use rdma netlink to see the kernel object
> > holding this pin.
> >
> > If this visibility is the main sticking point I suggest just enhancing
> > the existing MR reporting to include the file info for current GUP
> > pins and teaching lsof to collect information from there as well so it
> > is easy to use.
> >
> > If the ownership of the lease transfers to the MR, and we report that
> > ownership to userspace in a way lsof can find, then I think all the
> > concerns that have been raised are met, right?
>
> I was contemplating some new lsof feature yesterday. But what I don't
> think we want is sysadmins to have multiple tools for multiple
> subsystems. Or even have to teach lsof something new for every potential
> new subsystem user of GUP pins.

Agreed.

> I was thinking more along the lines of reporting files which have GUP
> pins on them directly somewhere (dare I say procfs?) and teaching lsof to
> report that information. That would cover any subsystem which does a
> longterm pin.

So lsof already parses /proc/<pid>/maps to learn about files held open by
memory mappings. It could parse some other file as well I guess. The good
thing about that would be that then the "longterm pin" structure would just
hold a struct file reference.
That would avoid any need for special behavior on file close (the file
reference in the "longterm pin" structure would make sure struct file and
thus the lease stays around, we'd just need to make explicit lease unlock
block until the "longterm pin" structure is freed). The bad thing is that
it requires us to come up with a sane new proc interface for reporting
"longterm pins" and the associated struct file. Also we need to define what
this interface shows if the pinned pages are in DRAM (either page cache or
anon) and not on NVDIMM.

> > > ugly to live so we have to come up with something better. The best I can
> > > currently come up with is to have a method associated with the lease that
> > > would invalidate the RDMA context that holds the pins in the same way that
> > > a file close would do it.
> >
> > This is back to requiring all RDMA HW to have some new behavior they
> > currently don't have..
> >
> > The main objection to the current ODP & DAX solution is that very
> > little HW can actually implement it, having the alternative still
> > require HW support doesn't seem like progress.
> >
> > I think we will eventually start seeing some HW be able to do this
> > invalidation, but it won't be universal, and I'd rather leave it
> > optional, for recovery from truly catastrophic errors (ie my DAX is
> > on fire, I need to unplug it).
>
> Agreed. I think software wise there is not much some of the devices can do
> with such an "invalidate".

So out of curiosity: What does the RDMA driver do when userspace just closes
the file pointing to an RDMA object? It has to handle that somehow by
aborting everything that's going on... And I wanted similar behavior here.

								Honza
-- 
Jan Kara
SUSE Labs, CR
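
For reference, the /proc/<pid>/maps parsing mentioned above can be sketched
roughly as below. This is only an illustration of how a tool could collect
the files backing a process's memory mappings, not how lsof itself is
implemented; the helper names mapped_files() and mapped_files_of() are made
up for this example, and a proc file for "longterm pins" does not exist, so
nothing here parses one.

```python
def mapped_files(maps_text):
    """Return the set of file paths that back mappings in maps-format text.

    Each line of /proc/<pid>/maps has the form:
        address perms offset dev inode [pathname]
    Anonymous mappings have no pathname; pseudo-entries such as [heap]
    or [stack] are not real files and are skipped.
    """
    files = set()
    for line in maps_text.splitlines():
        # Split into at most 6 fields so a pathname with spaces stays whole.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if path.startswith("/"):
            files.add(path)
    return files


def mapped_files_of(pid="self"):
    """Hypothetical convenience wrapper: read and parse /proc/<pid>/maps."""
    with open(f"/proc/{pid}/maps") as f:
        return mapped_files(f.read())
```

A second proc file reporting pinned files could be consumed the same way,
provided its format is settled first.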