Date: Fri, 26 Apr 2019 16:28:16 +1000
From: Dave Chinner
To: Jerome Glisse
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [LSF/MM TOPIC] Direct block mapping through fs for device
Message-ID: <20190426062816.GG1454@dread.disaster.area>
References: <20190426013814.GB3350@redhat.com>
In-Reply-To: <20190426013814.GB3350@redhat.com>

On Thu, Apr 25, 2019 at 09:38:14PM -0400, Jerome Glisse wrote:
> I see that they are still empty spot in LSF/MM schedule so i would like to
> have a discussion on allowing direct block mapping of file for devices (nic,
> gpu, fpga, ...). This is mm, fs and block discussion, thought the mm side
> is pretty light ie only adding 2 callback to vm_operations_struct:

The filesystem already has infrastructure for the bits it needs to
provide. They are called file layout leases (how many times do I
have to keep telling people this!), and what you do with the lease
for the LBA range the filesystem maps for you is then something you
can negotiate with the underlying block device.

i.e. go look at how xfs_pnfs.c works to hand out block mappings to
remote pNFS clients so they can directly access the underlying
storage. Basically, anyone wanting to map blocks needs a file layout
lease and then to manage the filesystem state over that range via
these methods in the struct export_operations:

	int (*get_uuid)(struct super_block *sb, u8 *buf, u32 *len, u64 *offset);
	int (*map_blocks)(struct inode *inode, loff_t offset,
			  u64 len, struct iomap *iomap,
			  bool write, u32 *device_generation);
	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
			     int nr_iomaps, struct iattr *iattr);

Basically, before you read/write data, you map the blocks. If you've
written data, then you need to commit the blocks (i.e. tell the fs
they've been written to).

The iomap will give you a contiguous LBA range and the block device
they belong to, and you can then use that to do whatever smart DMA
stuff you need to do through the block device directly.

If the filesystem wants the space back (e.g. because truncate) then
the lease will be revoked.
The client must then finish off its outstanding operations, commit
them and release the lease. To access the file range again, it must
renew the lease and remap the file through ->map_blocks....

> So i would like to gather people feedback on general approach and few things
> like:
>     - Do block device need to be able to invalidate such mapping too ?
>
>       It is easy for fs the to invalidate as it can walk file mappings
>       but block device do not know about file.

If you are needing the block device to invalidate filesystem level
information, then your model is all wrong.

>     - Do we want to provide some generic implementation to share accross
>       fs ?

We already have a generic interface; filesystems other than XFS will
need to implement it.

>     - Maybe some share helpers for block devices that could track file
>       corresponding to peer mapping ?

If the application hasn't supplied the peer with the file it needs
to access, get a lease from and then map an LBA range out of, then
you are doing it all wrong.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
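
The lease/map/commit/revoke cycle described above can be modelled in a
few lines of userspace C. This is only a sketch under stated
assumptions: every name in it (struct mock_file, struct mock_extent,
the mock_* helpers, the three lease states) is hypothetical and exists
purely for illustration. It is not the kernel's export_operations API,
and it models a file as a single contiguous extent.

```c
/* Userspace mock of a layout lease guarding block mappings.
 * All types and functions here are hypothetical illustrations,
 * not kernel interfaces. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum lease_state { LEASE_NONE, LEASE_HELD, LEASE_RECALLED };

struct mock_extent {		/* stand-in for a contiguous LBA range */
	uint64_t lba;		/* first block on the device */
	uint64_t len;		/* length in blocks */
};

struct mock_file {
	enum lease_state state;	/* lease state as seen by the client */
	uint64_t base_lba;	/* device block where the file starts */
	uint64_t nr_blocks;	/* allocated size in blocks */
	uint64_t committed;	/* blocks the fs was told were written */
};

/* Take (or renew) the lease; only possible when none is outstanding. */
static int mock_take_lease(struct mock_file *f)
{
	if (f->state != LEASE_NONE)
		return -1;
	f->state = LEASE_HELD;
	return 0;
}

/* Map a file range to an LBA range; refused without a live lease. */
static int mock_map_blocks(struct mock_file *f, uint64_t off, uint64_t len,
			   struct mock_extent *ext)
{
	assert(ext != NULL);
	if (f->state != LEASE_HELD)
		return -1;
	if (len == 0 || off > f->nr_blocks || len > f->nr_blocks - off)
		return -1;
	ext->lba = f->base_lba + off;
	ext->len = len;
	return 0;
}

/* Tell the fs the blocks were written. Still allowed while the lease
 * is being recalled, so outstanding work can be finished off. */
static int mock_commit_blocks(struct mock_file *f,
			      const struct mock_extent *ext)
{
	if (f->state == LEASE_NONE)
		return -1;
	f->committed += ext->len;
	return 0;
}

/* The fs wants the space back (e.g. truncate): recall the lease. */
static void mock_recall_lease(struct mock_file *f)
{
	if (f->state == LEASE_HELD)
		f->state = LEASE_RECALLED;
}

/* Client is done: drop the lease. It must be retaken before remapping. */
static void mock_release_lease(struct mock_file *f)
{
	f->state = LEASE_NONE;
}
```

In this model a client calls mock_take_lease() and mock_map_blocks(),
does its I/O against the returned extent, and calls
mock_commit_blocks() for anything it wrote. After mock_recall_lease()
no new mappings are handed out; the client commits what is
outstanding, calls mock_release_lease(), and has to take the lease
again before it can remap the range.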