From: Erick Reyes
Date: Mon, 25 Mar 2019 14:48:27 -0700
Subject: Re: [RFC v2 1/3] dma-buf: give each buffer a full-fledged inode
To: Chenbo Feng
Cc: Joel Fernandes, Sandeep Patil, LKML, DRI mailing list, linux-media@vger.kernel.org, kernel-team@android.com, Sumit Semwal, Daniel Vetter, John Stultz
References: <20190322025135.118201-1-fengc@google.com> <20190322025135.118201-2-fengc@google.com> <20190322150255.GA76423@google.com> <20190324175633.GA5826@google.com> <20190324204454.GA102207@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

In the original userspace implementation Greg wrote, he was iterating the
directory entries in /proc/<pid>/map_files, doing readlink() on each to find
out whether the entry was a dmabuf. This turned out to be very slow, so we
reworked it to parse /proc/<pid>/maps instead.
On Mon, Mar 25, 2019 at 12:35 PM Chenbo Feng wrote:
>
> On Sun, Mar 24, 2019 at 1:45 PM Joel Fernandes wrote:
> >
> > Hi Sandeep,
> >
> > On Sun, Mar 24, 2019 at 10:56:33AM -0700, Sandeep Patil wrote:
> > > On Fri, Mar 22, 2019 at 11:02:55AM -0400, Joel Fernandes wrote:
> > > > On Thu, Mar 21, 2019 at 07:51:33PM -0700, Chenbo Feng wrote:
> > > > > From: Greg Hackmann
> > > > >
> > > > > By traversing /proc/*/fd and /proc/*/map_files, processes with CAP_ADMIN
> > > > > can get a lot of fine-grained data about how shmem buffers are shared
> > > > > among processes. stat(2) on each entry gives the caller a unique
> > > > > ID (st_ino), the buffer's size (st_size), and even the number of pages
> > > > > currently charged to the buffer (st_blocks / 512).
> > > > >
> > > > > In contrast, all dma-bufs share the same anonymous inode. So while we
> > > > > can count how many dma-buf fds or mappings a process has, we can't get
> > > > > the size of the backing buffers or tell if two entries point to the same
> > > > > dma-buf. On systems with debugfs, we can get a per-buffer breakdown of
> > > > > size and reference count, but can't tell which processes are actually
> > > > > holding the references to each buffer.
> > > > >
> > > > > Replace the singleton inode with full-fledged inodes allocated by
> > > > > alloc_anon_inode(). This involves creating and mounting a
> > > > > mini-pseudo-filesystem for dma-buf, following the example in fs/aio.c.
> > > > >
> > > > > Signed-off-by: Greg Hackmann
> > > >
> > > > I believe Greg's address needs to be updated on this patch, otherwise the
> > > > emails would just bounce, no? I removed it from the CC list. You can still
> > > > keep the SOB I guess, but remove it from the CC list when sending.
> > > >
> > > > Also about the minifs, just playing devil's advocate for why this is needed.
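As an aside on the shmem half of that comparison, here is a hedged userspace sketch of the stat(2) fields the commit message mentions. memfd_create() stands in for a shmem-backed buffer (exporting a real dma-buf needs a device driver), and the helper name is made up for illustration:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a shmem-backed buffer of the given size and fstat() it.
 * Per the commit message: st_ino is a unique per-buffer ID, st_size
 * the buffer size, and st_blocks * 512 the bytes currently charged
 * (0 right after ftruncate, before any page is written). */
static int shmem_buffer_stat(size_t size, struct stat *st)
{
    int fd = memfd_create("illustrative-buf", 0);  /* name is arbitrary */

    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) < 0 || fstat(fd, st) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```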
> > > >
> > > > Since you are already adding the size information to /proc/<pid>/fdinfo/,
> > > > can just that not be used to get the size of the buffer? What is the benefit
> > > > of getting this from stat? The other way to get the size would be through
> > > > another IOCTL, and that can be used to return other uniqueness-related metadata
> > > > as well. Neither of these needs creation of a dedicated inode per dmabuf.
> > >
> > > Can you give an example of "uniqueness-related data" here? The inode seems
> > > like the best fit because it's already unique, no?
> >
> > I was thinking the dma_buf file pointer, but I agree we need the per-buffer
> > inode now (see below).
> >
> > > > Also what is the benefit of having st_blocks from stat? AFAIK, that is the
> > > > same as the buffer's size, which does not change for the lifetime of the
> > > > buffer. In your patch you're doing this when 'struct file' is created, which
> > > > AIUI is what reflects in st_blocks:
> > > > + inode_set_bytes(inode, dmabuf->size);
> > >
> > > Can some of the use cases / data be trimmed down? I think so. For example, I
> > > never understood what we do with map_files here (or why). It is perfectly
> > > fine to just get the data from /proc/<pid>/fd and /proc/<pid>/maps. I guess
> > > the map_files bit is for consistency?
> >
> > It just occurred to me that since /proc/<pid>/maps shows the inode number as
> > one of the fields, an inode per buf is indeed a very good idea, letting the
> > user distinguish buffers just by reading /proc/<pid>/maps alone.
> >
> > I also, similar to you, don't think map_files is useful for this use case.
> > map_files are just symlinks named as memory ranges and pointing to a file. In
> > this case the symlink will point to the default name "dmabuf", which doesn't
> > exist. If one does stat(2) on a map_files link, then it just returns the inode
> > number of the symlink, not what the map_files entry is pointing to, which
> > seems kind of useless.
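Since fdinfo keeps coming up as the place to read the size from, here is a small hedged sketch of pulling it out of /proc/<pid>/fdinfo/<fd>-style text; the "size:" key is an assumption modelled on the fields this series discusses, and the helper name is made up:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan a /proc/<pid>/fdinfo/<fd>-style text blob line by line and
 * return the value of the "size:" field, or -1 if it is absent.
 * (The "size:" key follows the per-dmabuf fdinfo fields discussed
 * in this patch series.) */
static long fdinfo_size(const char *text)
{
    const char *p = text;

    while (p && *p) {
        long v;

        /* The whitespace in the format matches the tab or spaces
         * that separate key and value in fdinfo output. */
        if (sscanf(p, "size: %ld", &v) == 1)
            return v;
        p = strchr(p, '\n');
        if (p)
            p++;  /* advance to the start of the next line */
    }
    return -1;
}
```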
> >
> I might be wrong, but I don't think we did anything special for
> map_files in this patch. I think the confusion might come from the commit
> message, where Greg mentioned map_files to describe the behavior of
> shmem buffers when comparing them with dma-buf. The file system
> implementation and the file allocation in this patch are just
> a minimal effort to associate each dma_buf object with an inode and
> properly populate the size information in the file object. And we
> didn't use /proc/<pid>/map_files at all in the Android implementation,
> indeed.
>
> > Which makes me think both maps and map_files can be made more useful if we can
> > also make DMA_BUF_SET_NAME in the patch change the underlying dentry's name
> > from the default "dmabuf" to "dmabuf:<name>"?
> >
> > That would be useful because:
> > 1. It should make /proc/<pid>/maps show the name rather than always showing
> > "dmabuf".
> > 2. It should make map_files also point to the name of the buffer rather than
> > just "dmabuf". Note that memfd_create(2) already takes a name, and the
> > map_files entry for this points to the name of the buffer created, showing it
> > in both maps and map_files.
> >
> > I think this also removes the need for the DMA_BUF_GET_NAME ioctl since the
> > dentry's name already has the information. I can try to look into that...
> > BTW, in any case we should not need the GET_NAME ioctl since fdinfo already
> > has the name after SET_NAME is called. So let us drop that API?
>
> > > Maybe, to make it generic, we make the tracking part optional somehow to
> > > avoid the apparent wastage on other systems.
> >
> > Yes, that's also fine. But I think if we can bake tracking into the existing
> > mechanism and keep it always on, then that's also good for all other dmabuf
> > users and simplifies the kernel configuration for vendors.
> >
> > > > I am not against adding an inode per buffer, but I think we should have
> > > > this debate and make the right design choice here for what we really need.
> > >
> > > Sure.
> >
> > Right, so just to summarize:
> > - The intention here is that /proc/<pid>/maps will give uniqueness (via the
> >   inode number) between different memory ranges. That, I think, is the main
> >   benefit of the patch.
> > - stat gives the size of the buffer, as does fdinfo.
> > - fdinfo is useful to get the reference count / number of sharers of the
> >   buffer.
> > - map_files is not that useful for this use case, but can be made useful if
> >   we can name the underlying file's dentry something other than "dmabuf".
> > - GET_NAME is not needed since fdinfo already has the SET_NAMEd name.
> >
> > Do you agree?
>
> Thanks for summarizing it. I will look into the GET_NAME/SET_NAME ioctls
> to make them more useful as you suggested above. Also, I will try to add
> some tests to verify the behavior.
>
> > Just to lay it out, there is a cost to unique inodes. Each struct inode is 560
> > bytes on mainline with x86_64_defconfig. With 1000 buffers, we're looking at
> > ~0.5 MB of allocation. However, I think I am convinced we need to do it
> > considering the advantages, and the size is trivial relative to them.
> > Arguably, large numbers of dmabuf allocations are more likely to succeed on
> > devices with larger memory resources anyway :)
> >
> > It is good to have this discussion.
> >
> > thanks,
> >
> > - Joel