Date: Tue, 9 Jun 2015 18:39:02 -0700
From: Calvin Owens
To: Andrew Morton
CC: Alexey Dobriyan, "Eric W. Biederman", Al Viro, Miklos Szeredi, Zefan Li, Oleg Nesterov, Joe Perches, David Howells, Andy Lutomirski, Cyrill Gorcunov, Kees Cook, "Kirill A. Shutemov"
Subject: Re: [PATCH v6] procfs: Always expose /proc/<pid>/map_files/ and make it readable
Message-ID: <20150610013902.GA176908@mail.thefacebook.com>
References: <1432005006-3428-1-git-send-email-calvinowens@fb.com> <1433821173-2804704-1-git-send-email-calvinowens@fb.com> <20150609141300.b80eeec15b2c379146816c06@linux-foundation.org>
In-Reply-To: <20150609141300.b80eeec15b2c379146816c06@linux-foundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tuesday 06/09 at 14:13 -0700, Andrew Morton wrote:
> On Mon, 8 Jun 2015 20:39:33 -0700 Calvin Owens wrote:
>
> > Currently, /proc/<pid>/map_files/ is restricted to CAP_SYS_ADMIN, and
> > is only exposed if CONFIG_CHECKPOINT_RESTORE is set.
> >
> > This interface very useful because it allows userspace to stat()
> > deleted files that are still mapped by some process, which enables a
> > much quicker and more accurate answer to the question "How much disk
> > space is being consumed by files that are deleted but still mapped?"
> > than is currently possible.
>
> Why is that information useful?
>
> I could perhaps think of some use for "How much disk space is being
> consumed by files that are deleted but still open", but to count the
> mmapped-then-unlinked files while excluding the opened-then-unlinked
> files seems damned peculiar.

Let's phrase the question a bit more generically: "How much disk space
is being consumed by files that have been unlinked, but are still
referenced by some process?"

There are two pieces to this problem:

	1) Unlinked files that are still open (whether mapped or not)
	2) Unlinked files that are not open, but are still mapped

You can track down everything in (1) using /proc/<pid>/fd/*, and you can
use stat() to figure out how much space they're using.

But directly measuring how much space (2) consumes is actually not
currently possible from userspace: there's no way to stat() the files.
You can get the inode number from /proc/<pid>/maps, but that still
doesn't get you anywhere because it's been unlinked from the filesystem.

So I'm not looking to measure (2) and exclude (1): I'm looking to have a
way to directly measure (2) at all.

The reason I say "directly", and I say "quicker and more accurate" in
the original message, is that there is a very ugly way to answer this
question right now: you sum up the number of blocks used by every file
on the disk and subtract it from what statfs() tells you. This obviously
stinks, and becomes untenable once your filesystem is large enough.

> IOW, this changelog failed to explain the value of the patch. Bad
> changelog! Please sell it to us. Preferably with real-world use
> cases.

The real-world use case is catching long-lived processes that leak
references to temporary files and waste space on the disk.

When such processes leak file-backed mappings, this wasted space is
especially difficult to detect until it gets out of hand. The map_files/
interface eliminates this difficulty.

I've included a little test program at the end of this message to
illustrate what I'm getting at here. It creates a file at
/tmp/DELETEDFILE:

	calvinowens@Haydn:~$ gcc test.c
	calvinowens@Haydn:~$ ./a.out &
	[1] 5832
	Holding mapping at 0x7fe74d1ea000
	calvinowens@Haydn:~$ lsof -p `pgrep a.out`
	COMMAND  PID        USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
	a.out   5832 calvinowens  cwd    DIR  254,1     4096 3413033 /home/calvinowens
	a.out   5832 calvinowens  rtd    DIR  254,1     4096       2 /
	a.out   5832 calvinowens  txt    REG  254,1     7512 3408268 /home/calvinowens/a.out
	a.out   5832 calvinowens  mem    REG  254,1  1729984 4456767 /lib/x86_64-linux-gnu/libc-2.19.so
	a.out   5832 calvinowens  mem    REG  254,1   140928 4456619 /lib/x86_64-linux-gnu/ld-2.19.so
	a.out   5832 calvinowens  mem    REG   0,32    32768  184946 /tmp/DELETEDFILE
	a.out   5832 calvinowens    0u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5832 calvinowens    1u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5832 calvinowens    2u   CHR  136,2      0t0       5 /dev/pts/2
	calvinowens@Haydn:~$ killall a.out
	[1]+  Terminated              ./a.out
	calvinowens@Haydn:~$ gcc -DDO_UNLINK test.c
	calvinowens@Haydn:~$ ./a.out &
	[1] 5842
	Holding mapping at 0x7fec8ae63000
	calvinowens@Haydn:~$ lsof -p `pgrep a.out`
	COMMAND  PID        USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
	a.out   5842 calvinowens  cwd    DIR  254,1     4096 3413033 /home/calvinowens
	a.out   5842 calvinowens  rtd    DIR  254,1     4096       2 /
	a.out   5842 calvinowens  txt    REG  254,1     7640 3408268 /home/calvinowens/a.out
	a.out   5842 calvinowens  mem    REG  254,1  1729984 4456767 /lib/x86_64-linux-gnu/libc-2.19.so
	a.out   5842 calvinowens  mem    REG  254,1   140928 4456619 /lib/x86_64-linux-gnu/ld-2.19.so
	a.out   5842 calvinowens  DEL    REG   0,32           184946 /tmp/DELETEDFILE
	a.out   5842 calvinowens    0u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5842 calvinowens    1u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5842 calvinowens    2u   CHR  136,2      0t0       5 /dev/pts/2

Notice the gap under "SIZE/OFF" in the 2nd output? This is because lsof
has no possible way to actually determine the leaked file's size. That's
the functionality "hole" I'm trying to fill with this patch.

Does that all seem sensible?

Thanks,
Calvin

--

#include <stdio.h>
#include <unistd.h>
#include <limits.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

int main(void)
{
	int ret, fd;
	void *map;

	fd = open("/tmp/DELETEDFILE", O_CREAT|O_TRUNC|O_RDWR, 0777);
	if (fd == -1)
		return -1;

	ret = ftruncate(fd, 32768);
	if (ret == -1)
		return -1;

	map = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	close(fd);

#ifdef DO_UNLINK
	unlink("/tmp/DELETEDFILE");
#endif

	printf("Holding mapping at %p\n", map);

	while (1)
		sleep(UINT_MAX);
}
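
For case (2) in the discussion above, here is a minimal sketch of how
userspace might total up the space held by unlinked-but-mapped files once
map_files/ can be read. It is an illustration only, not part of the
patch: the names unlinked_bytes_in() and struct file_id are hypothetical,
and it assumes the kernel exposes the directory, that the caller is
permitted to follow its links, and that st_blocks is in 512-byte units as
on Linux.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for fstatat() and dirfd() */
#endif

#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Identifies one file, so a file mapped at several ranges is counted once. */
struct file_id {
	dev_t dev;
	ino_t ino;
};

/*
 * Hypothetical helper: sum the disk usage, in bytes, of every unlinked
 * regular file reachable through the entries of dir_path, for example
 * "/proc/1234/map_files".
 * Returns -1 if the directory cannot be opened or memory runs out.
 */
long long unlinked_bytes_in(const char *dir_path)
{
	DIR *dir = opendir(dir_path);
	struct file_id *seen = NULL;
	size_t nseen = 0, i;
	long long total = 0;
	struct dirent *ent;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		struct stat st;
		struct file_id *tmp;

		/* Follow the link so we stat the mapped file itself. */
		if (fstatat(dirfd(dir), ent->d_name, &st, 0) == -1)
			continue;	/* entry went away, or no permission */

		/* Only regular files whose last name has been removed. */
		if (!S_ISREG(st.st_mode) || st.st_nlink != 0)
			continue;

		for (i = 0; i < nseen; i++)
			if (seen[i].dev == st.st_dev && seen[i].ino == st.st_ino)
				break;
		if (i < nseen)
			continue;	/* already counted via another entry */

		tmp = realloc(seen, (nseen + 1) * sizeof(*seen));
		if (!tmp) {
			total = -1;
			break;
		}
		seen = tmp;
		seen[nseen].dev = st.st_dev;
		seen[nseen].ino = st.st_ino;
		nseen++;

		/* Assumption: st_blocks counts 512-byte units (true on Linux). */
		total += (long long)st.st_blocks * 512;
	}

	free(seen);
	closedir(dir);
	return total;
}
```

Pointing the same helper at /proc/<pid>/fd should cover case (1) in the
same way, since those entries can also be followed with stat(). A file
that is both open and mapped would then show up in both totals, so a real
tool would want to share the seen list across the two directories.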