Date: Fri, 15 Feb 2019 02:38:40 -0800 (PST)
From: Hugh Dickins
To: "Darrick J. Wong"
cc: Andrew Morton, Matej Kupljen, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hughd@google.com
Subject: Re: tmpfs inode leakage when opening file with O_TMP_FILE
In-Reply-To: <20190215002631.GB6474@magnolia>
References: <20190214154402.5d204ef2aa109502761ab7a0@linux-foundation.org> <20190215002631.GB6474@magnolia>

On Thu, 14 Feb 2019, Darrick J. Wong wrote:
> [cc the shmem maintainer and the mm list]

Yup, thanks - Matej also did so the day after sending to linux-kernel.

>
> On Thu, Feb 14, 2019 at 03:44:02PM -0800, Andrew Morton wrote:
> > (cc linux-fsdevel)

Okay, thanks, but a tmpfs peculiarity we think.

> >
> > On Mon, 11 Feb 2019 15:18:11 +0100 Matej Kupljen wrote:
> >
> > > Hi,
> > >
> > > it seems that when opening file on file system that is mounted on
> > > tmpfs with the O_TMPFILE flag and using linkat call after that, it
> > > uses 2 inodes instead of 1.
> > >
> > > This is simple test case:
> > >
> > > #include
> > > #include
> > > #include
> > > #include
> > > #include
> > > #include
> > > #include
> > > #include
> > > #include
> > >
> > > #define TEST_STRING "Testing\n"
> > >
> > > #define TMP_PATH "/tmp/ping/"
> > > #define TMP_FILE "file.txt"
> > >
> > >
> > > int main(int argc, char* argv[])
> > > {
> > >     char path[PATH_MAX];
> > >     int fd;
> > >     int rc;
> > >
> > >     fd = open(TMP_PATH, __O_TMPFILE | O_RDWR,
> > >               S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP |
> > >               S_IROTH | S_IWOTH);
> > >
> > >     rc = write(fd, TEST_STRING, strlen(TEST_STRING));
> > >
> > >     snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
> > >     linkat(AT_FDCWD, path, AT_FDCWD, TMP_PATH TMP_FILE, AT_SYMLINK_FOLLOW);
> > >     close(fd);
> > >
> > >     return 0;
> > > }
> > >
> > > I have checked inodes with "df -i" tool. The first inode is used when
> > > the call to open is executed and the second one when the call to
> > > linkat is executed.
> > > It is not decreased when close is executed.
> > >
> > > I have also tested this on an ext4 mounted fs and there only one inode is used.
> > >
> > > I tested this on:
> > > $ cat /etc/lsb-release
> > > DISTRIB_ID=Ubuntu
> > > DISTRIB_RELEASE=18.04
> > > DISTRIB_CODENAME=bionic
> > > DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
> > >
> > > $ uname -a
> > > Linux Orion 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC
> > > 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> Heh, tmpfs and its weird behavior where each new link counts as a new
> inode because "each new link needs a new dentry, pinning lowmem, and
> tmpfs dentries cannot be pruned until they are unlinked."

That's very much a peculiarity of tmpfs, so agreed: it's what I expect
to be the cause, but I've not actually tracked it through and fixed it yet.

>
> It seems to have this behavior on 5.0-rc6 too:

Yes, it does.

>
> $ /bin/df -i /tmp ; ./c ; /bin/df -i /tmp
> Filesystem      Inodes IUsed   IFree IUse% Mounted on
> tmp            1019110    17 1019093    1% /tmp
> Filesystem      Inodes IUsed   IFree IUse% Mounted on
> tmp            1019110    19 1019091    1% /tmp
>
> Probably because shmem_tmpfile -> shmem_get_inode -> shmem_reserve_inode
> which decrements ifree when we create the tmpfile, and then the
> d_tmpfile decrements i_nlink to zero. Now we have iused=1, nlink=0,
> assuming iused=itotal-ifree like usual.
>
> Then the linkat call does:
>
> shmem_link -> shmem_reserve_inode
>
> which decrements ifree again and increments i_nlink to 1. Now we have
> iused=2, nlink=1.
>
> The program exits, which closes the file. /tmp/ping/file.txt still
> exists and we haven't evicted inodes yet, so nothing much happens.
>
> But then I added in rm -rf /tmp/ping/file.txt to see what happens.
> shmem_unlink contains this:
>
>     if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
>         shmem_free_inode(inode->i_sb);
>
> So shmem_unlink *doesn't* decrement ifree but does drop the nlink, so
> our state is now iused=2, nlink=0.
>
> Now we evict the inode, which increments ifree, so iused=1 and the inode
> goes away. Oops, we just leaked an ifree.
>
> I /think/ the proper fix is to change shmem_link to decrement ifree only
> if the inode has nonzero nlink, e.g.
>
>     /*
>      * No ordinary (disk based) filesystem counts links as inodes;
>      * but each new link needs a new dentry, pinning lowmem, and
>      * tmpfs dentries cannot be pruned until they are unlinked. If
>      * we're linking an O_TMPFILE file into the tmpfs we can skip
>      * this because there's still only one link to the inode.
>      */
>     if (inode->i_nlink > 0) {
>         ret = shmem_reserve_inode(inode->i_sb);
>         if (ret)
>             goto out;
>     }
>
> Says me who was crawling around poking at O_TMPFILE behavior all morning.

Thanks for the Cc on that patch: I thought at first that you were
coincidentally fixing up Matej's observation, but from its commit
message no. That work is just a generic cleanup to suit XFS needs,
and won't change the tmpfs behavior one way or the other.

> Not sure if that's right; what happens to the old dentry?

I'm relieved to see your "/think/" above and "Not sure" there :)
Me too. It is so easy to get these counting things wrong, especially
when distributed between the generic and the specific file system.

I'm not going to attempt a pronouncement until I've had time to
sink properly into it at the weekend, when I'll follow your guide
and work it through - thanks a lot for getting this far, Darrick.

Hugh

>
> --D
>
> > > If you need any more information, please let me know.
> > >
> > > And please CC me when replying, I am not subscribed to the list.
> > >
> > > Thanks and BR,
> > > Matej