Received: by 2002:ac0:946b:0:0:0:0:0 with SMTP id j40csp3125374imj; Mon, 18 Feb 2019 20:35:16 -0800 (PST) X-Google-Smtp-Source: AHgI3IbrrpyBpL5AJXuTxB66HOed5sD3ugEHYl0+taVE2I5xk6y99jkk0htudgudKtiA1XP0TGtu X-Received: by 2002:a17:902:b097:: with SMTP id p23mr27621078plr.36.1550550915959; Mon, 18 Feb 2019 20:35:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1550550915; cv=none; d=google.com; s=arc-20160816; b=moxLSK6G8jlrU3AzmQbV2HNTNFYYdjma/4wemjEcxixBwOa1mCVFTc55kBSmBL0WzE S1KL75RnLVf+YVPnbfNYJ52xmSviR3xJ24QqTn7MtTtFUI/vDdeXyXO8D8xdZ2KgXJFf 6EwCFwy2ubRFGpJbS+3BAi0LbjNN7y3nNpHaT2HZUEos6aXpgJ4k/0AfOJRUZ17M7TC9 /lgixqhhblmGmtSPtQRD7gccpHYPH+O/nCE93+cGeVwRgwNSuVrxBUCjW+J7horh9UC2 fPce40H/BP10hYoaHVQh0NSu8gd3jkd+GQrYxOF21qLvG8x3XkJPu1IHJdIYKNzPZhEY bQNA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=cS1DH879CcIl+bs6b1wVapY22RVLC7csNXI6tVxFYVA=; b=kqkAKeqTLHUdcDeF86nQ2VmT4930IPKnuht1hO4D4VECNiZtFabNAcQpRgENdaKBc9 c/1Kk8OjwhoheZ4fRirnrx3mOBR5cTxKie65GFtAUMdk/akv4BblSb+BNKI2apSJMZkc nyxpQ82Fs5GWzN9heo8O6DMoTVDuJ+kkulclN8sYgVPZGdLI9jk2OusMd0TuQETR5Pag jBAn1U1m6Br7NAp0pbutZisa/b0M6n5s0ogBLLmNghovPmi17uo6WXXjnmFZlt6gC1dr 1p21QdRmD/Vl+qZvcbHB5Z6mlKbi4tLtU8N/3t2HYjwPdPJDZ9tAGY8kiEE6woNtnGH9 B1/g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2018-07-02 header.b=Ds0m1BM2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 38si16072232pld.62.2019.02.18.20.35.01; Mon, 18 Feb 2019 20:35:15 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2018-07-02 header.b=Ds0m1BM2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726388AbfBSEeY (ORCPT + 99 others); Mon, 18 Feb 2019 23:34:24 -0500 Received: from userp2120.oracle.com ([156.151.31.85]:41358 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725770AbfBSEeY (ORCPT ); Mon, 18 Feb 2019 23:34:24 -0500 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x1J4YG05071284; Tue, 19 Feb 2019 04:34:16 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=date : from : to : cc : subject : message-id : references : mime-version : content-type : in-reply-to; s=corp-2018-07-02; bh=cS1DH879CcIl+bs6b1wVapY22RVLC7csNXI6tVxFYVA=; b=Ds0m1BM2QxOJ1KJN0N2goUBwoMP5EWX5pR1QMMZWCWHcyBitekTTRFXRCiKgxpndQ1Un RRAiGkbatWXNAL4DP/KgNlgRq7IATMm1yAyKf0S2YDmzy9DXSro7CSBvWoXesIF+mDHT ooieNsOjgRKOiJH07uyN9+0nPTwTJiaNnzIhF691mzfwe6D4JZdcrWv3Dg5p0kbHpEjW PJzjEHU9mwz4vq1K06m55ivhHwspKi5kBZrfnQwk6+0fQaumzjYydlj0xEZIc8K5M63n uZO4tQZNwtfJ64qTcUNR1UF/SQDl7dBL0NAMMiTkGKoJ2ceP7w+DtmAvr1+JJhTI6F2y Uw== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by userp2120.oracle.com with ESMTP id 2qpb5r8qn7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 19 Feb 2019 04:34:16 +0000 Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id x1J4YADm022085 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 19 Feb 2019 04:34:11 GMT Received: from abhmp0015.oracle.com (abhmp0015.oracle.com [141.146.116.21]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x1J4YAHO025021; Tue, 19 Feb 2019 04:34:10 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 18 Feb 2019 20:34:10 -0800 Date: Mon, 18 Feb 2019 20:34:08 -0800 From: "Darrick J. Wong" To: Hugh Dickins Cc: Andrew Morton , Matej Kupljen , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: tmpfs inode leakage when opening file with O_TMP_FILE Message-ID: <20190219043408.GG6503@magnolia> References: <20190214154402.5d204ef2aa109502761ab7a0@linux-foundation.org> <20190215002631.GB6474@magnolia> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9171 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=2 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1902190034 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 18, 2019 at 08:23:20PM -0800, Hugh Dickins wrote: > On Fri, 15 Feb 2019, Hugh Dickins wrote: > > On Thu, 14 Feb 2019, Darrick J. Wong wrote: > > > > On Mon, 11 Feb 2019 15:18:11 +0100 Matej Kupljen wrote: > > > > > > > > > > it seems that when opening file on file system that is mounted on > > > > > tmpfs with the O_TMPFILE flag and using linkat call after that, it > > > > > uses 2 inodes instead of 1. > ... > > > > > > Heh, tmpfs and its weird behavior where each new link counts as a new > > > inode because "each new link needs a new dentry, pinning lowmem, and > > > tmpfs dentries cannot be pruned until they are unlinked." > > > > That's very much a peculiarity of tmpfs, so agreed: it's what I expect > > to be the cause, but I've not actually tracked it through and fixed yet. > ... > > > > > I /think/ the proper fix is to change shmem_link to decrement ifree only > > > if the inode has nonzero nlink, e.g. > > > > > > /* > > > * No ordinary (disk based) filesystem counts links as inodes; > > > * but each new link needs a new dentry, pinning lowmem, and > > > * tmpfs dentries cannot be pruned until they are unlinked. If > > > * we're linking an O_TMPFILE file into the tmpfs we can skip > > > * this because there's still only one link to the inode. > > > */ > > > if (inode->i_nlink > 0) { > > > ret = shmem_reserve_inode(inode->i_sb); > > > if (ret) > > > goto out; > > > } > > > > > > Says me who was crawling around poking at O_TMPFILE behavior all morning. > > > Not sure if that's right; what happens to the old dentry? > > Not sure what you mean by "what happens to the old dentry?" > But certainly the accounting feels a bit like a shell game, > and my attempts to explain it have not satisfied even me. > > The way I'm finding it helpful to think, is to imagine tmpfs' > count of inodes actually to be implemented as a count of dentries. > And the 1 for the last remaining goes away in the shmem_free_inode() > at the end of shmem_evict_inode(). Does that answer "what happens"? > > Since applying the patch, I have verified (watching "dentry" and > "shmem_inode_cache" in /proc/slabinfo) that doing Matej's sequence > repeatedly does not leak any "df -i" nor dentries nor inodes. > > > > > I'm relieved to see your "/think/" above and "Not sure" there :) > > Me too. It is so easy to get these counting things wrong, especially > > when distributed between the generic and the specific file system. > > > > I'm not going to attempt a pronouncement until I've had time to > > sink properly into it at the weekend, when I'll follow your guide > > and work it through - thanks a lot for getting this far, Darrick. > > I have now sunk into it, and sure that I agree with your patch, > filled out below (I happen to have changed "inode->i_nlink > 0" to > "inode->i_nlink" just out of some personal preference at the time). > One can argue that it's not technically quite the right place, but > it is the place where we can detect the condition without getting > into unnecessary further complications, and does the job well enough. > > May I change "Suggested-by: Darrick J. Wong " > to your "Signed-off-by" before sending on to Andrew "From" you? That's fine with me! > Thanks! > Hugh > > [PATCH] tmpfs: fix link accounting when a tmpfile is linked in > > tmpfs has a peculiarity of accounting hard links as if they were separate > inodes: so that when the number of inodes is limited, as it is by default, > a user cannot soak up an unlimited amount of unreclaimable dcache memory > just by repeatedly linking a file. > > But when v3.11 added O_TMPFILE, and the ability to use linkat() on the fd, > we missed accommodating this new case in tmpfs: "df -i" shows that an > extra "inode" remains accounted after the file is unlinked and the fd > closed and the actual inode evicted. If a user repeatedly links tmpfiles > into a tmpfs, the limit will be hit (ENOSPC) even after they are deleted. > > Just skip the extra reservation from shmem_link() in this case: there's > a sense in which this first link of a tmpfile is then cheaper than a > hard link of another file, but the accounting works out, and there's > still good limiting, so no need to do anything more complicated. > > Fixes: f4e0c30c191 ("allow the temp files created by open() to be linked to") > Reported-by: Matej Kupljen > Suggested-by: Darrick J. Wong Or if you prefer: Reviewed-by: Darrick J. Wong --D > Signed-off-by: Hugh Dickins > --- > > mm/shmem.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > --- 5.0-rc7/mm/shmem.c 2019-01-06 19:15:45.764805103 -0800 > +++ linux/mm/shmem.c 2019-02-18 13:56:48.388032606 -0800 > @@ -2854,10 +2854,14 @@ static int shmem_link(struct dentry *old > * No ordinary (disk based) filesystem counts links as inodes; > * but each new link needs a new dentry, pinning lowmem, and > * tmpfs dentries cannot be pruned until they are unlinked. > + * But if an O_TMPFILE file is linked into the tmpfs, the > + * first link must skip that, to get the accounting right. > */ > - ret = shmem_reserve_inode(inode->i_sb); > - if (ret) > - goto out; > + if (inode->i_nlink) { > + ret = shmem_reserve_inode(inode->i_sb); > + if (ret) > + goto out; > + } > > dir->i_size += BOGO_DIRENT_SIZE; > inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);