Received: by 2002:a05:7412:d1aa:b0:fc:a2b0:25d7 with SMTP id ba42csp98811rdb; Sun, 28 Jan 2024 16:00:33 -0800 (PST) X-Google-Smtp-Source: AGHT+IF0Hdzx2NvsGR55BZjo1YdclTDb2TZ7hRFCYelibLYprZ+maNs8f/lsjzDrzE4EnHN2IJY5 X-Received: by 2002:a05:6a20:3f16:b0:19b:5666:b9c5 with SMTP id az22-20020a056a203f1600b0019b5666b9c5mr3609857pzb.73.1706486432801; Sun, 28 Jan 2024 16:00:32 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706486432; cv=pass; d=google.com; s=arc-20160816; b=OPRPzkvpqavNFBbjY79o72VK/4GNwQvUpRdCpH7yK8KYJBEknEYZi2YbIeWqNjg+Tx lxK3sq5uf/d8Wkx41+DkL4prn1t01kGo+Xp4h+DpIytwpk9uzD6sQqX+9itQgoJs3989 ViiDHBuNtj4Mcgoy7vo2mcJ8EVNpEHCxuEkU6/nhhL8wZs68u2pVS5eSgfyzSYKL6NNM b42ZKu34FeMx70tJqqHNQbToHLP8Ro1SRBPAfpRXAnIPu7+EU/9Xxu/mtObPlIP+j01H UhiGLViOvNthHj/J1J9S42l4t3GLJhdvlbw0P8KpB41Ozwwk8MhiH0RZHMopP/fEDz5P wCGg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :subject:cc:to:from:date; bh=A8eMkqBNTOfdWJ+VOe4CGBwDfXWcEyfiVYseqHaTIy4=; fh=+NHEOwOI8sgHa/emZ3nIKC4IhKoRFLn6O7wLrfs7tlM=; b=arLUdDz7U9F4wItH3UVTtRyNvM765p2n/2MALPdmHZtDXi15Iv+dQ667Cg/h+K7d+w LQf1gXKbkzRN8fB7knZxvXFvIXdgtkD5ehv9+DPtqnBUjkEOm4djo2SdoDyQmQlg+k6j zd8ZQkoT9tcnIkLxv1wk0pEykyDYCxH+FcHjCQ6QksshQ+u+8VzoyqOb43TMNdM40uwK T8yvNtFwkOwfmHKjZe+4XgzPhwTtBpRrYiUwSfyYgeXYo/+XsfpDY7A3l+1XeLaxYMOD ZNQks17ewhB66iCdDLorIvVoFZw4I7SUzI36JYiBVg26dv/hVrQGuhB4gNJMCe5CfgMg dnwA== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of linux-kernel+bounces-42024-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-42024-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id m8-20020a6562c8000000b005d8b519e1afsi3042598pgv.804.2024.01.28.16.00.32 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Jan 2024 16:00:32 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-42024-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of linux-kernel+bounces-42024-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-42024-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 469F1285AD6 for ; Mon, 29 Jan 2024 00:00:05 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 968734597D; Sun, 28 Jan 2024 23:59:54 +0000 (UTC) Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EDD0F45955; Sun, 28 Jan 2024 23:59:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706486394; cv=none; b=dOvbRRBxK4K4U3vyc64IgUs6VVFZsc689YNlqc5cyXoGi3HlnycoBT8IHXaA1aL2XJZ4tCEQlore8AVa1tLpnJUYzTLus7UjshxdqpwHNz1vSpSkCTwxR5/hh2AwjDM7XmUUU4sGvGfrXTkec5iMHEx3pzCX3VKHGjFtAGnqhjM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706486394; c=relaxed/simple; bh=up86wC8bCRQUbONGrMWxwGT7s8fe8Aqj1rj4oaNkE10=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=l6BUGO7X25Jkt34e1T9PPn1cXLq9HHWdZ3vwZxAjVFMY4Odk3SnleenqTg7qMLLCVG8jkrQ0iLwQO7F7QGj7w1WVPGlfoqIU+86Ez7wj/d5IfMHdOf8r5SQ64U/MSmdsNJbKTgpi35ivZ19gCcX3xROz+DhPWPR86DI+ITzCikU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9228AC433C7; Sun, 28 Jan 2024 23:59:51 +0000 (UTC) Date: Sun, 28 Jan 2024 18:59:43 -0500 From: Steven Rostedt To: Linus Torvalds Cc: Masami Hiramatsu , Mathieu Desnoyers , LKML , Linux Trace Devel , Christian Brauner , Ajay Kaher , Geert Uytterhoeven , linux-fsdevel Subject: Re: [PATCH] eventfs: Have inodes have unique inode numbers Message-ID: <20240128185943.6920388b@rorschach.local.home> In-Reply-To: References: <20240126150209.367ff402@gandalf.local.home> <20240126162626.31d90da9@gandalf.local.home> <20240128175111.69f8b973@rorschach.local.home> X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.33; x86_64-pc-linux-gnu) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On Sun, 28 Jan 2024 15:24:09 -0800 Linus Torvalds wrote: > On Sun, 28 Jan 2024 at 14:51, Steven Rostedt wrote: > > > > I was working on getting rid of ei->dentry, but then I hit: > > > > void eventfs_remove_dir(struct eventfs_inode *ei) > > { [..] > > > > Where it deletes the all the existing dentries in a tree. Is this a > > valid place to keep ei->dentry? > > No, when you remove the directory, just leave the child dentries > alone. Remember: they are purely caches, and you either have My fear is that they can be accessed again when there's no file around. Thus, the dentry and the inode are created when accessed with the fops loaded that calls into the tracing code. For example, when some creates a kprobe: # cd /sys/kernel/tracing # echo 'p:sched schedule' > kprobe_events # ls events/kprobes/sched/enable enable Now I do; # echo > kprobe_events Currently if I do: # ls events/kprobes/sched/enable ls: cannot access 'events/kprobes/sched/enable' But because the dentry doesn't go away immediately, it can cause issues. I rather not force them to be freed every time the dentry goes to zero, as that could cause more overhead in tracing where the tooling already does multiple scans of the eventfs directory for various reasons. If that dentry is still there, and someone does: echo 1 > events/kprobes/sched/enable after the ei was removed, wouldn't it call back into the open callback of the inode represented by that dentry? If that happens, it will call into the kprobe code and reference structures that have already been freed. Note, before adding that simple_recursive_removal() or my open coded version I had earlier, the kprobe files would stick around after the kprobe was freed, and I was able to crash the kernel. > > - somebody is still using it (you can 'rmdir()' a directory that some > other process has as its cwd, for example), which keeps it alive and > active anyway Wouldn't it be bad if the dentry hung around after the rmdir. You don't want to be able to access files after rmdir has finished. > > - when the last user is done, the dcache code will just free the > dentries anyway But it doesn't. That debug code I had before that showed the ei->dentry in show_event_dentries file would show that the dentries exist for some time when their ref count was zero. They only got freed when on drop_caches or memory reclaim. I like the fact that they hang around that we are not constantly allocating them for every reference. > > so there's no reason to remove any of the dentries by hand - and in > fact simple_recursive_removal() never did that anyway for anything > that was still busy. > > For a pure cached set of dentries (that have no users), doing the last > "dput()" on a directory will free that directory dentry, but it will > also automatically free all the unreachable children recursively. But it doesn't free it. It just makes it available to be freed. > > Sure, simple_recursive_removal() does other things (sets inode flags > to S_DEAD, does fsnotify etc), but none of those should actually > matter. > > I think that whole logic is simply left-over from when the dentries > weren't a filesystem cache, but were the *actual* filesystem. So it > actually became actively wrong when you started doing your own backing > store, but it just didn't hurt (except for code legibility). > > Of course, eventfs is slightly odd and special in that this isn't a > normal "rmdir()", so it can happen with files still populated. And > those children will stick around and be useless baggage until they > are shrunk under memory pressure. > > But I don't think it should *semantically* matter, exactly because > they always could stay around anyway due to having users. It does, because things like kprobes will call into tracefs to free its files so that it can free its own resources as the open callback will reference those resources. If those dentries sick around and allow the user to open the file again, it will cause a use after free bug. It does keep track of opens so the kprobe code will not be freed if a task has a reference already. You get an -EBUSY on the removal of kprobes if something has a reference to it. But on deleting, the return from the eventfs code that deleted the directory, is expected that there will be no more opens on the kprobe files. With stale dentries hanging around after the directory is deleted, that is not the case. > > There are some cacheability knobs like > > .d_delete = always_delete_dentry, > > which probably makes sense for any virtual filesystem (it limits > caching of dentries with no more users - normally the dcache will > happily keep caching dentries for the *next* user, but that may or may > not make sense for virtual filesystems) But if it were freed every time there's no more users, then when you do a stat() it will get referenced then freed (as the dentry ref count goes to zero after the stat() is completed). then the open/write/close will create and delete it, and this could be done several times for the same file. If the dentry is freed every time its ref count goes to zero, then it will need to be created for every operation. > > So there is tuning that can be done, but in general, I think you > should actively think of the dcache as just "it's just a cache, leave > stale stuff alone" I would like to keep it as a cache, but deleting every time the ref count goes to zero is not much of a cache. -- Steve