Date: Sun, 28 Jan 2024 21:09:38 -0500
From: Steven Rostedt
To: Linus Torvalds
Cc: Masami Hiramatsu, Mathieu Desnoyers, LKML, Linux Trace Devel,
    Christian Brauner, Ajay Kaher, Geert Uytterhoeven, linux-fsdevel
Subject: Re: [PATCH] eventfs: Have inodes have unique inode numbers
Message-ID: <20240128210938.436fc3b4@rorschach.local.home>
References: <20240126150209.367ff402@gandalf.local.home>
    <20240126162626.31d90da9@gandalf.local.home>
    <20240128175111.69f8b973@rorschach.local.home>
    <20240128185943.6920388b@rorschach.local.home>
    <20240128192108.6875ecf4@rorschach.local.home>

On Sun, 28 Jan 2024 17:00:08 -0800
Linus Torvalds wrote:

> On Sun, 28 Jan 2024 at 16:21, Steven Rostedt wrote:
> >
> > Wouldn't it be bad if the dentry hung around after the rmdir. You don't
> > want to be able to access files after rmdir has finished.
>
> Steven, I already told you that that is NORMAL.
>
> This is how UNIX filesystems work.
> Try this:
>
>    mkdir dummy
>    cd dummy
>    echo "Hello" > hello
>    ( sleep 10; cat ) < hello &

Running strace on the above we have:

openat(AT_FDCWD, "hello", O_RDONLY) = 3
dup2(3, 0) = 0
close(3) = 0
newfstatat(AT_FDCWD, "/usr/local/sbin/sleep", 0x7ffee0e44a60, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/sleep", 0x7ffee0e44a60, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/sleep", 0x7ffee0e44a60, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/sleep", {st_mode=S_IFREG|0755, st_size=43888, ...}, 0) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], NULL, 8) = 0

So the file is ***opened*** and gets a reference.

YES I DEAL WITH THIS!!!

This works fine! I have no problems with this.

>    rm hello
>    cd ..
>    rmdir dummy
>
> and guess what? It will print "hello" after that file has been
> removed, and the whole directory is gone.
>
> YOU NEED TO DEAL WITH THIS.

And I do very well THANK YOU!

But if this does not call that simple_recursive_removal() the dentry
*STAYS AROUND* and things CAN OPEN IT AFTER IT HAS BEEN REMOVED! That's
equivalent to doing an ls on a directory you just deleted with rmdir
and you still see the files.

Note, eventfs has no call to rmdir here. It's a virtual file system. The
directories disappear without the user accessing the directory.

Same for /proc. The directories for pids come and go when processes fork
and exit. You don't want someone to be able to access /proc/1234 *after*
the task 1234 exited and was cleaned up by its parent. Do you?

And I'm not talking about if something opened the files in /proc/1234
before the task disappeared. That is dealt with. Just like if a file in
eventfs is opened and the backing data is to be deleted. It either
prevents the deletion, or in some cases it uses ref counts to know that
something still has one of its files open. And it won't delete the data
until everything has closed it.
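
As a user-space illustration of that distinction (a sketch only, assuming
a POSIX shell with mktemp on a regular file system; the variable names are
invented for the example and nothing here exercises eventfs itself): a
descriptor opened before the removal keeps working, while a fresh open by
path afterwards fails.

```shell
#!/bin/sh
# Sketch: a reference taken BEFORE removal stays usable, but a NEW
# open by path fails once the file and its directory are gone.
# Runs in a scratch directory; all names below are illustrative.
scratch=$(mktemp -d) || exit 1
mkdir "$scratch/dummy"
echo "Hello" > "$scratch/dummy/hello"

# Open the file on fd 3 before anything is removed.
exec 3< "$scratch/dummy/hello"

# Remove the file and then the directory that held it.
rm "$scratch/dummy/hello"
rmdir "$scratch/dummy"

# The already-open descriptor can still be read.
held=$(cat <&3)
exec 3<&-

# A new lookup by path must not succeed.
if cat "$scratch/dummy/hello" 2>/dev/null; then
    reopened=yes
else
    reopened=no
fi

rmdir "$scratch"
echo "held=$held reopened=$reopened"
```

On an ordinary file system the first read succeeds and the second open is
refused, which is the behaviour the rest of this mail is arguing that a
virtual file system has to preserve as well.
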
But after a file or directory has been deleted, NO file system allows it
to be opened again.

This isn't about something opening a file in eventfs and getting a
reference to it, and then the file or directory being deleted. That's
covered. I'm talking about the directory being deleted and then allowing
something to open a file within it AFTER the deletion has occurred. If a
dentry is still around, THAT CAN HAPPEN!

With a dentry still around with nothing accessing it, and you remove the
data it represents, if you don't make that dentry invisible to user
space, it can be opened AFTER it has been deleted. Without calling
d_invalidate (which calls shrink_dcache_parent) on the dentry, it is
still visible. Even with a ref count of zero and nothing has it opened.
That means you can open that file again AFTER it has been deleted.

The vfs_rmdir() calls shrink_dcache_parent() that looks to prune the
dcache to make it not visible any more. But vfs_rmdir isn't ever called
for eventfs.

procfs calls d_invalidate which removes the dentry from being visible to
the file system. I *used* to do that too until Al Viro suggested that I
use the simple_recursive_removal() call that does all that for me.

>
> > And thinking about this more, this is one thing that is different with
> > eventfs than a normal file system. The rmdir in most cases where
> > directories are deleted in eventfs will fail if there's any open files
> > within it.
>
> No.
>
> Stop thinking that eventfs is special. It's not.

It's not special with respect to other virtual file systems, but virtual
file systems *are* special compared to regular file systems. Why?
Because regular file systems go through the VFS layer for pretty much
*all* interactions with them.

Virtual file systems interact with the kernel without going through the
VFS layer. In normal file systems, to remove a directory you have to go
through rmdir which does all the nice things you are telling me about.
But virtual file system directories (except for tmpfs) are removed by
other means. The VFS layer *has no idea* that a directory is removed.
With eventfs, calling that simple_recursive_removal() tells the VFS
layer this directory is being deleted, just as if someone called
rmdir(). If I don't call that function VFS will think the directory is
still around and be happy to allow users to open files in it AFTER the
directory has been deleted.

Your example above does not do what I'm talking about here. It shows
something OPENING a file and then deleting the directory. Yes, if you
have an opened reference to something and it gets deleted, you still
have access to that reference. But you should not be able to get a new
reference to something after it has been deleted.

>
> You need to deal with the realities of having made a filesystem. And
> one of those realities is that you don't control the dentries, and you
> can't randomly cache dentry state and then do things behind the VFS
> layer's back.

I'm not. I'm trying to let VFS know a directory is deleted. Because when
you delete a kprobe, the directory that has the control files for that
kprobe (like enabling it) goes away too. I have to let VFS know that the
directory is deleted, just like procfs has to tell it when a directory
for a process id is no more.

You don't kill tasks with: rmdir /proc/1234

And you don't delete kprobes with: rmdir events/kprobe/sched

>
> So remove that broken function. Really. You did a filesystem, and
> that means that you had better play by the VFS rules.
>
> End of discussion.

And I do it just like debugfs when it deletes files outside of VFS or
procfs, and pretty much most virtual file systems.

>
> Now, you can then make your own "read" and "lookup" etc functions say
> "if the backing store data has been marked dead, I'll not do this".
> That's *YOUR* data structures, and that's your choice.
>
> But you need to STOP thinking you can play games with dentries.
> And you need to stop making up BS arguments for why you should be able
> to.
>
> So if you are thinking of a "Here's how to do a virtual filesystem"
> talk, I would suggest you start with one single word: "Don't".
>
> I'm convinced that we have made it much too easy to do a half-arsed
> virtual filesystem. And eventfs is way beyond half-arsed.
>
> It's now gone from half-arsed to "I told you how to do this right, and
> you are still arguing". That makes it full-arsed.
>
> So just stop the arsing around, and just get rid of those _broken_
> dentry games.

Sorry, but you didn't prove your point. The example you gave me is
already covered. Please tell me when a kprobe goes away, how do I let
VFS know? Currently the tracing code (like kprobes and synthetic events)
calls eventfs_remove_dir() with just a reference to that ei
eventfs_inode structure. I currently use the ei->dentry to tell VFS
"this directory is being deleted". What other means do I have to
accomplish the same thing?

-- Steve