2023-07-13 11:41:34

by Ajay Kaher

[permalink] [raw]
Subject: [PATCH v4 00/10] tracing: introducing eventfs

Events Tracing infrastructure contains lot of files, directories
(internally in terms of inodes, dentries). And ends up by consuming
memory in MBs. We can have multiple events of Events Tracing, which
further requires more memory.

Instead of creating inodes/dentries, eventfs could keep meta-data and
skip the creation of inodes/dentries. As and when require, eventfs will
create the inodes/dentries only for required files/directories.
Also eventfs would delete the inodes/dentries once no more requires
but preserve the meta data.

Tracing events took ~9MB, with this approach it took ~4.5MB
for ~10K files/dir.

v3:
Patch 3,4,5,7,9:
removed all the eventfs_rwsem code and replaced it with an srcu
lock for the readers, and a mutex to synchronize the writers of
the list.

Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10

Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
as it really should not be exposed to all users.

Patch 5: added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).

have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).

Patch 6: renamed:
eventfs_create_file() -> create_file()
eventfs_create_dir() -> create_dir()

v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
helper function to add files or directories to eventfs
fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
used eventfs_prepare_ef() to add files
fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
rebased because of v3 01/10
Patch 10: moved from v1 9/9

v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case

Ajay Kaher (9):
tracefs: Rename some tracefs function
eventfs: Implement eventfs dir creation functions
eventfs: Implement eventfs file add functions
eventfs: Implement eventfs file, directory remove function
eventfs: Implement functions to create eventfs files and directories
eventfs: Implement eventfs lookup, read, open functions
eventfs: Implement tracefs_inode_cache
eventfs: Move tracing/events to eventfs
test: ftrace: Fix kprobe test for eventfs

Steven Rostedt (Google) (1):
tracing: Require all trace events to have a TRACE_SYSTEM

fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 711 ++++++++++++++++++
fs/tracefs/inode.c | 124 ++-
fs/tracefs/internal.h | 25 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 32 +
kernel/trace/trace.h | 2 +-
kernel/trace/trace_events.c | 78 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 4 +-
.../test.d/kprobe/kprobe_args_string.tc | 4 +-
10 files changed, 930 insertions(+), 52 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h

--
2.39.0



2023-07-13 11:41:45

by Ajay Kaher

[permalink] [raw]
Subject: [PATCH v4 08/10] eventfs: Implement tracefs_inode_cache

Creating tracefs_inode_cache which is a cache of tracefs_inode.
Adding helping functions:
tracefs_alloc_inode()
tracefs_free_inode()

Signed-off-by: Ajay Kaher <[email protected]>
Co-developed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ching-lin Yu <[email protected]>
---
fs/tracefs/inode.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 7dc692a5fee1..76820d3e97eb 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -21,13 +21,33 @@
#include <linux/parser.h>
#include <linux/magic.h>
#include <linux/slab.h>
+#include "internal.h"

#define TRACEFS_DEFAULT_MODE 0700
+static struct kmem_cache *tracefs_inode_cachep __ro_after_init;

static struct vfsmount *tracefs_mount;
static int tracefs_mount_count;
static bool tracefs_registered;

+static struct inode *tracefs_alloc_inode(struct super_block *sb)
+{
+ struct tracefs_inode *ti;
+
+ ti = kmem_cache_alloc(tracefs_inode_cachep, GFP_KERNEL);
+ if (!ti)
+ return NULL;
+
+ ti->flags = 0;
+
+ return &ti->vfs_inode;
+}
+
+static void tracefs_free_inode(struct inode *inode)
+{
+ kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+}
+
static ssize_t default_read_file(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
@@ -346,6 +366,9 @@ static int tracefs_show_options(struct seq_file *m, struct dentry *root)
}

static const struct super_operations tracefs_super_operations = {
+ .alloc_inode = tracefs_alloc_inode,
+ .free_inode = tracefs_free_inode,
+ .drop_inode = generic_delete_inode,
.statfs = simple_statfs,
.remount_fs = tracefs_remount,
.show_options = tracefs_show_options,
@@ -675,10 +698,26 @@ bool tracefs_initialized(void)
return tracefs_registered;
}

+static void init_once(void *foo)
+{
+ struct tracefs_inode *ti = (struct tracefs_inode *) foo;
+
+ inode_init_once(&ti->vfs_inode);
+}
+
static int __init tracefs_init(void)
{
int retval;

+ tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache",
+ sizeof(struct tracefs_inode),
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD|
+ SLAB_ACCOUNT),
+ init_once);
+ if (!tracefs_inode_cachep)
+ return -ENOMEM;
+
retval = sysfs_create_mount_point(kernel_kobj, "tracing");
if (retval)
return -EINVAL;
--
2.39.0


2023-07-13 11:42:10

by Ajay Kaher

[permalink] [raw]
Subject: [PATCH v4 05/10] eventfs: Implement eventfs file, directory remove function

Adding eventfs_remove(), this function will recursively remove
dir or file info from eventfs.

added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).

have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).

Signed-off-by: Ajay Kaher <[email protected]>
Co-developed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ching-lin Yu <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
---
fs/tracefs/event_inode.c | 110 +++++++++++++++++++++++++++++++++++++++
include/linux/tracefs.h | 4 ++
2 files changed, 114 insertions(+)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 75dc8953d813..322a77be5a56 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -45,6 +45,7 @@ struct eventfs_file {
};

static DEFINE_MUTEX(eventfs_mutex);
+DEFINE_STATIC_SRCU(eventfs_srcu);

static const struct file_operations eventfs_file_operations = {
};
@@ -299,3 +300,112 @@ int eventfs_add_file(const char *name, umode_t mode,
mutex_unlock(&eventfs_mutex);
return 0;
}
+
+static LLIST_HEAD(free_list);
+
+static void eventfs_workfn(struct work_struct *work)
+{
+ struct eventfs_file *ef, *tmp;
+ struct llist_node *llnode;
+
+ llnode = llist_del_all(&free_list);
+ llist_for_each_entry_safe(ef, tmp, llnode, llist) {
+ if (ef->created && ef->dentry)
+ dput(ef->dentry);
+ kfree(ef->name);
+ kfree(ef->ei);
+ kfree(ef);
+ }
+}
+
+DECLARE_WORK(eventfs_work, eventfs_workfn);
+
+static void free_ef(struct rcu_head *head)
+{
+ struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
+
+ if (!llist_add(&ef->llist, &free_list))
+ return;
+
+ queue_work(system_unbound_wq, &eventfs_work);
+}
+
+
+
+/**
+ * eventfs_remove_rec - remove eventfs dir or file from list
+ * @ef: eventfs_file to be removed.
+ *
+ * This function recursively remove eventfs_file which
+ * contains info of file or dir.
+ */
+static void eventfs_remove_rec(struct eventfs_file *ef, int level)
+{
+ struct eventfs_file *ef_child;
+
+ if (!ef)
+ return;
+ /*
+ * Check recursion depth. It should never be greater than 3:
+ * 0 - events/
+ * 1 - events/group/
+ * 2 - events/group/event/
+ * 3 - events/group/event/file
+ */
+ if (WARN_ON_ONCE(level > 3))
+ return;
+
+ if (ef->ei) {
+ /* search for nested folders or files */
+ list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
+ lockdep_is_held(&eventfs_mutex)) {
+ eventfs_remove_rec(ef_child, level + 1);
+ }
+ }
+
+ if (ef->created && ef->dentry)
+ d_invalidate(ef->dentry);
+
+ list_del_rcu(&ef->list);
+ call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
+}
+
+/**
+ * eventfs_remove - remove eventfs dir or file from list
+ * @ef: eventfs_file to be removed.
+ *
+ * This function acquire the eventfs_mutex lock and call eventfs_remove_rec()
+ */
+void eventfs_remove(struct eventfs_file *ef)
+{
+ if (!ef)
+ return;
+
+ mutex_lock(&eventfs_mutex);
+ eventfs_remove_rec(ef, 0);
+ mutex_unlock(&eventfs_mutex);
+}
+
+/**
+ * eventfs_remove_events_dir - remove eventfs dir or file from list
+ * @dentry: events's dentry to be removed.
+ *
+ * This function remove events main directory
+ */
+void eventfs_remove_events_dir(struct dentry *dentry)
+{
+ struct tracefs_inode *ti;
+ struct eventfs_inode *ei;
+
+ if (!dentry || !dentry->d_inode)
+ return;
+
+ ti = get_tracefs(dentry->d_inode);
+ if (!ti || !(ti->flags & TRACEFS_EVENT_INODE))
+ return;
+
+ ei = ti->private;
+ d_invalidate(dentry);
+ dput(dentry);
+ kfree(ei);
+}
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index a51312ff803c..2c08edd4a739 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -40,6 +40,10 @@ int eventfs_add_top_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fops);

+void eventfs_remove(struct eventfs_file *ef);
+
+void eventfs_remove_events_dir(struct dentry *dentry);
+
struct dentry *tracefs_create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fops);
--
2.39.0


2023-07-13 11:42:37

by Ajay Kaher

[permalink] [raw]
Subject: [PATCH v4 07/10] eventfs: Implement eventfs lookup, read, open functions

Adding following inode_operations, file_operations and helper functions to
eventfs:
dcache_dir_open_wrapper()
eventfs_root_lookup()
eventfs_release()
eventfs_set_ef_status_free()
eventfs_post_create_dir()

inode_operations, file_operations will be called from vfs.

Signed-off-by: Ajay Kaher <[email protected]>
Co-developed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ching-lin Yu <[email protected]>
---
fs/tracefs/event_inode.c | 194 ++++++++++++++++++++++++++++++++++++++-
include/linux/tracefs.h | 2 +
2 files changed, 194 insertions(+), 2 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 34b5d3d8005b..7167340ac29e 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -67,7 +67,7 @@ DEFINE_STATIC_SRCU(eventfs_srcu);
* If tracefs is not enabled in the kernel, the value -%ENODEV will be
* returned.
*/
-struct dentry *create_file(const char *name, umode_t mode,
+static struct dentry *create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fop)
{
@@ -123,7 +123,7 @@ struct dentry *create_file(const char *name, umode_t mode,
* If tracefs is not enabled in the kernel, the value -%ENODEV will be
* returned.
*/
-struct dentry *create_dir(const char *name, umode_t mode,
+static struct dentry *create_dir(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fop,
const struct inode_operations *iop)
@@ -157,10 +157,200 @@ struct dentry *create_dir(const char *name, umode_t mode,
return eventfs_end_creating(dentry);
}

+/**
+ * eventfs_set_ef_status_free - set the ef->status to free
+ * @dentry: dentry who's status to be freed
+ *
+ * eventfs_set_ef_status_free will be called if no more
+ * reference remains
+ */
+void eventfs_set_ef_status_free(struct dentry *dentry)
+{
+ struct tracefs_inode *ti_parent;
+ struct eventfs_file *ef;
+
+ ti_parent = get_tracefs(dentry->d_parent->d_inode);
+ if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
+ return;
+
+ ef = dentry->d_fsdata;
+ if (!ef)
+ return;
+ ef->created = false;
+ ef->dentry = NULL;
+}
+
+/**
+ * eventfs_post_create_dir - post create dir routine
+ * @ef: eventfs_file of recently created dir
+ *
+ * Files with-in eventfs dir should know dentry of parent dir
+ */
+static void eventfs_post_create_dir(struct eventfs_file *ef)
+{
+ struct eventfs_file *ef_child;
+ struct tracefs_inode *ti;
+
+ /* srcu lock already held */
+ /* fill parent-child relation */
+ list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
+ srcu_read_lock_held(&eventfs_srcu)) {
+ ef_child->d_parent = ef->dentry;
+ }
+
+ ti = get_tracefs(ef->dentry->d_inode);
+ ti->private = ef->ei;
+}
+
+/**
+ * eventfs_root_lookup - lookup routine to create file/dir
+ * @dir: directory in which lookup to be done
+ * @dentry: file/dir dentry
+ * @flags:
+ *
+ * Used to create dynamic file/dir with-in @dir, search with-in ei
+ * list, if @dentry found go ahead and create the file/dir
+ */
+
+static struct dentry *eventfs_root_lookup(struct inode *dir,
+ struct dentry *dentry,
+ unsigned int flags)
+{
+ struct tracefs_inode *ti;
+ struct eventfs_inode *ei;
+ struct eventfs_file *ef;
+ struct dentry *ret = NULL;
+ int idx;
+
+ ti = get_tracefs(dir);
+ if (!(ti->flags & TRACEFS_EVENT_INODE))
+ return NULL;
+
+ ei = ti->private;
+ idx = srcu_read_lock(&eventfs_srcu);
+ list_for_each_entry_srcu(ef, &ei->e_top_files, list,
+ srcu_read_lock_held(&eventfs_srcu)) {
+ if (strcmp(ef->name, dentry->d_name.name))
+ continue;
+ ret = simple_lookup(dir, dentry, flags);
+ if (ef->created)
+ continue;
+ mutex_lock(&eventfs_mutex);
+ ef->created = true;
+ if (ef->ei)
+ ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
+ ef->data, ef->fop, ef->iop);
+ else
+ ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
+ ef->data, ef->fop);
+
+ if (IS_ERR_OR_NULL(ef->dentry)) {
+ ef->created = false;
+ mutex_unlock(&eventfs_mutex);
+ } else {
+ if (ef->ei)
+ eventfs_post_create_dir(ef);
+ ef->dentry->d_fsdata = ef;
+ mutex_unlock(&eventfs_mutex);
+ dput(ef->dentry);
+ }
+ break;
+ }
+ srcu_read_unlock(&eventfs_srcu, idx);
+ return ret;
+}
+
+/**
+ * eventfs_release - called to release eventfs file/dir
+ * @inode: inode to be released
+ * @file: file to be released (not used)
+ */
+static int eventfs_release(struct inode *inode, struct file *file)
+{
+ struct tracefs_inode *ti;
+ struct eventfs_inode *ei;
+ struct eventfs_file *ef;
+ int idx;
+
+ ti = get_tracefs(inode);
+ if (!(ti->flags & TRACEFS_EVENT_INODE))
+ return -EINVAL;
+
+ ei = ti->private;
+ idx = srcu_read_lock(&eventfs_srcu);
+ list_for_each_entry_srcu(ef, &ei->e_top_files, list,
+ srcu_read_lock_held(&eventfs_srcu)) {
+ if (ef->created)
+ dput(ef->dentry);
+ }
+ srcu_read_unlock(&eventfs_srcu, idx);
+ return dcache_dir_close(inode, file);
+}
+
+/**
+ * dcache_dir_open_wrapper - eventfs open wrapper
+ * @inode: not used
+ * @file: dir to be opened (to create it's child)
+ *
+ * Used to dynamic create file/dir with-in @file, all the
+ * file/dir will be created. If already created then reference
+ * will be increased
+ */
+static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
+{
+ struct tracefs_inode *ti;
+ struct eventfs_inode *ei;
+ struct eventfs_file *ef;
+ struct dentry *dentry = file_dentry(file);
+ struct inode *f_inode = file_inode(file);
+ int idx;
+
+ ti = get_tracefs(f_inode);
+ if (!(ti->flags & TRACEFS_EVENT_INODE))
+ return -EINVAL;
+
+ ei = ti->private;
+ idx = srcu_read_lock(&eventfs_srcu);
+ list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
+ if (ef->created) {
+ dget(ef->dentry);
+ continue;
+ }
+ mutex_lock(&eventfs_mutex);
+ ef->created = true;
+
+ inode_lock(dentry->d_inode);
+ if (ef->ei)
+ ef->dentry = create_dir(ef->name, ef->mode, dentry,
+ ef->data, ef->fop, ef->iop);
+ else
+ ef->dentry = create_file(ef->name, ef->mode, dentry,
+ ef->data, ef->fop);
+ inode_unlock(dentry->d_inode);
+
+ if (IS_ERR_OR_NULL(ef->dentry)) {
+ ef->created = false;
+ } else {
+ if (ef->ei)
+ eventfs_post_create_dir(ef);
+ ef->dentry->d_fsdata = ef;
+ }
+ mutex_unlock(&eventfs_mutex);
+ }
+ srcu_read_unlock(&eventfs_srcu, idx);
+ return dcache_dir_open(inode, file);
+}
+
static const struct file_operations eventfs_file_operations = {
+ .open = dcache_dir_open_wrapper,
+ .read = generic_read_dir,
+ .iterate_shared = dcache_readdir,
+ .llseek = generic_file_llseek,
+ .release = eventfs_release,
};

static const struct inode_operations eventfs_root_dir_inode_operations = {
+ .lookup = eventfs_root_lookup,
};

/**
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 47c1b4d21735..4d30b0cafc5f 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -51,6 +51,8 @@ void eventfs_remove(struct eventfs_file *ef);

void eventfs_remove_events_dir(struct dentry *dentry);

+void eventfs_set_ef_status_free(struct dentry *dentry);
+
struct dentry *tracefs_create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fops);
--
2.39.0


2023-07-13 11:42:45

by Ajay Kaher

[permalink] [raw]
Subject: [PATCH v4 03/10] eventfs: Implement eventfs dir creation functions

Adding eventfs_file structure which will hold properties of file or dir.

Adding following functions to add dir in eventfs:

eventfs_create_events_dir() will directly create events dir within
tracing folder.

eventfs_add_subsystem_dir() adds the information of subsystem_dir
to eventfs, and dynamically creates subsystem_dir directories when
they are accessed.

eventfs_add_dir() adds the information of the dir, within a
subsystem_dir, to eventfs and dynamically creates these directories
when they are accessed.

Adding tracefs_inode structure, this will help eventfs to keep track
of inode, flags and pointer to private date.

Signed-off-by: Ajay Kaher <[email protected]>
Co-developed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ching-lin Yu <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
---
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 217 +++++++++++++++++++++++++++++++++++++++
fs/tracefs/inode.c | 8 +-
fs/tracefs/internal.h | 25 +++++
include/linux/tracefs.h | 11 ++
5 files changed, 258 insertions(+), 4 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h

diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile
index 7c35a282b484..73c56da8e284 100644
--- a/fs/tracefs/Makefile
+++ b/fs/tracefs/Makefile
@@ -1,5 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
tracefs-objs := inode.o
+tracefs-objs += event_inode.o

obj-$(CONFIG_TRACING) += tracefs.o

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
new file mode 100644
index 000000000000..4e7a8eccaa0b
--- /dev/null
+++ b/fs/tracefs/event_inode.c
@@ -0,0 +1,217 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * event_inode.c - part of tracefs, a pseudo file system for activating tracing
+ *
+ * Copyright (C) 2020-22 VMware Inc, author: Steven Rostedt (VMware) <[email protected]>
+ * Copyright (C) 2020-22 VMware Inc, author: Ajay Kaher <[email protected]>
+ *
+ * eventfs is used to show trace events with one set of dentries
+ *
+ * eventfs stores meta-data of files/dirs and skip to create object of
+ * inodes/dentries. As and when requires, eventfs will create the
+ * inodes/dentries for only required files/directories. Also eventfs
+ * would delete the inodes/dentries once no more requires but preserve
+ * the meta data.
+ */
+#include <linux/fsnotify.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/workqueue.h>
+#include <linux/security.h>
+#include <linux/tracefs.h>
+#include <linux/kref.h>
+#include <linux/delay.h>
+#include "internal.h"
+
+struct eventfs_inode {
+ struct list_head e_top_files;
+};
+
+struct eventfs_file {
+ const char *name;
+ struct dentry *d_parent;
+ struct dentry *dentry;
+ struct list_head list;
+ struct eventfs_inode *ei;
+ const struct file_operations *fop;
+ const struct inode_operations *iop;
+ union {
+ struct rcu_head rcu;
+ struct llist_node llist; /* For freeing after RCU */
+ };
+ void *data;
+ umode_t mode;
+ bool created;
+};
+
+static DEFINE_MUTEX(eventfs_mutex);
+
+static const struct file_operations eventfs_file_operations = {
+};
+
+static const struct inode_operations eventfs_root_dir_inode_operations = {
+};
+
+/**
+ * eventfs_prepare_ef - helper function to prepare eventfs_file
+ * @name: the name of the file/directory to create.
+ * @mode: the permission that the file should have.
+ * @fop: struct file_operations that should be used for this file/directory.
+ * @iop: struct inode_operations that should be used for this file/directory.
+ * @data: something that the caller will want to get to later on. The
+ * inode.i_private pointer will point to this value on the open() call.
+ *
+ * This function allocates and fills the eventfs_file structure.
+ */
+static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
+ const struct file_operations *fop,
+ const struct inode_operations *iop,
+ void *data)
+{
+ struct eventfs_file *ef;
+
+ ef = kzalloc(sizeof(*ef), GFP_KERNEL);
+ if (!ef)
+ return ERR_PTR(-ENOMEM);
+
+ ef->name = kstrdup(name, GFP_KERNEL);
+ if (!ef->name) {
+ kfree(ef);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ if (S_ISDIR(mode)) {
+ ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL);
+ if (!ef->ei) {
+ kfree(ef->name);
+ kfree(ef);
+ return ERR_PTR(-ENOMEM);
+ }
+ INIT_LIST_HEAD(&ef->ei->e_top_files);
+ } else {
+ ef->ei = NULL;
+ }
+
+ ef->iop = iop;
+ ef->fop = fop;
+ ef->mode = mode;
+ ef->data = data;
+ return ef;
+}
+
+/**
+ * eventfs_create_events_dir - create the trace event structure
+ * @name: the name of the directory to create.
+ * @parent: parent dentry for this file. This should be a directory dentry
+ * if set. If this parameter is NULL, then the directory will be
+ * created in the root of the tracefs filesystem.
+ *
+ * This function creates the top of the trace event directory.
+ */
+struct dentry *eventfs_create_events_dir(const char *name,
+ struct dentry *parent)
+{
+ struct dentry *dentry = tracefs_start_creating(name, parent);
+ struct eventfs_inode *ei;
+ struct tracefs_inode *ti;
+ struct inode *inode;
+
+ if (IS_ERR(dentry))
+ return dentry;
+
+ ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+ if (!ei)
+ return ERR_PTR(-ENOMEM);
+ inode = tracefs_get_inode(dentry->d_sb);
+ if (unlikely(!inode)) {
+ kfree(ei);
+ tracefs_failed_creating(dentry);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ INIT_LIST_HEAD(&ei->e_top_files);
+
+ ti = get_tracefs(inode);
+ ti->flags |= TRACEFS_EVENT_INODE;
+ ti->private = ei;
+
+ inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
+ inode->i_op = &eventfs_root_dir_inode_operations;
+ inode->i_fop = &eventfs_file_operations;
+
+ /* directory inodes start off with i_nlink == 2 (for "." entry) */
+ inc_nlink(inode);
+ d_instantiate(dentry, inode);
+ inc_nlink(dentry->d_parent->d_inode);
+ fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
+ return tracefs_end_creating(dentry);
+}
+
+/**
+ * eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later
+ * @name: the name of the file to create.
+ * @parent: parent dentry for this dir.
+ *
+ * This function adds eventfs subsystem dir to list.
+ * And all these dirs are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
+ struct dentry *parent)
+{
+ struct tracefs_inode *ti_parent;
+ struct eventfs_inode *ei_parent;
+ struct eventfs_file *ef;
+
+ if (!parent)
+ return ERR_PTR(-EINVAL);
+
+ ti_parent = get_tracefs(parent->d_inode);
+ ei_parent = ti_parent->private;
+
+ ef = eventfs_prepare_ef(name,
+ S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
+ &eventfs_file_operations,
+ &eventfs_root_dir_inode_operations, NULL);
+
+ if (IS_ERR(ef))
+ return ef;
+
+ mutex_lock(&eventfs_mutex);
+ list_add_tail(&ef->list, &ei_parent->e_top_files);
+ ef->d_parent = parent;
+ mutex_unlock(&eventfs_mutex);
+ return ef;
+}
+
+/**
+ * eventfs_add_dir - add eventfs dir to list to create later
+ * @name: the name of the file to create.
+ * @ef_parent: parent eventfs_file for this dir.
+ *
+ * This function adds eventfs dir to list.
+ * And all these dirs are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+struct eventfs_file *eventfs_add_dir(const char *name,
+ struct eventfs_file *ef_parent)
+{
+ struct eventfs_file *ef;
+
+ if (!ef_parent)
+ return ERR_PTR(-EINVAL);
+
+ ef = eventfs_prepare_ef(name,
+ S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
+ &eventfs_file_operations,
+ &eventfs_root_dir_inode_operations, NULL);
+
+ if (IS_ERR(ef))
+ return ef;
+
+ mutex_lock(&eventfs_mutex);
+ list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
+ ef->d_parent = ef_parent->dentry;
+ mutex_unlock(&eventfs_mutex);
+ return ef;
+}
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index b0348efc0238..7ef3a02766f5 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -127,7 +127,7 @@ static const struct inode_operations tracefs_dir_inode_operations = {
.rmdir = tracefs_syscall_rmdir,
};

-static struct inode *tracefs_get_inode(struct super_block *sb)
+struct inode *tracefs_get_inode(struct super_block *sb)
{
struct inode *inode = new_inode(sb);
if (inode) {
@@ -399,7 +399,7 @@ static struct file_system_type trace_fs_type = {
};
MODULE_ALIAS_FS("tracefs");

-static struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
+struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
{
struct dentry *dentry;
int error;
@@ -437,7 +437,7 @@ static struct dentry *tracefs_start_creating(const char *name, struct dentry *pa
return dentry;
}

-static struct dentry *tracefs_failed_creating(struct dentry *dentry)
+struct dentry *tracefs_failed_creating(struct dentry *dentry)
{
inode_unlock(d_inode(dentry->d_parent));
dput(dentry);
@@ -445,7 +445,7 @@ static struct dentry *tracefs_failed_creating(struct dentry *dentry)
return NULL;
}

-static struct dentry *tracefs_end_creating(struct dentry *dentry)
+struct dentry *tracefs_end_creating(struct dentry *dentry)
{
inode_unlock(d_inode(dentry->d_parent));
return dentry;
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
new file mode 100644
index 000000000000..c443a0c32a8c
--- /dev/null
+++ b/fs/tracefs/internal.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TRACEFS_INTERNAL_H
+#define _TRACEFS_INTERNAL_H
+
+enum {
+ TRACEFS_EVENT_INODE = BIT(1),
+};
+
+struct tracefs_inode {
+ unsigned long flags;
+ void *private;
+ struct inode vfs_inode;
+};
+
+static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
+{
+ return container_of(inode, struct tracefs_inode, vfs_inode);
+}
+
+struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
+struct dentry *tracefs_end_creating(struct dentry *dentry);
+struct dentry *tracefs_failed_creating(struct dentry *dentry);
+struct inode *tracefs_get_inode(struct super_block *sb);
+
+#endif /* _TRACEFS_INTERNAL_H */
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 99912445974c..432e5e6f7901 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -21,6 +21,17 @@ struct file_operations;

#ifdef CONFIG_TRACING

+struct eventfs_file;
+
+struct dentry *eventfs_create_events_dir(const char *name,
+ struct dentry *parent);
+
+struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
+ struct dentry *parent);
+
+struct eventfs_file *eventfs_add_dir(const char *name,
+ struct eventfs_file *ef_parent);
+
struct dentry *tracefs_create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fops);
--
2.39.0


2023-07-14 15:57:24

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 03/10] eventfs: Implement eventfs dir creation functions


Some more nits.

On Thu, 13 Jul 2023 17:03:17 +0530
Ajay Kaher <[email protected]> wrote:

> Adding eventfs_file structure which will hold properties of file or dir.

Add eventfs_file structure which will hold the properties of the eventfs
files and directories.

>
> Adding following functions to add dir in eventfs:

Add following functions to create the directories in eventfs:

>
> eventfs_create_events_dir() will directly create events dir within
> tracing folder.

eventfs_create_events_dir() will create the top level "events" directory
within the tracefs file system.

>
> eventfs_add_subsystem_dir() adds the information of subsystem_dir
> to eventfs, and dynamically creates subsystem_dir directories when
> they are accessed.

eventfs_add_subsystem_dir() creates an eventfs_file descriptor with the
given name of the subsystem.

>
> eventfs_add_dir() adds the information of the dir, within a
> subsystem_dir, to eventfs and dynamically creates these directories
> when they are accessed.

eventfs_add_dir() creates an eventfs_file descriptor with the given name of
the directory and attached to a eventfs_file of a subsystem.

>
> Adding tracefs_inode structure, this will help eventfs to keep track
> of inode, flags and pointer to private date.

Add tracefs_inode structure to hold the inodes, flags and pointers to
private data used by eventfs.

>
> Signed-off-by: Ajay Kaher <[email protected]>
> Co-developed-by: Steven Rostedt (VMware) <[email protected]>
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> Tested-by: Ching-lin Yu <[email protected]>
> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/oe-lkp/[email protected]
> ---
> fs/tracefs/Makefile | 1 +
> fs/tracefs/event_inode.c | 217 +++++++++++++++++++++++++++++++++++++++
> fs/tracefs/inode.c | 8 +-
> fs/tracefs/internal.h | 25 +++++
> include/linux/tracefs.h | 11 ++
> 5 files changed, 258 insertions(+), 4 deletions(-)
> create mode 100644 fs/tracefs/event_inode.c
> create mode 100644 fs/tracefs/internal.h
>
> diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile
> index 7c35a282b484..73c56da8e284 100644
> --- a/fs/tracefs/Makefile
> +++ b/fs/tracefs/Makefile
> @@ -1,5 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-only
> tracefs-objs := inode.o
> +tracefs-objs += event_inode.o
>
> obj-$(CONFIG_TRACING) += tracefs.o
>
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> new file mode 100644
> index 000000000000..4e7a8eccaa0b
> --- /dev/null
> +++ b/fs/tracefs/event_inode.c
> @@ -0,0 +1,217 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * event_inode.c - part of tracefs, a pseudo file system for activating tracing
> + *
> + * Copyright (C) 2020-22 VMware Inc, author: Steven Rostedt (VMware) <[email protected]>
> + * Copyright (C) 2020-22 VMware Inc, author: Ajay Kaher <[email protected]>
> + *
> + * eventfs is used to show trace events with one set of dentries

eventfs is used to dynamically create inodes and dentries based on the
meta data provided by the tracing system.


> + *
> + * eventfs stores meta-data of files/dirs and skip to create object of
> + * inodes/dentries. As and when requires, eventfs will create the
> + * inodes/dentries for only required files/directories. Also eventfs
> + * would delete the inodes/dentries once no more requires but preserve
> + * the meta data.

eventfs stores the meta-data of files/dirs and holds off on creating
inodes/dentries of the files. When accessed, the eventfs will create the
inodes/dentries in a just-in-time (JIT) manner. The eventfs will clean up
and delete the inodes/dentries when they are no longer referenced.


> + */
> +#include <linux/fsnotify.h>
> +#include <linux/fs.h>
> +#include <linux/namei.h>
> +#include <linux/workqueue.h>
> +#include <linux/security.h>
> +#include <linux/tracefs.h>
> +#include <linux/kref.h>
> +#include <linux/delay.h>
> +#include "internal.h"
> +
> +struct eventfs_inode {
> + struct list_head e_top_files;
> +};
> +

We probably want to document the below structure, and only add
fields as they are added by the patches.

> +struct eventfs_file {
> + const char *name;
> + struct dentry *d_parent;
> + struct dentry *dentry;

dentry is never assigned, although at the end in eventfs_add_dir() we have:

ef->d_parent = ef_parent->dentry;

Both the above assignment and entry in the structure should be removed from
this patch. It makes it easier to review. Because that assignment is
meaningless. When the above line is added in later patches, it should have
meaning.

When breaking up patches, you don't just want to cut and paste in blocks of
functions. You want to build the functionality, so the added code makes
sense.

> + struct list_head list;
> + struct eventfs_inode *ei;
> + const struct file_operations *fop;
> + const struct inode_operations *iop;

> + union {
> + struct rcu_head rcu;
> + struct llist_node llist; /* For freeing after RCU */
> + };

The above union should be added when the clean up code is added.

> + void *data;
> + umode_t mode;

> + bool created;

The "created" field should be added when its use case is added.

> +};
> +
> +static DEFINE_MUTEX(eventfs_mutex);
> +
> +static const struct file_operations eventfs_file_operations = {
> +};
> +
> +static const struct inode_operations eventfs_root_dir_inode_operations = {
> +};
> +
> +/**
> + * eventfs_prepare_ef - helper function to prepare eventfs_file
> + * @name: the name of the file/directory to create.
> + * @mode: the permission that the file should have.
> + * @fop: struct file_operations that should be used for this file/directory.
> + * @iop: struct inode_operations that should be used for this file/directory.
> + * @data: something that the caller will want to get to later on. The
> + * inode.i_private pointer will point to this value on the open() call.
> + *
> + * This function allocates and fills the eventfs_file structure.
> + */
> +static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
> + const struct file_operations *fop,
> + const struct inode_operations *iop,
> + void *data)
> +{
> + struct eventfs_file *ef;
> +
> + ef = kzalloc(sizeof(*ef), GFP_KERNEL);
> + if (!ef)
> + return ERR_PTR(-ENOMEM);
> +
> + ef->name = kstrdup(name, GFP_KERNEL);
> + if (!ef->name) {
> + kfree(ef);
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + if (S_ISDIR(mode)) {
> + ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL);
> + if (!ef->ei) {
> + kfree(ef->name);
> + kfree(ef);
> + return ERR_PTR(-ENOMEM);
> + }
> + INIT_LIST_HEAD(&ef->ei->e_top_files);
> + } else {
> + ef->ei = NULL;
> + }
> +
> + ef->iop = iop;
> + ef->fop = fop;
> + ef->mode = mode;
> + ef->data = data;
> + return ef;
> +}
> +
> +/**
> + * eventfs_create_events_dir - create the trace event structure
> + * @name: the name of the directory to create.
> + * @parent: parent dentry for this file. This should be a directory dentry
> + * if set. If this parameter is NULL, then the directory will be
> + * created in the root of the tracefs filesystem.
> + *
> + * This function creates the top of the trace event directory.
> + */
> +struct dentry *eventfs_create_events_dir(const char *name,
> + struct dentry *parent)
> +{
> + struct dentry *dentry = tracefs_start_creating(name, parent);
> + struct eventfs_inode *ei;
> + struct tracefs_inode *ti;
> + struct inode *inode;
> +
> + if (IS_ERR(dentry))
> + return dentry;
> +
> + ei = kzalloc(sizeof(*ei), GFP_KERNEL);
> + if (!ei)
> + return ERR_PTR(-ENOMEM);
> + inode = tracefs_get_inode(dentry->d_sb);
> + if (unlikely(!inode)) {
> + kfree(ei);
> + tracefs_failed_creating(dentry);
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + INIT_LIST_HEAD(&ei->e_top_files);
> +
> + ti = get_tracefs(inode);
> + ti->flags |= TRACEFS_EVENT_INODE;
> + ti->private = ei;
> +
> + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
> + inode->i_op = &eventfs_root_dir_inode_operations;
> + inode->i_fop = &eventfs_file_operations;
> +
> + /* directory inodes start off with i_nlink == 2 (for "." entry) */
> + inc_nlink(inode);
> + d_instantiate(dentry, inode);
> + inc_nlink(dentry->d_parent->d_inode);
> + fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
> + return tracefs_end_creating(dentry);
> +}
> +
> +/**
> + * eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later
> + * @name: the name of the file to create.
> + * @parent: parent dentry for this dir.
> + *
> + * This function adds eventfs subsystem dir to list.
> + * And all these dirs are created on the fly when they are looked up,
> + * and the dentry and inodes will be removed when they are done.
> + */
> +struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
> + struct dentry *parent)
> +{
> + struct tracefs_inode *ti_parent;
> + struct eventfs_inode *ei_parent;
> + struct eventfs_file *ef;
> +
> + if (!parent)
> + return ERR_PTR(-EINVAL);
> +
> + ti_parent = get_tracefs(parent->d_inode);
> + ei_parent = ti_parent->private;
> +
> + ef = eventfs_prepare_ef(name,
> + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> + &eventfs_file_operations,
> + &eventfs_root_dir_inode_operations, NULL);
> +
> + if (IS_ERR(ef))
> + return ef;
> +
> + mutex_lock(&eventfs_mutex);
> + list_add_tail(&ef->list, &ei_parent->e_top_files);
> + ef->d_parent = parent;
> + mutex_unlock(&eventfs_mutex);
> + return ef;
> +}
> +
> +/**
> + * eventfs_add_dir - add eventfs dir to list to create later
> + * @name: the name of the file to create.
> + * @ef_parent: parent eventfs_file for this dir.
> + *
> + * This function adds eventfs dir to list.
> + * And all these dirs are created on the fly when they are looked up,
> + * and the dentry and inodes will be removed when they are done.
> + */
> +struct eventfs_file *eventfs_add_dir(const char *name,
> + struct eventfs_file *ef_parent)
> +{
> + struct eventfs_file *ef;
> +
> + if (!ef_parent)
> + return ERR_PTR(-EINVAL);
> +
> + ef = eventfs_prepare_ef(name,
> + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> + &eventfs_file_operations,
> + &eventfs_root_dir_inode_operations, NULL);
> +
> + if (IS_ERR(ef))
> + return ef;
> +
> + mutex_lock(&eventfs_mutex);
> + list_add_tail(&ef->list, &ef_parent->ei->e_top_files);

> + ef->d_parent = ef_parent->dentry;

As mentioned above. Let's not add the above assignment yet, as the
ef_parent->dentry will not be anything but NULL here.

-- Steve


> + mutex_unlock(&eventfs_mutex);
> + return ef;
> +}
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index b0348efc0238..7ef3a02766f5 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -127,7 +127,7 @@ static const struct inode_operations tracefs_dir_inode_operations = {
> .rmdir = tracefs_syscall_rmdir,
> };
>
> -static struct inode *tracefs_get_inode(struct super_block *sb)
> +struct inode *tracefs_get_inode(struct super_block *sb)
> {
> struct inode *inode = new_inode(sb);
> if (inode) {
> @@ -399,7 +399,7 @@ static struct file_system_type trace_fs_type = {
> };
> MODULE_ALIAS_FS("tracefs");
>
> -static struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
> +struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
> {
> struct dentry *dentry;
> int error;
> @@ -437,7 +437,7 @@ static struct dentry *tracefs_start_creating(const char *name, struct dentry *pa
> return dentry;
> }
>
> -static struct dentry *tracefs_failed_creating(struct dentry *dentry)
> +struct dentry *tracefs_failed_creating(struct dentry *dentry)
> {
> inode_unlock(d_inode(dentry->d_parent));
> dput(dentry);
> @@ -445,7 +445,7 @@ static struct dentry *tracefs_failed_creating(struct dentry *dentry)
> return NULL;
> }
>
> -static struct dentry *tracefs_end_creating(struct dentry *dentry)
> +struct dentry *tracefs_end_creating(struct dentry *dentry)
> {
> inode_unlock(d_inode(dentry->d_parent));
> return dentry;
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> new file mode 100644
> index 000000000000..c443a0c32a8c
> --- /dev/null
> +++ b/fs/tracefs/internal.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _TRACEFS_INTERNAL_H
> +#define _TRACEFS_INTERNAL_H
> +
> +enum {
> + TRACEFS_EVENT_INODE = BIT(1),
> +};
> +
> +struct tracefs_inode {
> + unsigned long flags;
> + void *private;
> + struct inode vfs_inode;
> +};
> +
> +static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
> +{
> + return container_of(inode, struct tracefs_inode, vfs_inode);
> +}
> +
> +struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
> +struct dentry *tracefs_end_creating(struct dentry *dentry);
> +struct dentry *tracefs_failed_creating(struct dentry *dentry);
> +struct inode *tracefs_get_inode(struct super_block *sb);
> +
> +#endif /* _TRACEFS_INTERNAL_H */
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 99912445974c..432e5e6f7901 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -21,6 +21,17 @@ struct file_operations;
>
> #ifdef CONFIG_TRACING
>
> +struct eventfs_file;
> +
> +struct dentry *eventfs_create_events_dir(const char *name,
> + struct dentry *parent);
> +
> +struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
> + struct dentry *parent);
> +
> +struct eventfs_file *eventfs_add_dir(const char *name,
> + struct eventfs_file *ef_parent);
> +
> struct dentry *tracefs_create_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fops);


2023-07-14 17:11:09

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 05/10] eventfs: Implement eventfs file, directory remove function

Some more nits:

Subject:

eventfs: Implement removal of meta data from eventfs


On Thu, 13 Jul 2023 17:03:19 +0530
Ajay Kaher <[email protected]> wrote:

> Adding eventfs_remove(), this function will recursively remove
> dir or file info from eventfs.

When events are removed from tracefs, the eventfs must be aware of this.
The eventfs_remove() removes the meta data from eventfs so that it will no
longer create the files associated with that event.

When an instance is removed from tracefs, eventfs_remove_events_dir() will
remove and clean up the entire "events" directory.


>
> added a recursion check to eventfs_remove_rec() as it is really
> dangerous to have unchecked recursion in the kernel (we do have
> a fixed size stack).

The above doesn't need to be in the change log. It's an update from the
previous version. The eventfs_remove_rec() is added here, so the recursion
check was not.

>
> have the free use srcu callbacks. After the srcu grace periods
> are done, it adds the eventfs_file onto a llist (lockless link
> list) and wakes up a work queue. Then the work queue does the
> freeing (this needs to be done in task/workqueue context, as
> srcu callbacks are done in softirq context).

If you want to document the above, we can say:

The helper function eventfs_remove_rec() is used to clean up and free the
associated data from eventfs for both of the added functions. SRCU is used
to protect the lists of meta data stored in the eventfs. The eventfs_mutex
is used to protect the content of the items in the list.

As lookups may be happening as deletions of events are made, the freeing
of of dentry/inodes and relative information is done after the SRCU grace
period has passed. As the callback of SRCU is in a softirq context, a work
queue is added to perform the cleanups in a task context.

The struct evenfs_file is given a union of an rcu_head and a llist_node.
The SRCU callback uses the rcu_head from this structure to insert it into
the SRCU queue. When the SRCU grace periods are complete, the callback
will then insert the eventfs_file struct onto a lockless llist using the
llist_node of the structure. A union is used as this process is just a
hand off from SRCU to workqueue, and only one field is necessary for this
to work.

You can also add the above as a comment in the code (and keep it in the
change log as well).

-- Steve


>
> Signed-off-by: Ajay Kaher <[email protected]>
> Co-developed-by: Steven Rostedt (VMware) <[email protected]>
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> Tested-by: Ching-lin Yu <[email protected]>
> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
> ---
> fs/tracefs/event_inode.c | 110 +++++++++++++++++++++++++++++++++++++++
> include/linux/tracefs.h | 4 ++
> 2 files changed, 114 insertions(+)
>
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 75dc8953d813..322a77be5a56 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -45,6 +45,7 @@ struct eventfs_file {
> };
>
> static DEFINE_MUTEX(eventfs_mutex);
> +DEFINE_STATIC_SRCU(eventfs_srcu);
>
> static const struct file_operations eventfs_file_operations = {
> };
> @@ -299,3 +300,112 @@ int eventfs_add_file(const char *name, umode_t mode,
> mutex_unlock(&eventfs_mutex);
> return 0;
> }
> +
> +static LLIST_HEAD(free_list);
> +
> +static void eventfs_workfn(struct work_struct *work)
> +{
> + struct eventfs_file *ef, *tmp;
> + struct llist_node *llnode;
> +
> + llnode = llist_del_all(&free_list);
> + llist_for_each_entry_safe(ef, tmp, llnode, llist) {
> + if (ef->created && ef->dentry)
> + dput(ef->dentry);
> + kfree(ef->name);
> + kfree(ef->ei);
> + kfree(ef);
> + }
> +}
> +
> +DECLARE_WORK(eventfs_work, eventfs_workfn);
> +
> +static void free_ef(struct rcu_head *head)
> +{
> + struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
> +
> + if (!llist_add(&ef->llist, &free_list))
> + return;
> +
> + queue_work(system_unbound_wq, &eventfs_work);
> +}
> +
> +
> +
> +/**
> + * eventfs_remove_rec - remove eventfs dir or file from list
> + * @ef: eventfs_file to be removed.
> + *
> + * This function recursively remove eventfs_file which
> + * contains info of file or dir.
> + */
> +static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> +{
> + struct eventfs_file *ef_child;
> +
> + if (!ef)
> + return;
> + /*
> + * Check recursion depth. It should never be greater than 3:
> + * 0 - events/
> + * 1 - events/group/
> + * 2 - events/group/event/
> + * 3 - events/group/event/file
> + */
> + if (WARN_ON_ONCE(level > 3))
> + return;
> +
> + if (ef->ei) {
> + /* search for nested folders or files */
> + list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
> + lockdep_is_held(&eventfs_mutex)) {
> + eventfs_remove_rec(ef_child, level + 1);
> + }
> + }
> +
> + if (ef->created && ef->dentry)
> + d_invalidate(ef->dentry);
> +
> + list_del_rcu(&ef->list);
> + call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> +}
> +
> +/**
> + * eventfs_remove - remove eventfs dir or file from list
> + * @ef: eventfs_file to be removed.
> + *
> + * This function acquire the eventfs_mutex lock and call eventfs_remove_rec()
> + */
> +void eventfs_remove(struct eventfs_file *ef)
> +{
> + if (!ef)
> + return;
> +
> + mutex_lock(&eventfs_mutex);
> + eventfs_remove_rec(ef, 0);
> + mutex_unlock(&eventfs_mutex);
> +}
> +
> +/**
> + * eventfs_remove_events_dir - remove eventfs dir or file from list
> + * @dentry: events's dentry to be removed.
> + *
> + * This function remove events main directory
> + */
> +void eventfs_remove_events_dir(struct dentry *dentry)
> +{
> + struct tracefs_inode *ti;
> + struct eventfs_inode *ei;
> +
> + if (!dentry || !dentry->d_inode)
> + return;
> +
> + ti = get_tracefs(dentry->d_inode);
> + if (!ti || !(ti->flags & TRACEFS_EVENT_INODE))
> + return;
> +
> + ei = ti->private;
> + d_invalidate(dentry);
> + dput(dentry);
> + kfree(ei);
> +}
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index a51312ff803c..2c08edd4a739 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -40,6 +40,10 @@ int eventfs_add_top_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fops);
>
> +void eventfs_remove(struct eventfs_file *ef);
> +
> +void eventfs_remove_events_dir(struct dentry *dentry);
> +
> struct dentry *tracefs_create_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fops);


2023-07-14 20:38:33

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 08/10] eventfs: Implement tracefs_inode_cache


I wonder if this should be patch 2?

On Thu, 13 Jul 2023 17:03:22 +0530
Ajay Kaher <[email protected]> wrote:

> Creating tracefs_inode_cache which is a cache of tracefs_inode.

Create a kmem cache of tracefs_inodes. To be more efficient, as there are
lots of tracefs inodes, create its own cache. This also allows to see how
many tracefs inodes have been created.

> Adding helping functions:

Add helper functions:

> tracefs_alloc_inode()
> tracefs_free_inode()
>

-- Steve

> Signed-off-by: Ajay Kaher <[email protected]>
> Co-developed-by: Steven Rostedt (VMware) <[email protected]>
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> Tested-by: Ching-lin Yu <[email protected]>
> ---
> fs/tracefs/inode.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index 7dc692a5fee1..76820d3e97eb 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -21,13 +21,33 @@
> #include <linux/parser.h>
> #include <linux/magic.h>
> #include <linux/slab.h>
> +#include "internal.h"
>
> #define TRACEFS_DEFAULT_MODE 0700
> +static struct kmem_cache *tracefs_inode_cachep __ro_after_init;
>
> static struct vfsmount *tracefs_mount;
> static int tracefs_mount_count;
> static bool tracefs_registered;
>
> +static struct inode *tracefs_alloc_inode(struct super_block *sb)
> +{
> + struct tracefs_inode *ti;
> +
> + ti = kmem_cache_alloc(tracefs_inode_cachep, GFP_KERNEL);
> + if (!ti)
> + return NULL;
> +
> + ti->flags = 0;
> +
> + return &ti->vfs_inode;
> +}
> +
> +static void tracefs_free_inode(struct inode *inode)
> +{
> + kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
> +}
> +
> static ssize_t default_read_file(struct file *file, char __user *buf,
> size_t count, loff_t *ppos)
> {
> @@ -346,6 +366,9 @@ static int tracefs_show_options(struct seq_file *m, struct dentry *root)
> }
>
> static const struct super_operations tracefs_super_operations = {
> + .alloc_inode = tracefs_alloc_inode,
> + .free_inode = tracefs_free_inode,
> + .drop_inode = generic_delete_inode,
> .statfs = simple_statfs,
> .remount_fs = tracefs_remount,
> .show_options = tracefs_show_options,
> @@ -675,10 +698,26 @@ bool tracefs_initialized(void)
> return tracefs_registered;
> }
>
> +static void init_once(void *foo)
> +{
> + struct tracefs_inode *ti = (struct tracefs_inode *) foo;
> +
> + inode_init_once(&ti->vfs_inode);
> +}
> +
> static int __init tracefs_init(void)
> {
> int retval;
>
> + tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache",
> + sizeof(struct tracefs_inode),
> + 0, (SLAB_RECLAIM_ACCOUNT|
> + SLAB_MEM_SPREAD|
> + SLAB_ACCOUNT),
> + init_once);
> + if (!tracefs_inode_cachep)
> + return -ENOMEM;
> +
> retval = sysfs_create_mount_point(kernel_kobj, "tracing");
> if (retval)
> return -EINVAL;


2023-07-14 20:45:30

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] eventfs: Implement eventfs lookup, read, open functions


Some more nits.

On Thu, 13 Jul 2023 17:03:21 +0530
Ajay Kaher <[email protected]> wrote:

> Adding following inode_operations, file_operations and helper functions to
> eventfs:

Add the inode_operations, file_operations, and helper functions to eventfs:

> dcache_dir_open_wrapper()
> eventfs_root_lookup()
> eventfs_release()
> eventfs_set_ef_status_free()
> eventfs_post_create_dir()
>
> inode_operations, file_operations will be called from vfs.

The inode_operations and file_operations functions will be called from the
VFS layer.

>
> Signed-off-by: Ajay Kaher <[email protected]>
> Co-developed-by: Steven Rostedt (VMware) <[email protected]>
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> Tested-by: Ching-lin Yu <[email protected]>
> ---
> fs/tracefs/event_inode.c | 194 ++++++++++++++++++++++++++++++++++++++-
> include/linux/tracefs.h | 2 +
> 2 files changed, 194 insertions(+), 2 deletions(-)
>
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 34b5d3d8005b..7167340ac29e 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -67,7 +67,7 @@ DEFINE_STATIC_SRCU(eventfs_srcu);
> * If tracefs is not enabled in the kernel, the value -%ENODEV will be
> * returned.
> */


> -struct dentry *create_file(const char *name, umode_t mode,
> +static struct dentry *create_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fop)
> {
> @@ -123,7 +123,7 @@ struct dentry *create_file(const char *name, umode_t
> mode,
> * If tracefs is not enabled in the kernel, the value -%ENODEV will be
> * returned.
> */
> -struct dentry *create_dir(const char *name, umode_t mode,
> +static struct dentry *create_dir(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fop,
> const struct inode_operations *iop)

Please fold these changes into the patches that added these functions. They
should have been created as static functions.

If you are worried about the warnings that they caused by being static and
not used, perhaps these patches should be reordered, and we add this patch
first, with those functions as stubs.

That is, build the VFS functions first with just place holders for the
internals, and then fill in the internals. That would actually make more
sense in reviewing the code.

> @@ -157,10 +157,200 @@ struct dentry *create_dir(const char *name,
> umode_t mode, return eventfs_end_creating(dentry);
> }
>
> +/**
> + * eventfs_set_ef_status_free - set the ef->status to free
> + * @dentry: dentry who's status to be freed
> + *
> + * eventfs_set_ef_status_free will be called if no more
> + * reference remains

"references remain"

> + */
> +void eventfs_set_ef_status_free(struct dentry *dentry)
> +{
> + struct tracefs_inode *ti_parent;
> + struct eventfs_file *ef;
> +
> + ti_parent = get_tracefs(dentry->d_parent->d_inode);
> + if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
> + return;
> +
> + ef = dentry->d_fsdata;
> + if (!ef)
> + return;
> + ef->created = false;
> + ef->dentry = NULL;
> +}
> +
> +/**
> + * eventfs_post_create_dir - post create dir routine
> + * @ef: eventfs_file of recently created dir
> + *
> + * Files with-in eventfs dir should know dentry of parent dir

* Map the meta-data of files within an eventfs dir to their parent dentry.

> + */
> +static void eventfs_post_create_dir(struct eventfs_file *ef)
> +{
> + struct eventfs_file *ef_child;
> + struct tracefs_inode *ti;
> +
> + /* srcu lock already held */
> + /* fill parent-child relation */
> + list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
> + srcu_read_lock_held(&eventfs_srcu)) {
> + ef_child->d_parent = ef->dentry;
> + }
> +
> + ti = get_tracefs(ef->dentry->d_inode);
> + ti->private = ef->ei;
> +}
> +
> +/**
> + * eventfs_root_lookup - lookup routine to create file/dir
> + * @dir: directory in which lookup to be done

"in which a lookup is being done"

> + * @dentry: file/dir dentry
> + * @flags:

?

"unused" or "to pass as flags parameter to simple lookup"?

> + *
> + * Used to create dynamic file/dir with-in @dir, search with-in ei
> + * list, if @dentry found go ahead and create the file/dir

"within" is a real word ;-)

* Used to create a dynamic file/dir within @dir. Use the eventfs_inode
* list of meta data to find the information needed to create the file/dir.

> + */
> +
> +static struct dentry *eventfs_root_lookup(struct inode *dir,
> + struct dentry *dentry,
> + unsigned int flags)
> +{
> + struct tracefs_inode *ti;
> + struct eventfs_inode *ei;
> + struct eventfs_file *ef;
> + struct dentry *ret = NULL;
> + int idx;
> +
> + ti = get_tracefs(dir);
> + if (!(ti->flags & TRACEFS_EVENT_INODE))
> + return NULL;
> +
> + ei = ti->private;
> + idx = srcu_read_lock(&eventfs_srcu);
> + list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> + srcu_read_lock_held(&eventfs_srcu)) {
> + if (strcmp(ef->name, dentry->d_name.name))
> + continue;
> + ret = simple_lookup(dir, dentry, flags);
> + if (ef->created)
> + continue;
> + mutex_lock(&eventfs_mutex);
> + ef->created = true;
> + if (ef->ei)
> + ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
> + ef->data, ef->fop, ef->iop);
> + else
> + ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
> + ef->data, ef->fop);
> +
> + if (IS_ERR_OR_NULL(ef->dentry)) {
> + ef->created = false;
> + mutex_unlock(&eventfs_mutex);
> + } else {
> + if (ef->ei)
> + eventfs_post_create_dir(ef);
> + ef->dentry->d_fsdata = ef;
> + mutex_unlock(&eventfs_mutex);
> + dput(ef->dentry);
> + }
> + break;
> + }
> + srcu_read_unlock(&eventfs_srcu, idx);
> + return ret;
> +}
> +
> +/**
> + * eventfs_release - called to release eventfs file/dir
> + * @inode: inode to be released
> + * @file: file to be released (not used)
> + */
> +static int eventfs_release(struct inode *inode, struct file *file)
> +{
> + struct tracefs_inode *ti;
> + struct eventfs_inode *ei;
> + struct eventfs_file *ef;
> + int idx;
> +
> + ti = get_tracefs(inode);
> + if (!(ti->flags & TRACEFS_EVENT_INODE))
> + return -EINVAL;
> +
> + ei = ti->private;
> + idx = srcu_read_lock(&eventfs_srcu);
> + list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> + srcu_read_lock_held(&eventfs_srcu)) {
> + if (ef->created)
> + dput(ef->dentry);
> + }
> + srcu_read_unlock(&eventfs_srcu, idx);
> + return dcache_dir_close(inode, file);
> +}
> +
> +/**
> + * dcache_dir_open_wrapper - eventfs open wrapper
> + * @inode: not used
> + * @file: dir to be opened (to create it's child)

"its child"

> + *
> + * Used to dynamic create file/dir with-in @file, all the
> + * file/dir will be created. If already created then reference
> + * will be increased

* Used to dynamically create the file/dir within @file. @file is really a
* directory and all the files/dirs of the children within @file will be
* created. If any of the files/dirs have already been created, their
* reference count will be incremented.


> + */
> +static int dcache_dir_open_wrapper(struct inode *inode, struct file
> *file) +{
> + struct tracefs_inode *ti;
> + struct eventfs_inode *ei;
> + struct eventfs_file *ef;
> + struct dentry *dentry = file_dentry(file);
> + struct inode *f_inode = file_inode(file);
> + int idx;
> +
> + ti = get_tracefs(f_inode);
> + if (!(ti->flags & TRACEFS_EVENT_INODE))
> + return -EINVAL;
> +
> + ei = ti->private;
> + idx = srcu_read_lock(&eventfs_srcu);
> + list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
> + if (ef->created) {
> + dget(ef->dentry);
> + continue;
> + }
> + mutex_lock(&eventfs_mutex);
> + ef->created = true;
> +
> + inode_lock(dentry->d_inode);
> + if (ef->ei)
> + ef->dentry = create_dir(ef->name, ef->mode, dentry,
> + ef->data, ef->fop, ef->iop);
> + else
> + ef->dentry = create_file(ef->name, ef->mode, dentry,
> + ef->data, ef->fop);
> + inode_unlock(dentry->d_inode);
> +
> + if (IS_ERR_OR_NULL(ef->dentry)) {
> + ef->created = false;
> + } else {
> + if (ef->ei)
> + eventfs_post_create_dir(ef);
> + ef->dentry->d_fsdata = ef;
> + }
> + mutex_unlock(&eventfs_mutex);
> + }
> + srcu_read_unlock(&eventfs_srcu, idx);
> + return dcache_dir_open(inode, file);
> +}
> +
> static const struct file_operations eventfs_file_operations = {
> + .open = dcache_dir_open_wrapper,
> + .read = generic_read_dir,
> + .iterate_shared = dcache_readdir,
> + .llseek = generic_file_llseek,
> + .release = eventfs_release,
> };
>
> static const struct inode_operations eventfs_root_dir_inode_operations =
> {
> + .lookup = eventfs_root_lookup,
> };
>
> /**
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 47c1b4d21735..4d30b0cafc5f 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -51,6 +51,8 @@ void eventfs_remove(struct eventfs_file *ef);
>
> void eventfs_remove_events_dir(struct dentry *dentry);
>
> +void eventfs_set_ef_status_free(struct dentry *dentry);

Shouldn't this be internal to the tracefs code, and not something exposed
to users of tracefs?

> +
> struct dentry *tracefs_create_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fops);

-- Steve

2023-07-14 23:38:03

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs

On Thu, 13 Jul 2023 17:03:14 +0530
Ajay Kaher <[email protected]> wrote:

> Events Tracing infrastructure contains lot of files, directories
> (internally in terms of inodes, dentries). And ends up by consuming
> memory in MBs. We can have multiple events of Events Tracing, which
> further requires more memory.
>
> Instead of creating inodes/dentries, eventfs could keep meta-data and
> skip the creation of inodes/dentries. As and when require, eventfs will
> create the inodes/dentries only for required files/directories.
> Also eventfs would delete the inodes/dentries once no more requires
> but preserve the meta data.
>
> Tracing events took ~9MB, with this approach it took ~4.5MB
> for ~10K files/dir.

I think we are very close to getting this in for the next merge window. I
ran several tests and so far it's holding up!

I made a bunch of nits for this series, but nothing major. Mostly fixing up
change logs and comments, as well as some naming conventions and
reorganizing the series a little bit.

Anyway, I'm hoping that v5 will be ready to go into linux-next.

Thanks a lot Ajay for working on this!

-- Steve


>
> v3:
> Patch 3,4,5,7,9:
> removed all the eventfs_rwsem code and replaced it with an srcu
> lock for the readers, and a mutex to synchronize the writers of
> the list.
>
> Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
>
> Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
> as it really should not be exposed to all users.
>
> Patch 5: added a recursion check to eventfs_remove_rec() as it is really
> dangerous to have unchecked recursion in the kernel (we do have
> a fixed size stack).
>
> have the free use srcu callbacks. After the srcu grace periods
> are done, it adds the eventfs_file onto a llist (lockless link
> list) and wakes up a work queue. Then the work queue does the
> freeing (this needs to be done in task/workqueue context, as
> srcu callbacks are done in softirq context).
>
> Patch 6: renamed:
> eventfs_create_file() -> create_file()
> eventfs_create_dir() -> create_dir()
>
> v2:
> Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
> Patch 02: moved from v1 1/9
> Patch 03: moved from v1 2/9
> As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
> helper function to add files or directories to eventfs
> fix WARNING reported by kernel test robot in v1 8/9
> Patch 04: moved from v1 3/9
> used eventfs_prepare_ef() to add files
> fix WARNING reported by kernel test robot in v1 8/9
> Patch 05: moved from v1 4/9
> fix compiling warning reported by kernel test robot in v1 4/9
> Patch 06: moved from v1 5/9
> Patch 07: moved from v1 6/9
> Patch 08: moved from v1 7/9
> Patch 09: moved from v1 8/9
> rebased because of v3 01/10
> Patch 10: moved from v1 9/9
>
> v1:
> Patch 1: add header file
> Patch 2: resolved kernel test robot issues
> protecting eventfs lists using nested eventfs_rwsem
> Patch 3: protecting eventfs lists using nested eventfs_rwsem
> Patch 4: improve events cleanup code to fix crashes
> Patch 5: resolved kernel test robot issues
> removed d_instantiate_anon() calls
> Patch 6: resolved kernel test robot issues
> fix kprobe test in eventfs_root_lookup()
> protecting eventfs lists using nested eventfs_rwsem
> Patch 7: remove header file
> Patch 8: pass eventfs_rwsem as argument to eventfs functions
> called eventfs_remove_events_dir() instead of tracefs_remove()
> from event_trace_del_tracer()
> Patch 9: new patch to fix kprobe test case
>
> Ajay Kaher (9):
> tracefs: Rename some tracefs function
> eventfs: Implement eventfs dir creation functions
> eventfs: Implement eventfs file add functions
> eventfs: Implement eventfs file, directory remove function
> eventfs: Implement functions to create eventfs files and directories
> eventfs: Implement eventfs lookup, read, open functions
> eventfs: Implement tracefs_inode_cache
> eventfs: Move tracing/events to eventfs
> test: ftrace: Fix kprobe test for eventfs
>
> Steven Rostedt (Google) (1):
> tracing: Require all trace events to have a TRACE_SYSTEM
>
> fs/tracefs/Makefile | 1 +
> fs/tracefs/event_inode.c | 711 ++++++++++++++++++
> fs/tracefs/inode.c | 124 ++-
> fs/tracefs/internal.h | 25 +
> include/linux/trace_events.h | 1 +
> include/linux/tracefs.h | 32 +
> kernel/trace/trace.h | 2 +-
> kernel/trace/trace_events.c | 78 +-
> .../ftrace/test.d/kprobe/kprobe_args_char.tc | 4 +-
> .../test.d/kprobe/kprobe_args_string.tc | 4 +-
> 10 files changed, 930 insertions(+), 52 deletions(-)
> create mode 100644 fs/tracefs/event_inode.c
> create mode 100644 fs/tracefs/internal.h
>


2023-07-16 17:59:48

by Ajay Kaher

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs



> On 15-Jul-2023, at 4:28 AM, Steven Rostedt <[email protected]> wrote:
>
> !! External Email
>
> On Thu, 13 Jul 2023 17:03:14 +0530
> Ajay Kaher <[email protected]> wrote:
>
>> Events Tracing infrastructure contains lot of files, directories
>> (internally in terms of inodes, dentries). And ends up by consuming
>> memory in MBs. We can have multiple events of Events Tracing, which
>> further requires more memory.
>>
>> Instead of creating inodes/dentries, eventfs could keep meta-data and
>> skip the creation of inodes/dentries. As and when require, eventfs will
>> create the inodes/dentries only for required files/directories.
>> Also eventfs would delete the inodes/dentries once no more requires
>> but preserve the meta data.
>>
>> Tracing events took ~9MB, with this approach it took ~4.5MB
>> for ~10K files/dir.
>
> I think we are very close to getting this in for the next merge window. I
> ran several tests and so far it's holding up!
>
> I made a bunch of nits for this series, but nothing major. Mostly fixing up
> change logs and comments, as well as some naming conventions and
> reorganizing the series a little bit.
>
> Anyway, I'm hoping that v5 will be ready to go into linux-next.
>
> Thanks a lot Ajay for working on this!


Thanks Steve, hopefully I will fix all the pending nits in v5.
Here is the checkpatch.pl report:


./scripts/checkpatch.pl v4/*
--------------------------
v4/0000-cover-letter.patch
--------------------------
total: 0 errors, 0 warnings, 0 lines checked

v4/0000-cover-letter.patch has no obvious style problems and is ready for submission.
------------------------------------------------------------------
v4/0001-tracing-Require-all-trace-events-to-have-a-TRACE_SYS.patch
------------------------------------------------------------------
total: 0 errors, 0 warnings, 22 lines checked

v4/0001-tracing-Require-all-trace-events-to-have-a-TRACE_SYS.patch has no obvious style problems and is ready for submission.
--------------------------------------------------
v4/0002-tracefs-Rename-some-tracefs-function.patch
--------------------------------------------------
total: 0 errors, 0 warnings, 71 lines checked

v4/0002-tracefs-Rename-some-tracefs-function.patch has no obvious style problems and is ready for submission.
--------------------------------------------------------------
v4/0003-eventfs-Implement-eventfs-dir-creation-functions.patch
--------------------------------------------------------------
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#52:
new file mode 100644

WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'.
#194: FILE: fs/tracefs/event_inode.c:138:
+ inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;

WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'.
#229: FILE: fs/tracefs/event_inode.c:173:
+ S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,

WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'.
#261: FILE: fs/tracefs/event_inode.c:205:
+ S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,

total: 0 errors, 4 warnings, 297 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.

v4/0003-eventfs-Implement-eventfs-dir-creation-functions.patch has style problems, please review.
----------------------------------------------------------
v4/0004-eventfs-Implement-eventfs-file-add-functions.patch
----------------------------------------------------------
total: 0 errors, 0 warnings, 101 lines checked

v4/0004-eventfs-Implement-eventfs-file-add-functions.patch has no obvious style problems and is ready for submission.
------------------------------------------------------------------
v4/0005-eventfs-Implement-eventfs-file-directory-remove-func.patch
------------------------------------------------------------------
total: 0 errors, 0 warnings, 129 lines checked

v4/0005-eventfs-Implement-eventfs-file-directory-remove-func.patch has no obvious style problems and is ready for submission.
------------------------------------------------------------------
v4/0006-eventfs-Implement-functions-to-create-eventfs-files-.patch
------------------------------------------------------------------
total: 0 errors, 0 warnings, 182 lines checked

v4/0006-eventfs-Implement-functions-to-create-eventfs-files-.patch has no obvious style problems and is ready for submission.
------------------------------------------------------------------
v4/0007-eventfs-Implement-eventfs-lookup-read-open-functions.patch
------------------------------------------------------------------
total: 0 errors, 0 warnings, 224 lines checked

v4/0007-eventfs-Implement-eventfs-lookup-read-open-functions.patch has no obvious style problems and is ready for submission.
---------------------------------------------------
v4/0008-eventfs-Implement-tracefs_inode_cache.patch
---------------------------------------------------
total: 0 errors, 0 warnings, 68 lines checked

v4/0008-eventfs-Implement-tracefs_inode_cache.patch has no obvious style problems and is ready for submission.
----------------------------------------------------
v4/0009-eventfs-Move-tracing-events-to-eventfs.patch
----------------------------------------------------
total: 0 errors, 0 warnings, 241 lines checked

v4/0009-eventfs-Move-tracing-events-to-eventfs.patch has no obvious style problems and is ready for submission.
-----------------------------------------------------
v4/0010-test-ftrace-Fix-kprobe-test-for-eventfs.patch
-----------------------------------------------------
total: 0 errors, 0 warnings, 32 lines checked

v4/0010-test-ftrace-Fix-kprobe-test-for-eventfs.patch has no obvious style problems and is ready for submission.

-Ajay



2023-07-18 13:52:30

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs

On Sun, 16 Jul 2023 17:32:35 +0000
Ajay Kaher <[email protected]> wrote:

> Thanks Steve, hopefully I will fix all the pending nits in v5.
> Here is the checkpatch.pl report:

Hold off on v5. I hit the following on v4:

[ 220.170527] BUG: unable to handle page fault for address: fffffffffffffff0
[ 220.172792] #PF: supervisor read access in kernel mode
[ 220.174618] #PF: error_code(0x0000) - not-present page
[ 220.176516] PGD 13104d067 P4D 13104d067 PUD 13104f067 PMD 0
[ 220.178559] Oops: 0000 [#1] PREEMPT SMP PTI
[ 220.180087] CPU: 3 PID: 35 Comm: kworker/u8:1 Not tainted 6.5.0-rc1-test-00021-gdd6e7af33766-dirty #15
[ 220.183441] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 220.186629] Workqueue: events_unbound eventfs_workfn
[ 220.188286] RIP: 0010:eventfs_set_ef_status_free+0x17/0x40
[ 220.190091] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 8b 47 18 48 8b 40 30 48 83 f8 10 74 1b <f6> 40 f0 02 74 15 48 8b 47 78 48 85 c0 74 0c c6 40 5a 00 48 c7 40
[ 220.195360] RSP: 0018:ffffa731c0147e20 EFLAGS: 00010287
[ 220.196802] RAX: 0000000000000000 RBX: ffff97ca512ca000 RCX: 0000000000000000
[ 220.198703] RDX: 0000000000000001 RSI: ffff97ca52d18010 RDI: ffff97ca512ca000
[ 220.200540] RBP: ffff97ca52cb3780 R08: 0000000000000064 R09: 00000000802a0022
[ 220.202324] R10: 0000000000039e80 R11: ffff97cabffd5000 R12: ffff97ca512ca058
[ 220.204012] R13: ffff97ca52cb3780 R14: ffff97ca40153705 R15: ffffffffad5c1848
[ 220.205685] FS: 0000000000000000(0000) GS:ffff97cabbd80000(0000) knlGS:0000000000000000
[ 220.207476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 220.208764] CR2: fffffffffffffff0 CR3: 000000010a01a001 CR4: 0000000000170ee0
[ 220.210342] Call Trace:
[ 220.210879] <TASK>
[ 220.211359] ? __die+0x23/0x70
[ 220.212036] ? page_fault_oops+0xa4/0x180
[ 220.212904] ? exc_page_fault+0xf6/0x190
[ 220.213738] ? asm_exc_page_fault+0x26/0x30
[ 220.214586] ? eventfs_set_ef_status_free+0x17/0x40
[ 220.216081] tracefs_dentry_iput+0x39/0x50
[ 220.217370] __dentry_kill+0xdc/0x170
[ 220.218581] dput+0x142/0x310
[ 220.219647] eventfs_workfn+0x42/0x70
[ 220.220805] process_one_work+0x1e2/0x3e0
[ 220.222031] worker_thread+0x1da/0x390
[ 220.223204] ? __pfx_worker_thread+0x10/0x10
[ 220.224476] kthread+0xf7/0x130
[ 220.225543] ? __pfx_kthread+0x10/0x10
[ 220.226735] ret_from_fork+0x2c/0x50
[ 220.227898] </TASK>
[ 220.228792] Modules linked in:
[ 220.229860] CR2: fffffffffffffff0
[ 220.230960] ---[ end trace 0000000000000000 ]---


I think I know the issue, and looking to see if I can fix it.

-- Steve

2023-07-19 11:09:25

by Ajay Kaher

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs



> On 18-Jul-2023, at 7:10 PM, Steven Rostedt <[email protected]> wrote:
>
> !! External Email
>
> On Sun, 16 Jul 2023 17:32:35 +0000
> Ajay Kaher <[email protected]> wrote:
>
>> Thanks Steve, hopefully I will fix all the pending nits in v5.
>> Here is the checkpatch.pl report:
>
> Hold off on v5. I hit the following on v4:

OK.

>
> [ 220.170527] BUG: unable to handle page fault for address: fffffffffffffff0
> [ 220.172792] #PF: supervisor read access in kernel mode
> [ 220.174618] #PF: error_code(0x0000) - not-present page
> [ 220.176516] PGD 13104d067 P4D 13104d067 PUD 13104f067 PMD 0
> [ 220.178559] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 220.180087] CPU: 3 PID: 35 Comm: kworker/u8:1 Not tainted 6.5.0-rc1-test-00021-gdd6e7af33766-dirty #15
> [ 220.183441] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 220.186629] Workqueue: events_unbound eventfs_workfn
> [ 220.188286] RIP: 0010:eventfs_set_ef_status_free+0x17/0x40
> [ 220.190091] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 8b 47 18 48 8b 40 30 48 83 f8 10 74 1b <f6> 40 f0 02 74 15 48 8b 47 78 48 85 c0 74 0c c6 40 5a 00 48 c7 40
> [ 220.195360] RSP: 0018:ffffa731c0147e20 EFLAGS: 00010287
> [ 220.196802] RAX: 0000000000000000 RBX: ffff97ca512ca000 RCX: 0000000000000000
> [ 220.198703] RDX: 0000000000000001 RSI: ffff97ca52d18010 RDI: ffff97ca512ca000
> [ 220.200540] RBP: ffff97ca52cb3780 R08: 0000000000000064 R09: 00000000802a0022
> [ 220.202324] R10: 0000000000039e80 R11: ffff97cabffd5000 R12: ffff97ca512ca058
> [ 220.204012] R13: ffff97ca52cb3780 R14: ffff97ca40153705 R15: ffffffffad5c1848
> [ 220.205685] FS: 0000000000000000(0000) GS:ffff97cabbd80000(0000) knlGS:0000000000000000
> [ 220.207476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 220.208764] CR2: fffffffffffffff0 CR3: 000000010a01a001 CR4: 0000000000170ee0
> [ 220.210342] Call Trace:
> [ 220.210879] <TASK>
> [ 220.211359] ? __die+0x23/0x70
> [ 220.212036] ? page_fault_oops+0xa4/0x180
> [ 220.212904] ? exc_page_fault+0xf6/0x190
> [ 220.213738] ? asm_exc_page_fault+0x26/0x30
> [ 220.214586] ? eventfs_set_ef_status_free+0x17/0x40
> [ 220.216081] tracefs_dentry_iput+0x39/0x50
> [ 220.217370] __dentry_kill+0xdc/0x170
> [ 220.218581] dput+0x142/0x310
> [ 220.219647] eventfs_workfn+0x42/0x70
> [ 220.220805] process_one_work+0x1e2/0x3e0
> [ 220.222031] worker_thread+0x1da/0x390
> [ 220.223204] ? __pfx_worker_thread+0x10/0x10
> [ 220.224476] kthread+0xf7/0x130
> [ 220.225543] ? __pfx_kthread+0x10/0x10
> [ 220.226735] ret_from_fork+0x2c/0x50
> [ 220.227898] </TASK>
> [ 220.228792] Modules linked in:
> [ 220.229860] CR2: fffffffffffffff0
> [ 220.230960] ---[ end trace 0000000000000000 ]---
>
>
> I think I know the issue, and looking to see if I can fix it.


- Is it also reproducible on v3?
- Is it manually reproducible or reproducible using any specific script?

Let me know if I can help.

-Ajay


2023-07-19 14:28:52

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs

On Wed, 19 Jul 2023 10:25:28 +0000
Ajay Kaher <[email protected]> wrote:

> - Is it also reproducible on v3?
> - Is it manually reproducible or reproducible using any specific script?
>
> Let me know if I can help.

Just tried it against v3, and it gave me the splat that I originally had
and starting to fix, which now gives me another splat. I'll spend a couple
more days on it and start sharing code and seeing if we can work together
on this.

Here's the reproducer (of both v3 splat and the bug I'm hitting now).

~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events

v3 gives me (and my updates too)

======================================================
WARNING: possible circular locking dependency detected
6.5.0-rc1-test+ #576 Not tainted
------------------------------------------------------
trace-cmd/840 is trying to acquire lock:
ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0

but task is already holding lock:
ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (eventfs_rwsem/1){.+.+}-{3:3}:
down_read_nested+0x41/0x180
eventfs_root_lookup+0x42/0x120
__lookup_slow+0xff/0x1b0
walk_component+0xdb/0x150
path_lookupat+0x67/0x1a0
filename_lookup+0xe4/0x1f0
vfs_statx+0x9e/0x180
vfs_fstatat+0x51/0x70
__do_sys_newfstatat+0x3f/0x80
do_syscall_64+0x3a/0xc0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8

-> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}:
__lock_acquire+0x165d/0x2390
lock_acquire+0xd4/0x2d0
down_write+0x3b/0xd0
dcache_dir_open_wrapper+0xc1/0x1b0
do_dentry_open+0x20c/0x510
path_openat+0x7ad/0xc60
do_filp_open+0xaf/0x160
do_sys_openat2+0xab/0xe0
__x64_sys_openat+0x6a/0xa0
do_syscall_64+0x3a/0xc0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
rlock(eventfs_rwsem/1);
lock(&sb->s_type->i_mutex_key#5);
lock(eventfs_rwsem/1);
lock(&sb->s_type->i_mutex_key#5);

*** DEADLOCK ***

1 lock held by trace-cmd/840:
#0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0

stack backtrace:
CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x90
check_noncircular+0x14b/0x160
__lock_acquire+0x165d/0x2390
lock_acquire+0xd4/0x2d0
? dcache_dir_open_wrapper+0xc1/0x1b0
down_write+0x3b/0xd0
? dcache_dir_open_wrapper+0xc1/0x1b0
dcache_dir_open_wrapper+0xc1/0x1b0
? __pfx_dcache_dir_open_wrapper+0x10/0x10
do_dentry_open+0x20c/0x510
path_openat+0x7ad/0xc60
do_filp_open+0xaf/0x160
do_sys_openat2+0xab/0xe0
__x64_sys_openat+0x6a/0xa0
do_syscall_64+0x3a/0xc0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7f1743267e41
Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41
RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c
RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040
R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0
R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b
</TASK>


I moved the code around a bit, and it appears that kprobes is getting
dput() more than once.

I moved the d_invalidate() and dput() into the workqueue function, and on
kprobes going away, d_invalidate() frees it, and dput() is now corrupted.

Still investigating. The VFS layer is a magic box that needs the right
wizard hat to deal with, but I unfortunately am waiting on back order to
retrieve that specific hat :-p

-- Steve

2023-07-19 18:59:47

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs

On Wed, 19 Jul 2023 18:37:12 +0000
Ajay Kaher <[email protected]> wrote:

> > Here's the reproducer (of both v3 splat and the bug I'm hitting now).
> >
> > ~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
> > ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
> > ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
>
> I tried above steps on v4 but couldn’t reproduce:
>
> root@photon-6 [ ~/sdb/linux ]# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
> root@photon-6 [ ~/sdb/linux ]# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
> enable filter format id trigger
> root@photon-6 [ ~/sdb/linux ]# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
> -bash: echo: write error: No such file or directory
>
> I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list
> will have unordered list and could cause problem.

I modified the srcu portion a bit. Will post soon. I think I got something
working.

I'm having doubt that the dput()s were needed in the eventfs_remove_rec(),
as the d_invalidate() appears to be enough. I'm still testing.

>
>
> >
> > v3 gives me (and my updates too)
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 6.5.0-rc1-test+ #576 Not tainted
> > ------------------------------------------------------
> > trace-cmd/840 is trying to acquire lock:
> > ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0
> >
> > but task is already holding lock:
> > ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #1 (eventfs_rwsem/1){.+.+}-{3:3}:
> > down_read_nested+0x41/0x180
> > eventfs_root_lookup+0x42/0x120
> > __lookup_slow+0xff/0x1b0
> > walk_component+0xdb/0x150
> > path_lookupat+0x67/0x1a0
> > filename_lookup+0xe4/0x1f0
> > vfs_statx+0x9e/0x180
> > vfs_fstatat+0x51/0x70
> > __do_sys_newfstatat+0x3f/0x80
> > do_syscall_64+0x3a/0xc0
> > entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> >
> > -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}:
> > __lock_acquire+0x165d/0x2390
> > lock_acquire+0xd4/0x2d0
> > down_write+0x3b/0xd0
> > dcache_dir_open_wrapper+0xc1/0x1b0
> > do_dentry_open+0x20c/0x510
> > path_openat+0x7ad/0xc60
> > do_filp_open+0xaf/0x160
> > do_sys_openat2+0xab/0xe0
> > __x64_sys_openat+0x6a/0xa0
> > do_syscall_64+0x3a/0xc0
> > entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> >
> > other info that might help us debug this:
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > rlock(eventfs_rwsem/1);
> > lock(&sb->s_type->i_mutex_key#5);
> > lock(eventfs_rwsem/1);
> > lock(&sb->s_type->i_mutex_key#5);
> >
> > *** DEADLOCK ***
> >
> > 1 lock held by trace-cmd/840:
> > #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
> >
> > stack backtrace:
> > CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x57/0x90
> > check_noncircular+0x14b/0x160
> > __lock_acquire+0x165d/0x2390
> > lock_acquire+0xd4/0x2d0
> > ? dcache_dir_open_wrapper+0xc1/0x1b0
> > down_write+0x3b/0xd0
> > ? dcache_dir_open_wrapper+0xc1/0x1b0
> > dcache_dir_open_wrapper+0xc1/0x1b0
> > ? __pfx_dcache_dir_open_wrapper+0x10/0x10
> > do_dentry_open+0x20c/0x510
> > path_openat+0x7ad/0xc60
> > do_filp_open+0xaf/0x160
> > do_sys_openat2+0xab/0xe0
> > __x64_sys_openat+0x6a/0xa0
> > do_syscall_64+0x3a/0xc0
> > entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > RIP: 0033:0x7f1743267e41
> > Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
> > RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
> > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41
> > RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c
> > RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040
> > R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0
> > R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b
> > </TASK>
> >
>
> This is expected from v3 (just ignore as of now), if eventfs_set_ef_status_free crash not
> reproduced on v3 then it’s v4 issue.

The issue comes from fixing the above ;-)

-- Steve

2023-07-19 19:34:27

by Ajay Kaher

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs



> On 19-Jul-2023, at 7:53 PM, Steven Rostedt <[email protected]> wrote:
>
> !! External Email
>
> On Wed, 19 Jul 2023 10:25:28 +0000
> Ajay Kaher <[email protected]> wrote:
>
>> - Is it also reproducible on v3?
>> - Is it manually reproducible or reproducible using any specific script?
>>
>> Let me know if I can help.
>
> Just tried it against v3, and it gave me the splat that I originally had
> and starting to fix, which now gives me another splat. I'll spend a couple
> more days on it and start sharing code and seeing if we can work together
> on this.
>
> Here's the reproducer (of both v3 splat and the bug I'm hitting now).
>
> ~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
> ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
> ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events

I tried above steps on v4 but couldn’t reproduce:

root@photon-6 [ ~/sdb/linux ]# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
root@photon-6 [ ~/sdb/linux ]# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
enable filter format id trigger
root@photon-6 [ ~/sdb/linux ]# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
-bash: echo: write error: No such file or directory

I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list
will have unordered list and could cause problem.


>
> v3 gives me (and my updates too)
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.5.0-rc1-test+ #576 Not tainted
> ------------------------------------------------------
> trace-cmd/840 is trying to acquire lock:
> ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0
>
> but task is already holding lock:
> ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (eventfs_rwsem/1){.+.+}-{3:3}:
> down_read_nested+0x41/0x180
> eventfs_root_lookup+0x42/0x120
> __lookup_slow+0xff/0x1b0
> walk_component+0xdb/0x150
> path_lookupat+0x67/0x1a0
> filename_lookup+0xe4/0x1f0
> vfs_statx+0x9e/0x180
> vfs_fstatat+0x51/0x70
> __do_sys_newfstatat+0x3f/0x80
> do_syscall_64+0x3a/0xc0
> entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>
> -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}:
> __lock_acquire+0x165d/0x2390
> lock_acquire+0xd4/0x2d0
> down_write+0x3b/0xd0
> dcache_dir_open_wrapper+0xc1/0x1b0
> do_dentry_open+0x20c/0x510
> path_openat+0x7ad/0xc60
> do_filp_open+0xaf/0x160
> do_sys_openat2+0xab/0xe0
> __x64_sys_openat+0x6a/0xa0
> do_syscall_64+0x3a/0xc0
> entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> rlock(eventfs_rwsem/1);
> lock(&sb->s_type->i_mutex_key#5);
> lock(eventfs_rwsem/1);
> lock(&sb->s_type->i_mutex_key#5);
>
> *** DEADLOCK ***
>
> 1 lock held by trace-cmd/840:
> #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
>
> stack backtrace:
> CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x57/0x90
> check_noncircular+0x14b/0x160
> __lock_acquire+0x165d/0x2390
> lock_acquire+0xd4/0x2d0
> ? dcache_dir_open_wrapper+0xc1/0x1b0
> down_write+0x3b/0xd0
> ? dcache_dir_open_wrapper+0xc1/0x1b0
> dcache_dir_open_wrapper+0xc1/0x1b0
> ? __pfx_dcache_dir_open_wrapper+0x10/0x10
> do_dentry_open+0x20c/0x510
> path_openat+0x7ad/0xc60
> do_filp_open+0xaf/0x160
> do_sys_openat2+0xab/0xe0
> __x64_sys_openat+0x6a/0xa0
> do_syscall_64+0x3a/0xc0
> entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> RIP: 0033:0x7f1743267e41
> Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
> RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41
> RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c
> RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040
> R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0
> R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b
> </TASK>
>

This is expected from v3 (just ignore as of now), if eventfs_set_ef_status_free crash not
reproduced on v3 then it’s v4 issue.

-Ajay

>
> I moved the code around a bit, and it appears that kprobes is getting
> dput() more than once.
>
> I moved the d_invalidate() and dput() into the workqueue function, and on
> kprobes going away, d_invalidate() frees it, and dput() is now corrupted.
>
> Still investigating. The VFS layer is a magic box that needs the right
> wizard hat to deal with, but I unfortunately am waiting on back order to
> retrieve that specific hat :-p
>
> -- Steve
>
> !! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.

2023-07-21 13:48:39

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs


[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ]

On Fri, 21 Jul 2023 08:48:39 -0400
Steven Rostedt <[email protected]> wrote:

> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 4db048250cdb..2718de1533e6 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -36,16 +36,36 @@ struct eventfs_file {
> const struct file_operations *fop;
> const struct inode_operations *iop;
> union {
> + struct list_head del_list;
> struct rcu_head rcu;
> - struct llist_node llist; /* For freeing after RCU */
> + unsigned long is_freed; /* Freed if one of the above is set */

I changed the freeing around. The dentries are freed before returning from
eventfs_remove_dir().

I also added a "is_freed" field that is part of the union and is set if
list elements have content. Note, since the union was criticized before, I
will state the entire purpose of doing this patch set is to save memory.
This structure will be used for every event file. What's the point of
getting rid of dentries if we are replacing it with something just as big?
Anyway, struct dentry does the exact same thing!

> };
> void *data;
> umode_t mode;
> - bool created;
> + unsigned int flags;

Bah, I forgot to remove flags (one iteration replaced the created with
flags to set both created and freed). I removed the freed with the above
"is_freed" and noticed that created is set if and only if ef->dentry is
set. So instead of using the created boolean, just test ef->dentry.

The flags isn't used and can be removed. I just forgot to do so.

> };
>
> static DEFINE_MUTEX(eventfs_mutex);
> DEFINE_STATIC_SRCU(eventfs_srcu);
> +
> +static struct dentry *eventfs_root_lookup(struct inode *dir,
> + struct dentry *dentry,
> + unsigned int flags);
> +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
> +static int eventfs_release(struct inode *inode, struct file *file);
> +
> +static const struct inode_operations eventfs_root_dir_inode_operations = {
> + .lookup = eventfs_root_lookup,
> +};
> +
> +static const struct file_operations eventfs_file_operations = {
> + .open = dcache_dir_open_wrapper,
> + .read = generic_read_dir,
> + .iterate_shared = dcache_readdir,
> + .llseek = generic_file_llseek,
> + .release = eventfs_release,
> +};
> +

In preparing for getting rid of eventfs_file, I noticed that all
directories are set to the above ops. In create_dir() instead of passing in
ef->*ops, just use these directly. This does help with future work.

> /**
> * create_file - create a file in the tracefs filesystem
> * @name: the name of the file to create.
> @@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode,
> * If tracefs is not enabled in the kernel, the value -%ENODEV will be
> * returned.
> */
> -static struct dentry *create_dir(const char *name, umode_t mode,
> - struct dentry *parent, void *data,
> - const struct file_operations *fop,
> - const struct inode_operations *iop)
> +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
> {

As stated, the directories always used the same *op values, so I just hard
coded it.

> struct tracefs_inode *ti;
> struct dentry *dentry;
> struct inode *inode;
>
> - WARN_ON(!S_ISDIR(mode));
> -
> dentry = eventfs_start_creating(name, parent);
> if (IS_ERR(dentry))
> return dentry;
> @@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode,
> if (unlikely(!inode))
> return eventfs_failed_creating(dentry);
>
> - inode->i_mode = mode;
> - inode->i_op = iop;
> - inode->i_fop = fop;
> + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
> + inode->i_op = &eventfs_root_dir_inode_operations;
> + inode->i_fop = &eventfs_file_operations;
> inode->i_private = data;
>
> ti = get_tracefs(inode);
> @@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry)
> struct tracefs_inode *ti_parent;
> struct eventfs_file *ef;
>
> + mutex_lock(&eventfs_mutex);

To synchronize with the removals, I needed to add locking here.

> ti_parent = get_tracefs(dentry->d_parent->d_inode);
> if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
> - return;
> + goto out;
>
> ef = dentry->d_fsdata;
> if (!ef)
> - return;
> - ef->created = false;
> + goto out;
> + /*
> + * If ef was freed, then the LSB bit is set for d_fsdata.
> + * But this should not happen, as it should still have a
> + * ref count that prevents it. Warn in case it does.
> + */
> + if (WARN_ON_ONCE((unsigned long)ef & 1))
> + goto out;

During the remove, a dget() is done to keep the dentry from freeing. To
make sure that it doesn't get freed, I added this test.

> +
> + dentry->d_fsdata = NULL;
> +
> ef->dentry = NULL;
> + out:
> + mutex_unlock(&eventfs_mutex);
> }
>
> /**
> @@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> ti->private = ef->ei;
> }
>
> +static struct dentry *
> +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> +{

Because both the lookup and the dir_open_wrapper did basically the same
thing, I created a helper function so that I didn't have to update both
locations.

> + bool invalidate = false;
> + struct dentry *dentry;
> +
> + mutex_lock(&eventfs_mutex);
> + if (ef->is_freed) {
> + mutex_unlock(&eventfs_mutex);
> + return NULL;
> + }

Ignore if the ef is on its way to be freed.

> + if (ef->dentry) {
> + dentry = ef->dentry;

If the ef already has a dentry (created) then use it.

> + /* On dir open, up the ref count */
> + if (!lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> + mutex_unlock(&eventfs_mutex);
> +
> + if (!lookup)
> + inode_lock(parent->d_inode);
> +
> + if (ef->ei)
> + dentry = create_dir(ef->name, parent, ef->data);
> + else
> + dentry = create_file(ef->name, ef->mode, parent,
> + ef->data, ef->fop);
> +
> + if (!lookup)
> + inode_unlock(parent->d_inode);
> +
> + mutex_lock(&eventfs_mutex);
> + if (IS_ERR_OR_NULL(dentry)) {

With the lock dropped, the dentry could have been created causing it to
fail. Check if the ef->dentry exists, and if so, use it instead.

Note, if the ef is freed, it should not have a dentry.

> + /* If the ef was already updated get it */
> + dentry = ef->dentry;
> + if (dentry && !lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> +
> + if (!ef->dentry && !ef->is_freed) {

With the lock dropped, the dentry could have been filled too. If so, drop
the created dentry and use the one owned by the ef->dentry.

> + ef->dentry = dentry;
> + if (ef->ei)
> + eventfs_post_create_dir(ef);
> + dentry->d_fsdata = ef;
> + } else {
> + /* A race here, should try again (unless freed) */
> + invalidate = true;

I had a WARN_ON() once here. Probably could add a:

WARN_ON_ONCE(!ef->is_freed);

> + }
> + mutex_unlock(&eventfs_mutex);
> + if (invalidate)
> + d_invalidate(dentry);
> +
> + if (lookup || invalidate)
> + dput(dentry);
> +
> + return invalidate ? NULL : dentry;
> +}
> +
> +static bool match_event_file(struct eventfs_file *ef, const char *name)
> +{

A bit of a paranoid helper function. I wanted to make sure to synchronize
with the removals.

> + bool ret;
> +
> + mutex_lock(&eventfs_mutex);
> + ret = !ef->is_freed && strcmp(ef->name, name) == 0;
> + mutex_unlock(&eventfs_mutex);
> +
> + return ret;
> +}
> +
> /**
> * eventfs_root_lookup - lookup routine to create file/dir
> * @dir: directory in which lookup to be done
> @@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> * Used to create dynamic file/dir with-in @dir, search with-in ei
> * list, if @dentry found go ahead and create the file/dir
> */
> -
> static struct dentry *eventfs_root_lookup(struct inode *dir,
> struct dentry *dentry,
> unsigned int flags)
> @@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - if (strcmp(ef->name, dentry->d_name.name))
> + if (!match_event_file(ef, dentry->d_name.name))
> continue;
> ret = simple_lookup(dir, dentry, flags);
> - if (ef->created)
> - continue;
> - mutex_lock(&eventfs_mutex);
> - ef->created = true;
> - if (ef->ei)
> - ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
> - ef->data, ef->fop, ef->iop);
> - else
> - ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
> - ef->data, ef->fop);
> -
> - if (IS_ERR_OR_NULL(ef->dentry)) {
> - ef->created = false;
> - mutex_unlock(&eventfs_mutex);
> - } else {
> - if (ef->ei)
> - eventfs_post_create_dir(ef);
> - ef->dentry->d_fsdata = ef;
> - mutex_unlock(&eventfs_mutex);
> - dput(ef->dentry);
> - }
> + create_dentry(ef, ef->d_parent, true);
> break;
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> @@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file)
> struct tracefs_inode *ti;
> struct eventfs_inode *ei;
> struct eventfs_file *ef;
> + struct dentry *dentry;
> int idx;
>
> ti = get_tracefs(inode);
> @@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file)
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - if (ef->created)
> - dput(ef->dentry);
> + mutex_lock(&eventfs_mutex);
> + dentry = ef->dentry;
> + mutex_unlock(&eventfs_mutex);
> + if (dentry)
> + dput(dentry);
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> return dcache_dir_close(inode, file);
> @@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
> ei = ti->private;
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
> - if (ef->created) {
> - dget(ef->dentry);
> - continue;
> - }
> - mutex_lock(&eventfs_mutex);
> - ef->created = true;
> -
> - inode_lock(dentry->d_inode);
> - if (ef->ei)
> - ef->dentry = create_dir(ef->name, ef->mode, dentry,
> - ef->data, ef->fop, ef->iop);
> - else
> - ef->dentry = create_file(ef->name, ef->mode, dentry,
> - ef->data, ef->fop);
> - inode_unlock(dentry->d_inode);
> -
> - if (IS_ERR_OR_NULL(ef->dentry)) {
> - ef->created = false;
> - } else {
> - if (ef->ei)
> - eventfs_post_create_dir(ef);
> - ef->dentry->d_fsdata = ef;
> - }
> - mutex_unlock(&eventfs_mutex);
> + create_dentry(ef, dentry, false);
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> return dcache_dir_open(inode, file);
> }
>
> -static const struct file_operations eventfs_file_operations = {
> - .open = dcache_dir_open_wrapper,
> - .read = generic_read_dir,
> - .iterate_shared = dcache_readdir,
> - .llseek = generic_file_llseek,
> - .release = eventfs_release,
> -};
> -
> -static const struct inode_operations eventfs_root_dir_inode_operations = {
> - .lookup = eventfs_root_lookup,
> -};
> -
> /**
> * eventfs_prepare_ef - helper function to prepare eventfs_file
> * @name: the name of the file/directory to create.
> @@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
> ti_parent = get_tracefs(parent->d_inode);
> ei_parent = ti_parent->private;
>
> - ef = eventfs_prepare_ef(name,
> - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> - &eventfs_file_operations,
> - &eventfs_root_dir_inode_operations, NULL);
> -
> + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);

For directories, just use the hard coded values.

> if (IS_ERR(ef))
> return ef;
>
> @@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
> if (!ef_parent)
> return ERR_PTR(-EINVAL);
>
> - ef = eventfs_prepare_ef(name,
> - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> - &eventfs_file_operations,
> - &eventfs_root_dir_inode_operations, NULL);
> -
> + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);

ditto.

> if (IS_ERR(ef))
> return ef;
>
> @@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode,
> return 0;
> }
>
> -static LLIST_HEAD(free_list);
> -
> -static void eventfs_workfn(struct work_struct *work)
> -{
> - struct eventfs_file *ef, *tmp;
> - struct llist_node *llnode;
> -
> - llnode = llist_del_all(&free_list);
> - llist_for_each_entry_safe(ef, tmp, llnode, llist) {
> - if (ef->created && ef->dentry)
> - dput(ef->dentry);
> - kfree(ef->name);
> - kfree(ef->ei);
> - kfree(ef);
> - }
> -}
> -
> -DECLARE_WORK(eventfs_work, eventfs_workfn);
> -
> static void free_ef(struct rcu_head *head)
> {
> struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
>
> - if (!llist_add(&ef->llist, &free_list))
> - return;
> -
> - queue_work(system_unbound_wq, &eventfs_work);
> + kfree(ef->name);
> + kfree(ef->ei);
> + kfree(ef);

Since I did not do the dput() or d_invalidate() here I don't need call this
from task context. This simplifies the process.

> }
>
> -
> -
> /**
> * eventfs_remove_rec - remove eventfs dir or file from list
> * @ef: eventfs_file to be removed.
> @@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head)
> * This function recursively remove eventfs_file which
> * contains info of file or dir.
> */
> -static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
> {
> struct eventfs_file *ef_child;
>
> @@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> /* search for nested folders or files */
> list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
> lockdep_is_held(&eventfs_mutex)) {
> - eventfs_remove_rec(ef_child, level + 1);
> + eventfs_remove_rec(ef_child, head, level + 1);
> }
> }
>
> - if (ef->created && ef->dentry)
> - d_invalidate(ef->dentry);
> -
> list_del_rcu(&ef->list);
> - call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> + list_add_tail(&ef->del_list, head);

Hold off on freeing the ef. Add it to a link list to do so later.

> }
>
> /**
> @@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> */
> void eventfs_remove(struct eventfs_file *ef)
> {
> + struct eventfs_file *tmp;
> + LIST_HEAD(ef_del_list);
> + struct dentry *dentry_list = NULL;
> + struct dentry *dentry;
> +
> if (!ef)
> return;
>
> mutex_lock(&eventfs_mutex);
> - eventfs_remove_rec(ef, 0);
> + eventfs_remove_rec(ef, &ef_del_list, 0);

The above returns back with ef_del_list holding all the ef's to be freed.

I probably could have just passed the dentry_list down instead, but I
wanted the below complexity done in a non recursive function.

> +
> + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
> + if (ef->dentry) {
> + unsigned long ptr = (unsigned long)dentry_list;
> +
> + /* Keep the dentry from being freed yet */
> + dget(ef->dentry);
> +
> + /*
> + * Paranoid: The dget() above should prevent the dentry
> + * from being freed and calling eventfs_set_ef_status_free().
> + * But just in case, set the link list LSB pointer to 1
> + * and have eventfs_set_ef_status_free() check that to
> + * make sure that if it does happen, it will not think
> + * the d_fsdata is an event_file.
> + *
> + * For this to work, no event_file should be allocated
> + * on a odd space, as the ef should always be allocated
> + * to be at least word aligned. Check for that too.
> + */
> + WARN_ON_ONCE(ptr & 1);
> +
> + ef->dentry->d_fsdata = (void *)(ptr | 1);

Set the d_fsdata to be a link list. The comment above needs to say to say
struct eventfs_file and struct dentry should be word aligned. Anyway, while
the eventfs_mutex is held, set all the dentries belonging to eventfs_files
to the dentry_list and clear the ef->dentry.


> + dentry_list = ef->dentry;
> + ef->dentry = NULL;
> + }
> + call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> + }
> mutex_unlock(&eventfs_mutex);
> +
> + while (dentry_list) {
> + unsigned long ptr;
> +
> + dentry = dentry_list;
> + ptr = (unsigned long)dentry->d_fsdata & ~1UL;
> + dentry_list = (struct dentry *)ptr;
> + dentry->d_fsdata = NULL;

With the mutex released, it is safe to free the dentries here. This also
must be done before returning from this function, as when I had it done in
the workqueue, it was failing some tests that would remove a dynamic event
and still see that the directory was still around!

> + d_invalidate(dentry);
> + mutex_lock(&eventfs_mutex);
> + /* dentry should now have at least a single reference */
> + WARN_ONCE((int)d_count(dentry) < 1,
> + "dentry %px less than one reference (%d) after invalidate\n",

I did update the above to:

WARN_ONCE((int)d_count(dentry) < 1,
"dentry %px (%s) less than one reference (%d) after invalidate\n",
dentry, dentry->d_name.name, d_count(dentry));

To include the name of the dentry (my current work is triggering this still).

> + dentry, d_count(dentry));
> + mutex_unlock(&eventfs_mutex);
> + dput(dentry);
> + }
> }
>
> /**
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> index c443a0c32a8c..1b880b5cd29d 100644
> --- a/fs/tracefs/internal.h
> +++ b/fs/tracefs/internal.h
> @@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry);
> struct dentry *tracefs_failed_creating(struct dentry *dentry);
> struct inode *tracefs_get_inode(struct super_block *sb);
>
> +void eventfs_set_ef_status_free(struct dentry *dentry);
> +
> #endif /* _TRACEFS_INTERNAL_H */
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 4d30b0cafc5f..47c1b4d21735 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);
>
> void eventfs_remove_events_dir(struct dentry *dentry);
>
> -void eventfs_set_ef_status_free(struct dentry *dentry);
> -

Oh, and eventfs_set_ef_status_free() should not be exported to outside the
tracefs system.

-- Steve


> struct dentry *tracefs_create_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fops);


2023-07-21 13:50:10

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs


[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ]

On Wed, 19 Jul 2023 14:40:46 -0400
Steven Rostedt <[email protected]> wrote:

> > I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list
> > will have unordered list and could cause problem.
>
> I modified the srcu portion a bit. Will post soon. I think I got something
> working.
>
> I'm having doubt that the dput()s were needed in the eventfs_remove_rec(),
> as the d_invalidate() appears to be enough. I'm still testing.

OK, I got this working and it appears to pass all my tests. I actually got
it working Wednesday night, but I tried a different approach on Thursday
that got rid of the evenfs_file and only used eventfs_inodes and makes the
files more dynamic. There's still a couple of corner cases that are not
working with this approach (the dentry counters are getting out of sync).
This should not stop this from going in. I'll continue working on that
approach for the next merge cycle. But for now, here's the patch to this
series that works.

I'm going to post it here, and then reply to it with comments about my
changes.

-- Steve

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 4db048250cdb..2718de1533e6 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -36,16 +36,36 @@ struct eventfs_file {
const struct file_operations *fop;
const struct inode_operations *iop;
union {
+ struct list_head del_list;
struct rcu_head rcu;
- struct llist_node llist; /* For freeing after RCU */
+ unsigned long is_freed; /* Freed if one of the above is set */
};
void *data;
umode_t mode;
- bool created;
+ unsigned int flags;
};

static DEFINE_MUTEX(eventfs_mutex);
DEFINE_STATIC_SRCU(eventfs_srcu);
+
+static struct dentry *eventfs_root_lookup(struct inode *dir,
+ struct dentry *dentry,
+ unsigned int flags);
+static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
+static int eventfs_release(struct inode *inode, struct file *file);
+
+static const struct inode_operations eventfs_root_dir_inode_operations = {
+ .lookup = eventfs_root_lookup,
+};
+
+static const struct file_operations eventfs_file_operations = {
+ .open = dcache_dir_open_wrapper,
+ .read = generic_read_dir,
+ .iterate_shared = dcache_readdir,
+ .llseek = generic_file_llseek,
+ .release = eventfs_release,
+};
+
/**
* create_file - create a file in the tracefs filesystem
* @name: the name of the file to create.
@@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode,
* If tracefs is not enabled in the kernel, the value -%ENODEV will be
* returned.
*/
-static struct dentry *create_dir(const char *name, umode_t mode,
- struct dentry *parent, void *data,
- const struct file_operations *fop,
- const struct inode_operations *iop)
+static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
{
struct tracefs_inode *ti;
struct dentry *dentry;
struct inode *inode;

- WARN_ON(!S_ISDIR(mode));
-
dentry = eventfs_start_creating(name, parent);
if (IS_ERR(dentry))
return dentry;
@@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode,
if (unlikely(!inode))
return eventfs_failed_creating(dentry);

- inode->i_mode = mode;
- inode->i_op = iop;
- inode->i_fop = fop;
+ inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
+ inode->i_op = &eventfs_root_dir_inode_operations;
+ inode->i_fop = &eventfs_file_operations;
inode->i_private = data;

ti = get_tracefs(inode);
@@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry)
struct tracefs_inode *ti_parent;
struct eventfs_file *ef;

+ mutex_lock(&eventfs_mutex);
ti_parent = get_tracefs(dentry->d_parent->d_inode);
if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
- return;
+ goto out;

ef = dentry->d_fsdata;
if (!ef)
- return;
- ef->created = false;
+ goto out;
+ /*
+ * If ef was freed, then the LSB bit is set for d_fsdata.
+ * But this should not happen, as it should still have a
+ * ref count that prevents it. Warn in case it does.
+ */
+ if (WARN_ON_ONCE((unsigned long)ef & 1))
+ goto out;
+
+ dentry->d_fsdata = NULL;
+
ef->dentry = NULL;
+ out:
+ mutex_unlock(&eventfs_mutex);
}

/**
@@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
ti->private = ef->ei;
}

+static struct dentry *
+create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
+{
+ bool invalidate = false;
+ struct dentry *dentry;
+
+ mutex_lock(&eventfs_mutex);
+ if (ef->is_freed) {
+ mutex_unlock(&eventfs_mutex);
+ return NULL;
+ }
+ if (ef->dentry) {
+ dentry = ef->dentry;
+ /* On dir open, up the ref count */
+ if (!lookup)
+ dget(dentry);
+ mutex_unlock(&eventfs_mutex);
+ return dentry;
+ }
+ mutex_unlock(&eventfs_mutex);
+
+ if (!lookup)
+ inode_lock(parent->d_inode);
+
+ if (ef->ei)
+ dentry = create_dir(ef->name, parent, ef->data);
+ else
+ dentry = create_file(ef->name, ef->mode, parent,
+ ef->data, ef->fop);
+
+ if (!lookup)
+ inode_unlock(parent->d_inode);
+
+ mutex_lock(&eventfs_mutex);
+ if (IS_ERR_OR_NULL(dentry)) {
+ /* If the ef was already updated get it */
+ dentry = ef->dentry;
+ if (dentry && !lookup)
+ dget(dentry);
+ mutex_unlock(&eventfs_mutex);
+ return dentry;
+ }
+
+ if (!ef->dentry && !ef->is_freed) {
+ ef->dentry = dentry;
+ if (ef->ei)
+ eventfs_post_create_dir(ef);
+ dentry->d_fsdata = ef;
+ } else {
+ /* A race here, should try again (unless freed) */
+ invalidate = true;
+ }
+ mutex_unlock(&eventfs_mutex);
+ if (invalidate)
+ d_invalidate(dentry);
+
+ if (lookup || invalidate)
+ dput(dentry);
+
+ return invalidate ? NULL : dentry;
+}
+
+static bool match_event_file(struct eventfs_file *ef, const char *name)
+{
+ bool ret;
+
+ mutex_lock(&eventfs_mutex);
+ ret = !ef->is_freed && strcmp(ef->name, name) == 0;
+ mutex_unlock(&eventfs_mutex);
+
+ return ret;
+}
+
/**
* eventfs_root_lookup - lookup routine to create file/dir
* @dir: directory in which lookup to be done
@@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
* Used to create dynamic file/dir with-in @dir, search with-in ei
* list, if @dentry found go ahead and create the file/dir
*/
-
static struct dentry *eventfs_root_lookup(struct inode *dir,
struct dentry *dentry,
unsigned int flags)
@@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
idx = srcu_read_lock(&eventfs_srcu);
list_for_each_entry_srcu(ef, &ei->e_top_files, list,
srcu_read_lock_held(&eventfs_srcu)) {
- if (strcmp(ef->name, dentry->d_name.name))
+ if (!match_event_file(ef, dentry->d_name.name))
continue;
ret = simple_lookup(dir, dentry, flags);
- if (ef->created)
- continue;
- mutex_lock(&eventfs_mutex);
- ef->created = true;
- if (ef->ei)
- ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
- ef->data, ef->fop, ef->iop);
- else
- ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
- ef->data, ef->fop);
-
- if (IS_ERR_OR_NULL(ef->dentry)) {
- ef->created = false;
- mutex_unlock(&eventfs_mutex);
- } else {
- if (ef->ei)
- eventfs_post_create_dir(ef);
- ef->dentry->d_fsdata = ef;
- mutex_unlock(&eventfs_mutex);
- dput(ef->dentry);
- }
+ create_dentry(ef, ef->d_parent, true);
break;
}
srcu_read_unlock(&eventfs_srcu, idx);
@@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file)
struct tracefs_inode *ti;
struct eventfs_inode *ei;
struct eventfs_file *ef;
+ struct dentry *dentry;
int idx;

ti = get_tracefs(inode);
@@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file)
idx = srcu_read_lock(&eventfs_srcu);
list_for_each_entry_srcu(ef, &ei->e_top_files, list,
srcu_read_lock_held(&eventfs_srcu)) {
- if (ef->created)
- dput(ef->dentry);
+ mutex_lock(&eventfs_mutex);
+ dentry = ef->dentry;
+ mutex_unlock(&eventfs_mutex);
+ if (dentry)
+ dput(dentry);
}
srcu_read_unlock(&eventfs_srcu, idx);
return dcache_dir_close(inode, file);
@@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
ei = ti->private;
idx = srcu_read_lock(&eventfs_srcu);
list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
- if (ef->created) {
- dget(ef->dentry);
- continue;
- }
- mutex_lock(&eventfs_mutex);
- ef->created = true;
-
- inode_lock(dentry->d_inode);
- if (ef->ei)
- ef->dentry = create_dir(ef->name, ef->mode, dentry,
- ef->data, ef->fop, ef->iop);
- else
- ef->dentry = create_file(ef->name, ef->mode, dentry,
- ef->data, ef->fop);
- inode_unlock(dentry->d_inode);
-
- if (IS_ERR_OR_NULL(ef->dentry)) {
- ef->created = false;
- } else {
- if (ef->ei)
- eventfs_post_create_dir(ef);
- ef->dentry->d_fsdata = ef;
- }
- mutex_unlock(&eventfs_mutex);
+ create_dentry(ef, dentry, false);
}
srcu_read_unlock(&eventfs_srcu, idx);
return dcache_dir_open(inode, file);
}

-static const struct file_operations eventfs_file_operations = {
- .open = dcache_dir_open_wrapper,
- .read = generic_read_dir,
- .iterate_shared = dcache_readdir,
- .llseek = generic_file_llseek,
- .release = eventfs_release,
-};
-
-static const struct inode_operations eventfs_root_dir_inode_operations = {
- .lookup = eventfs_root_lookup,
-};
-
/**
* eventfs_prepare_ef - helper function to prepare eventfs_file
* @name: the name of the file/directory to create.
@@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
ti_parent = get_tracefs(parent->d_inode);
ei_parent = ti_parent->private;

- ef = eventfs_prepare_ef(name,
- S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
- &eventfs_file_operations,
- &eventfs_root_dir_inode_operations, NULL);
-
+ ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
if (IS_ERR(ef))
return ef;

@@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
if (!ef_parent)
return ERR_PTR(-EINVAL);

- ef = eventfs_prepare_ef(name,
- S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
- &eventfs_file_operations,
- &eventfs_root_dir_inode_operations, NULL);
-
+ ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
if (IS_ERR(ef))
return ef;

@@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode,
return 0;
}

-static LLIST_HEAD(free_list);
-
-static void eventfs_workfn(struct work_struct *work)
-{
- struct eventfs_file *ef, *tmp;
- struct llist_node *llnode;
-
- llnode = llist_del_all(&free_list);
- llist_for_each_entry_safe(ef, tmp, llnode, llist) {
- if (ef->created && ef->dentry)
- dput(ef->dentry);
- kfree(ef->name);
- kfree(ef->ei);
- kfree(ef);
- }
-}
-
-DECLARE_WORK(eventfs_work, eventfs_workfn);
-
static void free_ef(struct rcu_head *head)
{
struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);

- if (!llist_add(&ef->llist, &free_list))
- return;
-
- queue_work(system_unbound_wq, &eventfs_work);
+ kfree(ef->name);
+ kfree(ef->ei);
+ kfree(ef);
}

-
-
/**
* eventfs_remove_rec - remove eventfs dir or file from list
* @ef: eventfs_file to be removed.
@@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head)
* This function recursively remove eventfs_file which
* contains info of file or dir.
*/
-static void eventfs_remove_rec(struct eventfs_file *ef, int level)
+static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
{
struct eventfs_file *ef_child;

@@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
/* search for nested folders or files */
list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
lockdep_is_held(&eventfs_mutex)) {
- eventfs_remove_rec(ef_child, level + 1);
+ eventfs_remove_rec(ef_child, head, level + 1);
}
}

- if (ef->created && ef->dentry)
- d_invalidate(ef->dentry);
-
list_del_rcu(&ef->list);
- call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
+ list_add_tail(&ef->del_list, head);
}

/**
@@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
*/
void eventfs_remove(struct eventfs_file *ef)
{
+ struct eventfs_file *tmp;
+ LIST_HEAD(ef_del_list);
+ struct dentry *dentry_list = NULL;
+ struct dentry *dentry;
+
if (!ef)
return;

mutex_lock(&eventfs_mutex);
- eventfs_remove_rec(ef, 0);
+ eventfs_remove_rec(ef, &ef_del_list, 0);
+
+ list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
+ if (ef->dentry) {
+ unsigned long ptr = (unsigned long)dentry_list;
+
+ /* Keep the dentry from being freed yet */
+ dget(ef->dentry);
+
+ /*
+ * Paranoid: The dget() above should prevent the dentry
+ * from being freed and calling eventfs_set_ef_status_free().
+ * But just in case, set the link list LSB pointer to 1
+ * and have eventfs_set_ef_status_free() check that to
+ * make sure that if it does happen, it will not think
+ * the d_fsdata is an event_file.
+ *
+ * For this to work, no event_file should be allocated
+ * on a odd space, as the ef should always be allocated
+ * to be at least word aligned. Check for that too.
+ */
+ WARN_ON_ONCE(ptr & 1);
+
+ ef->dentry->d_fsdata = (void *)(ptr | 1);
+ dentry_list = ef->dentry;
+ ef->dentry = NULL;
+ }
+ call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
+ }
mutex_unlock(&eventfs_mutex);
+
+ while (dentry_list) {
+ unsigned long ptr;
+
+ dentry = dentry_list;
+ ptr = (unsigned long)dentry->d_fsdata & ~1UL;
+ dentry_list = (struct dentry *)ptr;
+ dentry->d_fsdata = NULL;
+ d_invalidate(dentry);
+ mutex_lock(&eventfs_mutex);
+ /* dentry should now have at least a single reference */
+ WARN_ONCE((int)d_count(dentry) < 1,
+ "dentry %px less than one reference (%d) after invalidate\n",
+ dentry, d_count(dentry));
+ mutex_unlock(&eventfs_mutex);
+ dput(dentry);
+ }
}

/**
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index c443a0c32a8c..1b880b5cd29d 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry);
struct dentry *tracefs_failed_creating(struct dentry *dentry);
struct inode *tracefs_get_inode(struct super_block *sb);

+void eventfs_set_ef_status_free(struct dentry *dentry);
+
#endif /* _TRACEFS_INTERNAL_H */
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 4d30b0cafc5f..47c1b4d21735 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);

void eventfs_remove_events_dir(struct dentry *dentry);

-void eventfs_set_ef_status_free(struct dentry *dentry);
-
struct dentry *tracefs_create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fops);

2023-07-21 17:28:40

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs


> On Jul 21, 2023, at 6:19 AM, Steven Rostedt <[email protected]> wrote:
>
>> union {
>> + struct list_head del_list;
>> struct rcu_head rcu;
>> - struct llist_node llist; /* For freeing after RCU */
>> + unsigned long is_freed; /* Freed if one of the above is set */
>
> I changed the freeing around. The dentries are freed before returning from
> eventfs_remove_dir().
>
> I also added a "is_freed" field that is part of the union and is set if
> list elements have content. Note, since the union was criticized before, I
> will state the entire purpose of doing this patch set is to save memory.
> This structure will be used for every event file. What's the point of
> getting rid of dentries if we are replacing it with something just as big?
> Anyway, struct dentry does the exact same thing!

Hey, don’t shoot me…

[And admittedly, I didn’t review the whole series after v1.]

I understand your position, but I think that at least is_freed should not
be in the union, and you can just put it after umode_t.

Even for the matter of size, it should not matter in most architectures
since umode_t is 16-bit, as natural alignment is at least 32-bits.

[ And “bool" is clearer type for is_freed. ]

2023-07-21 17:32:00

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs

On Fri, 21 Jul 2023 17:17:24 +0000
Nadav Amit <[email protected]> wrote:

> [ And “bool" is clearer type for is_freed. ]

Oh, and to answer why I didn't use bool. I want to make sure that it
doesn't use just part of the other elements in the union. If the compiler
decides to use one byte for the bool (which it is perfectly valid to do
so), and it happens to map over a zero of the other elements in the union,
then it will give a false negative.

By using unsigned long, it will be guaranteed to contain some content.

-- Steve

2023-07-21 17:42:12

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs

On Fri, 21 Jul 2023 17:17:24 +0000
Nadav Amit <[email protected]> wrote:


> > I also added a "is_freed" field that is part of the union and is set if
> > list elements have content. Note, since the union was criticized before, I
> > will state the entire purpose of doing this patch set is to save memory.
> > This structure will be used for every event file. What's the point of
> > getting rid of dentries if we are replacing it with something just as big?
> > Anyway, struct dentry does the exact same thing!
>
> Hey, don’t shoot me…

;-)

>
> [And admittedly, I didn’t review the whole series after v1.]
>
> I understand your position, but I think that at least is_freed should not
> be in the union, and you can just put it after umode_t.
>
> Even for the matter of size, it should not matter in most architectures
> since umode_t is 16-bit, as natural alignment is at least 32-bits.

My new code I'm working on removes the umode_t and will require this then.
I rather have that part tested before adding other drastic changes.

-- Steve


>
> [ And “bool" is clearer type for is_freed. ]
>


2023-07-21 19:45:16

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs

On Fri, 21 Jul 2023 09:18:43 -0400
Steven Rostedt <[email protected]> wrote:

> OK, I got this working and it appears to pass all my tests. I actually got
> it working Wednesday night, but I tried a different approach on Thursday
> that got rid of the evenfs_file and only used eventfs_inodes and makes the
> files more dynamic. There's still a couple of corner cases that are not
> working with this approach (the dentry counters are getting out of sync).
> This should not stop this from going in. I'll continue working on that
> approach for the next merge cycle. But for now, here's the patch to this
> series that works.

Just figured out my bug with my new design. It was in the eventfs_release()
code, where I have a loop that does the dput on the children, but I wasn't
dput(child), I was dput(parent) in that loop!!

Anyway, for this merge window I much rather have this code in, and for the
next merge window I'll add this update.

I can post the new design too, but first let's focus on this.

-- Steve

2023-07-21 21:08:00

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs

On Fri, 21 Jul 2023 09:19:47 -0400
Steven Rostedt <[email protected]> wrote:

> > + } else {
> > + /* A race here, should try again (unless freed) */
> > + invalidate = true;
>
> I had a WARN_ON() once here. Probably could add a:
>
> WARN_ON_ONCE(!ef->is_freed);

Yeah this should have a WARN_ON_ONCE() because the only way to get here
with having a dentry and the ef->dentry being set is if we have two
dentries with the same name in the same directory. Which should never
happen.

I think we can add:

/*
* Should never happen unless we get here due to being freed.
* Otherwise it means two dentries exist with the same name.
*/
WARN_ON_ONCE(!ef->is_freed);

>
> > + }


-- Steve

2023-07-26 19:55:32

by Ajay Kaher

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] tracing: introducing eventfs



> On 22-Jul-2023, at 2:10 AM, Steven Rostedt <[email protected]> wrote:
>
> !! External Email
>
> On Fri, 21 Jul 2023 09:19:47 -0400
> Steven Rostedt <[email protected]> wrote:
>
>>> + } else {
>>> + /* A race here, should try again (unless freed) */
>>> + invalidate = true;
>>
>> I had a WARN_ON() once here. Probably could add a:
>>
>> WARN_ON_ONCE(!ef->is_freed);
>
> Yeah this should have a WARN_ON_ONCE() because the only way to get here
> with having a dentry and the ef->dentry being set is if we have two
> dentries with the same name in the same directory. Which should never
> happen.
>
> I think we can add:
>
> /*
> * Should never happen unless we get here due to being freed.
> * Otherwise it means two dentries exist with the same name.
> */
> WARN_ON_ONCE(!ef->is_freed);

I missed WARN_ON_ONCE in v5 06/10, I will add in v6 06/10.

- Ajay