Events Tracing infrastructure contains lot of files, directories
(internally in terms of inodes, dentries). And ends up by consuming
memory in MBs. We can have multiple events of Events Tracing, which
further requires more memory.
Instead of creating inodes/dentries, eventfs could keep meta-data and
skip the creation of inodes/dentries. As and when require, eventfs will
create the inodes/dentries only for required files/directories.
Also eventfs would delete the inodes/dentries once no more requires
but preserve the meta data.
Tracing events took ~9MB, with this approach it took ~4.5MB
for ~10K files/dir.
Diff from v5:
Patch 02: removed TRACEFS_EVENT_INODE enum.
Patch 04: added TRACEFS_EVENT_INODE enum.
Patch 06: removed WARN_ON_ONCE in eventfs_set_ef_status_free()
Patch 07: added WARN_ON_ONCE in create_dentry()
moved declaration of following to internal.h:
eventfs_start_creating()
eventfs_failed_creating()
eventfs_end_creating()
Patch 08: added WARN_ON_ONCE in eventfs_set_ef_status_free()
Diff from v4:
Patch 02: moved from v4 08/10
added fs/tracefs/internal.h
Patch 03: moved from v4 02/10
removed fs/tracefs/internal.h
Patch 04: moved from v4 03/10
moved out changes of fs/tracefs/internal.h
Patch 05: moved from v4 04/10
renamed eventfs_add_top_file() -> eventfs_add_events_file()
Patch 06: moved from v4 07/10
implemented create_dentry() helper function
added create_file(), create_dir() stub function
Patch 07: moved from v4 06/10
Patch 08: moved from v4 05/10
improved eventfs remove functionality
Patch 09: removed unwanted if conditions
Patch 10: added available_filter_functions check
Diff from v3:
Patch 3,4,5,7,9:
removed all the eventfs_rwsem code and replaced it with an srcu
lock for the readers, and a mutex to synchronize the writers of
the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).
Patch 6: renamed:
eventfs_create_file() -> create_file()
eventfs_create_dir() -> create_dir()
Diff from v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
helper function to add files or directories to eventfs
fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
used eventfs_prepare_ef() to add files
fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
rebased because of v3 01/10
Patch 10: moved from v1 9/9
Diff from v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 801 ++++++++++++++++++
fs/tracefs/inode.c | 151 +++-
fs/tracefs/internal.h | 29 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 23 +
kernel/trace/trace.h | 2 +-
kernel/trace/trace_events.c | 76 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 9 +-
.../test.d/kprobe/kprobe_args_string.tc | 9 +-
10 files changed, 1050 insertions(+), 52 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h
--
2.39.0
Create a kmem cache of tracefs_inodes. To be more efficient, as there are
lots of tracefs inodes, create its own cache. This also allows to see how
many tracefs inodes have been created.
Add helper functions:
tracefs_alloc_inode()
tracefs_free_inode()
get_tracefs()
Signed-off-by: Ajay Kaher <[email protected]>
Co-developed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ching-lin Yu <[email protected]>
---
fs/tracefs/inode.c | 39 +++++++++++++++++++++++++++++++++++++++
fs/tracefs/internal.h | 15 +++++++++++++++
2 files changed, 54 insertions(+)
create mode 100644 fs/tracefs/internal.h
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 57ac8aa4a724..2508944cc4d8 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -21,13 +21,33 @@
#include <linux/parser.h>
#include <linux/magic.h>
#include <linux/slab.h>
+#include "internal.h"
#define TRACEFS_DEFAULT_MODE 0700
+static struct kmem_cache *tracefs_inode_cachep __ro_after_init;
static struct vfsmount *tracefs_mount;
static int tracefs_mount_count;
static bool tracefs_registered;
+static struct inode *tracefs_alloc_inode(struct super_block *sb)
+{
+ struct tracefs_inode *ti;
+
+ ti = kmem_cache_alloc(tracefs_inode_cachep, GFP_KERNEL);
+ if (!ti)
+ return NULL;
+
+ ti->flags = 0;
+
+ return &ti->vfs_inode;
+}
+
+static void tracefs_free_inode(struct inode *inode)
+{
+ kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+}
+
static ssize_t default_read_file(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
@@ -346,6 +366,9 @@ static int tracefs_show_options(struct seq_file *m, struct dentry *root)
}
static const struct super_operations tracefs_super_operations = {
+ .alloc_inode = tracefs_alloc_inode,
+ .free_inode = tracefs_free_inode,
+ .drop_inode = generic_delete_inode,
.statfs = simple_statfs,
.remount_fs = tracefs_remount,
.show_options = tracefs_show_options,
@@ -628,10 +651,26 @@ bool tracefs_initialized(void)
return tracefs_registered;
}
+static void init_once(void *foo)
+{
+ struct tracefs_inode *ti = (struct tracefs_inode *) foo;
+
+ inode_init_once(&ti->vfs_inode);
+}
+
static int __init tracefs_init(void)
{
int retval;
+ tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache",
+ sizeof(struct tracefs_inode),
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD|
+ SLAB_ACCOUNT),
+ init_once);
+ if (!tracefs_inode_cachep)
+ return -ENOMEM;
+
retval = sysfs_create_mount_point(kernel_kobj, "tracing");
if (retval)
return -EINVAL;
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
new file mode 100644
index 000000000000..954ea005632b
--- /dev/null
+++ b/fs/tracefs/internal.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TRACEFS_INTERNAL_H
+#define _TRACEFS_INTERNAL_H
+
+struct tracefs_inode {
+ unsigned long flags;
+ void *private;
+ struct inode vfs_inode;
+};
+
+static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
+{
+ return container_of(inode, struct tracefs_inode, vfs_inode);
+}
+#endif /* _TRACEFS_INTERNAL_H */
--
2.39.0
Add create_file() and create_dir() functions to create the files and
directories respectively when they are accessed. The functions will be
called from the lookup operation of the inode_operations or from the open
function of file_operations.
Signed-off-by: Ajay Kaher <[email protected]>
Co-developed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ching-lin Yu <[email protected]>
---
fs/tracefs/event_inode.c | 61 +++++++++++++++++++++++++++++++--
fs/tracefs/inode.c | 74 ++++++++++++++++++++++++++++++++++++++++
fs/tracefs/internal.h | 3 ++
3 files changed, 136 insertions(+), 2 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 24d645c61029..5240bd2c81e7 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -101,7 +101,34 @@ static struct dentry *create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fop)
{
- return NULL;
+ struct tracefs_inode *ti;
+ struct dentry *dentry;
+ struct inode *inode;
+
+ if (!(mode & S_IFMT))
+ mode |= S_IFREG;
+
+ if (WARN_ON_ONCE(!S_ISREG(mode)))
+ return NULL;
+
+ dentry = eventfs_start_creating(name, parent);
+
+ if (IS_ERR(dentry))
+ return dentry;
+
+ inode = tracefs_get_inode(dentry->d_sb);
+ if (unlikely(!inode))
+ return eventfs_failed_creating(dentry);
+
+ inode->i_mode = mode;
+ inode->i_fop = fop;
+ inode->i_private = data;
+
+ ti = get_tracefs(inode);
+ ti->flags |= TRACEFS_EVENT_INODE;
+ d_instantiate(dentry, inode);
+ fsnotify_create(dentry->d_parent->d_inode, dentry);
+ return eventfs_end_creating(dentry);
};
/**
@@ -123,7 +150,31 @@ static struct dentry *create_file(const char *name, umode_t mode,
*/
static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
{
- return NULL;
+ struct tracefs_inode *ti;
+ struct dentry *dentry;
+ struct inode *inode;
+
+ dentry = eventfs_start_creating(name, parent);
+ if (IS_ERR(dentry))
+ return dentry;
+
+ inode = tracefs_get_inode(dentry->d_sb);
+ if (unlikely(!inode))
+ return eventfs_failed_creating(dentry);
+
+ inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
+ inode->i_op = &eventfs_root_dir_inode_operations;
+ inode->i_fop = &eventfs_file_operations;
+ inode->i_private = data;
+
+ ti = get_tracefs(inode);
+ ti->flags |= TRACEFS_EVENT_INODE;
+
+ inc_nlink(inode);
+ d_instantiate(dentry, inode);
+ inc_nlink(dentry->d_parent->d_inode);
+ fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
+ return eventfs_end_creating(dentry);
}
/**
@@ -234,6 +285,12 @@ create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
} else {
/* A race here, should try again (unless freed) */
invalidate = true;
+
+ /*
+ * Should never happen unless we get here due to being freed.
+ * Otherwise it means two dentries exist with the same name.
+ */
+ WARN_ON_ONCE(!ef->is_freed);
}
mutex_unlock(&eventfs_mutex);
if (invalidate)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 4acc4b4dfd22..d9273066f25f 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -474,6 +474,80 @@ struct dentry *tracefs_end_creating(struct dentry *dentry)
return dentry;
}
+/**
+ * eventfs_start_creating - start the process of creating a dentry
+ * @name: Name of the file created for the dentry
+ * @parent: The parent dentry where this dentry will be created
+ *
+ * This is a simple helper function for the dynamically created eventfs
+ * files. When the directory of the eventfs files are accessed, their
+ * dentries are created on the fly. This function is used to start that
+ * process.
+ */
+struct dentry *eventfs_start_creating(const char *name, struct dentry *parent)
+{
+ struct dentry *dentry;
+ int error;
+
+ error = simple_pin_fs(&trace_fs_type, &tracefs_mount,
+ &tracefs_mount_count);
+ if (error)
+ return ERR_PTR(error);
+
+ /*
+ * If the parent is not specified, we create it in the root.
+ * We need the root dentry to do this, which is in the super
+ * block. A pointer to that is in the struct vfsmount that we
+ * have around.
+ */
+ if (!parent)
+ parent = tracefs_mount->mnt_root;
+
+ if (unlikely(IS_DEADDIR(parent->d_inode)))
+ dentry = ERR_PTR(-ENOENT);
+ else
+ dentry = lookup_one_len(name, parent, strlen(name));
+
+ if (!IS_ERR(dentry) && dentry->d_inode) {
+ dput(dentry);
+ dentry = ERR_PTR(-EEXIST);
+ }
+
+ if (IS_ERR(dentry))
+ simple_release_fs(&tracefs_mount, &tracefs_mount_count);
+
+ return dentry;
+}
+
+/**
+ * eventfs_failed_creating - clean up a failed eventfs dentry creation
+ * @dentry: The dentry to clean up
+ *
+ * If after calling eventfs_start_creating(), a failure is detected, the
+ * resources created by eventfs_start_creating() needs to be cleaned up. In
+ * that case, this function should be called to perform that clean up.
+ */
+struct dentry *eventfs_failed_creating(struct dentry *dentry)
+{
+ dput(dentry);
+ simple_release_fs(&tracefs_mount, &tracefs_mount_count);
+ return NULL;
+}
+
+/**
+ * eventfs_end_creating - Finish the process of creating a eventfs dentry
+ * @dentry: The dentry that has successfully been created.
+ *
+ * This function is currently just a place holder to match
+ * eventfs_start_creating(). In case any synchronization needs to be added,
+ * this function will be used to implement that without having to modify
+ * the callers of eventfs_start_creating().
+ */
+struct dentry *eventfs_end_creating(struct dentry *dentry)
+{
+ return dentry;
+}
+
/**
* tracefs_create_file - create a file in the tracefs filesystem
* @name: a pointer to a string containing the name of the file to create.
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 9bfad9d95a4a..69c2b1d87c46 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -21,6 +21,9 @@ struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
struct dentry *tracefs_end_creating(struct dentry *dentry);
struct dentry *tracefs_failed_creating(struct dentry *dentry);
struct inode *tracefs_get_inode(struct super_block *sb);
+struct dentry *eventfs_start_creating(const char *name, struct dentry *parent);
+struct dentry *eventfs_failed_creating(struct dentry *dentry);
+struct dentry *eventfs_end_creating(struct dentry *dentry);
void eventfs_set_ef_status_free(struct dentry *dentry);
#endif /* _TRACEFS_INTERNAL_H */
--
2.39.0
Add the following functions to add files to evenfs:
eventfs_add_events_file() to add the data needed to create a specific file
located at the top level events directory. The dentry/inode will be
created when the events directory is scanned.
eventfs_add_file() to add the data needed for files within the directories
below the top level events directory. The dentry/inode of the file will be
created when the directory that the file is in is scanned.
Signed-off-by: Ajay Kaher <[email protected]>
Co-developed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ching-lin Yu <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
---
fs/tracefs/event_inode.c | 86 ++++++++++++++++++++++++++++++++++++++++
include/linux/tracefs.h | 8 ++++
2 files changed, 94 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 8f334b122e46..9e4843be9dc9 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -209,3 +209,89 @@ struct eventfs_file *eventfs_add_dir(const char *name,
mutex_unlock(&eventfs_mutex);
return ef;
}
+
+/**
+ * eventfs_add_events_file - add the data needed to create a file for later reference
+ * @name: the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: parent dentry for this file.
+ * @data: something that the caller will want to get to later on.
+ * @fop: struct file_operations that should be used for this file.
+ *
+ * This function is used to add the information needed to create a
+ * dentry/inode within the top level events directory. The file created
+ * will have the @mode permissions. The @data will be used to fill the
+ * inode.i_private when the open() call is done. The dentry and inodes are
+ * all created when they are referenced, and removed when they are no
+ * longer referenced.
+ */
+int eventfs_add_events_file(const char *name, umode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fop)
+{
+ struct tracefs_inode *ti;
+ struct eventfs_inode *ei;
+ struct eventfs_file *ef;
+
+ if (!parent)
+ return -EINVAL;
+
+ if (!(mode & S_IFMT))
+ mode |= S_IFREG;
+
+ if (!parent->d_inode)
+ return -EINVAL;
+
+ ti = get_tracefs(parent->d_inode);
+ if (!(ti->flags & TRACEFS_EVENT_INODE))
+ return -EINVAL;
+
+ ei = ti->private;
+ ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
+
+ if (IS_ERR(ef))
+ return -ENOMEM;
+
+ mutex_lock(&eventfs_mutex);
+ list_add_tail(&ef->list, &ei->e_top_files);
+ mutex_unlock(&eventfs_mutex);
+ return 0;
+}
+
+/**
+ * eventfs_add_file - add eventfs file to list to create later
+ * @name: the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @ef_parent: parent eventfs_file for this file.
+ * @data: something that the caller will want to get to later on.
+ * @fop: struct file_operations that should be used for this file.
+ *
+ * This function is used to add the information needed to create a
+ * file within a subdirectory of the events directory. The file created
+ * will have the @mode permissions. The @data will be used to fill the
+ * inode.i_private when the open() call is done. The dentry and inodes are
+ * all created when they are referenced, and removed when they are no
+ * longer referenced.
+ */
+int eventfs_add_file(const char *name, umode_t mode,
+ struct eventfs_file *ef_parent,
+ void *data,
+ const struct file_operations *fop)
+{
+ struct eventfs_file *ef;
+
+ if (!ef_parent)
+ return -EINVAL;
+
+ if (!(mode & S_IFMT))
+ mode |= S_IFREG;
+
+ ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
+ if (IS_ERR(ef))
+ return -ENOMEM;
+
+ mutex_lock(&eventfs_mutex);
+ list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
+ mutex_unlock(&eventfs_mutex);
+ return 0;
+}
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 432e5e6f7901..54c9cbd0389b 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -32,6 +32,14 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
struct eventfs_file *eventfs_add_dir(const char *name,
struct eventfs_file *ef_parent);
+int eventfs_add_file(const char *name, umode_t mode,
+ struct eventfs_file *ef_parent, void *data,
+ const struct file_operations *fops);
+
+int eventfs_add_events_file(const char *name, umode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fops);
+
struct dentry *tracefs_create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fops);
--
2.39.0
kprobe_args_char.tc, kprobe_args_string.tc has validation check
for tracefs_create_dir, for eventfs it should be eventfs_create_dir.
Signed-off-by: Ajay Kaher <[email protected]>
Co-developed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ching-lin Yu <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
---
.../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc | 9 +++++++--
.../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc | 9 +++++++--
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
index 285b4770efad..ff7499eb98d6 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
@@ -34,14 +34,19 @@ mips*)
esac
: "Test get argument (1)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events
+if grep -q eventfs_add_dir available_filter_functions; then
+ DIR_NAME="eventfs_add_dir"
+else
+ DIR_NAME="tracefs_create_dir"
+fi
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char" > kprobe_events
echo 1 > events/kprobes/testprobe/enable
echo "p:test $FUNCTION_FORK" >> kprobe_events
grep -qe "testprobe.* arg1='t'" trace
echo 0 > events/kprobes/testprobe/enable
: "Test get argument (2)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
echo 1 > events/kprobes/testprobe/enable
echo "p:test $FUNCTION_FORK" >> kprobe_events
grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
index a4f8e7c53c1f..a202b2ea4baf 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
@@ -37,14 +37,19 @@ loongarch*)
esac
: "Test get argument (1)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
+if grep -q eventfs_add_dir available_filter_functions; then
+ DIR_NAME="eventfs_add_dir"
+else
+ DIR_NAME="tracefs_create_dir"
+fi
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string" > kprobe_events
echo 1 > events/kprobes/testprobe/enable
echo "p:test $FUNCTION_FORK" >> kprobe_events
grep -qe "testprobe.* arg1=\"test\"" trace
echo 0 > events/kprobes/testprobe/enable
: "Test get argument (2)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
echo 1 > events/kprobes/testprobe/enable
echo "p:test $FUNCTION_FORK" >> kprobe_events
grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace
--
2.39.0
Add eventfs_file structure which will hold the properties of the eventfs
files and directories.
Add following functions to create the directories in eventfs:
eventfs_create_events_dir() will create the top level "events" directory
within the tracefs file system.
eventfs_add_subsystem_dir() creates an eventfs_file descriptor with the
given name of the subsystem.
eventfs_add_dir() creates an eventfs_file descriptor with the given name of
the directory and attached to a eventfs_file of a subsystem.
Add tracefs_inode structure to hold the inodes, flags and pointers to
private data used by eventfs.
Signed-off-by: Ajay Kaher <[email protected]>
Co-developed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ching-lin Yu <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
---
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 211 +++++++++++++++++++++++++++++++++++++++
fs/tracefs/internal.h | 4 +
include/linux/tracefs.h | 11 ++
4 files changed, 227 insertions(+)
create mode 100644 fs/tracefs/event_inode.c
diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile
index 7c35a282b484..73c56da8e284 100644
--- a/fs/tracefs/Makefile
+++ b/fs/tracefs/Makefile
@@ -1,5 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
tracefs-objs := inode.o
+tracefs-objs += event_inode.o
obj-$(CONFIG_TRACING) += tracefs.o
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
new file mode 100644
index 000000000000..8f334b122e46
--- /dev/null
+++ b/fs/tracefs/event_inode.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * event_inode.c - part of tracefs, a pseudo file system for activating tracing
+ *
+ * Copyright (C) 2020-23 VMware Inc, author: Steven Rostedt (VMware) <[email protected]>
+ * Copyright (C) 2020-23 VMware Inc, author: Ajay Kaher <[email protected]>
+ *
+ * eventfs is used to dynamically create inodes and dentries based on the
+ * meta data provided by the tracing system.
+ *
+ * eventfs stores the meta-data of files/dirs and holds off on creating
+ * inodes/dentries of the files. When accessed, the eventfs will create the
+ * inodes/dentries in a just-in-time (JIT) manner. The eventfs will clean up
+ * and delete the inodes/dentries when they are no longer referenced.
+ */
+#include <linux/fsnotify.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/workqueue.h>
+#include <linux/security.h>
+#include <linux/tracefs.h>
+#include <linux/kref.h>
+#include <linux/delay.h>
+#include "internal.h"
+
+struct eventfs_inode {
+ struct list_head e_top_files;
+};
+
+/**
+ * struct eventfs_file - hold the properties of the eventfs files and
+ * directories.
+ * @name: the name of the file or directory to create
+ * @list: file or directory to be added to parent directory
+ * @ei: list of files and directories within directory
+ * @fop: file_operations for file or directory
+ * @iop: inode_operations for file or directory
+ * @data: something that the caller will want to get to later on
+ * @mode: the permission that the file or directory should have
+ */
+struct eventfs_file {
+ const char *name;
+ struct list_head list;
+ struct eventfs_inode *ei;
+ const struct file_operations *fop;
+ const struct inode_operations *iop;
+ void *data;
+ umode_t mode;
+};
+
+static DEFINE_MUTEX(eventfs_mutex);
+
+static const struct inode_operations eventfs_root_dir_inode_operations = {
+};
+
+static const struct file_operations eventfs_file_operations = {
+};
+
+/**
+ * eventfs_prepare_ef - helper function to prepare eventfs_file
+ * @name: the name of the file/directory to create.
+ * @mode: the permission that the file should have.
+ * @fop: struct file_operations that should be used for this file/directory.
+ * @iop: struct inode_operations that should be used for this file/directory.
+ * @data: something that the caller will want to get to later on. The
+ * inode.i_private pointer will point to this value on the open() call.
+ *
+ * This function allocates and fills the eventfs_file structure.
+ */
+static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
+ const struct file_operations *fop,
+ const struct inode_operations *iop,
+ void *data)
+{
+ struct eventfs_file *ef;
+
+ ef = kzalloc(sizeof(*ef), GFP_KERNEL);
+ if (!ef)
+ return ERR_PTR(-ENOMEM);
+
+ ef->name = kstrdup(name, GFP_KERNEL);
+ if (!ef->name) {
+ kfree(ef);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ if (S_ISDIR(mode)) {
+ ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL);
+ if (!ef->ei) {
+ kfree(ef->name);
+ kfree(ef);
+ return ERR_PTR(-ENOMEM);
+ }
+ INIT_LIST_HEAD(&ef->ei->e_top_files);
+ } else {
+ ef->ei = NULL;
+ }
+
+ ef->iop = iop;
+ ef->fop = fop;
+ ef->mode = mode;
+ ef->data = data;
+ return ef;
+}
+
+/**
+ * eventfs_create_events_dir - create the trace event structure
+ * @name: the name of the directory to create.
+ * @parent: parent dentry for this file. This should be a directory dentry
+ * if set. If this parameter is NULL, then the directory will be
+ * created in the root of the tracefs filesystem.
+ *
+ * This function creates the top of the trace event directory.
+ */
+struct dentry *eventfs_create_events_dir(const char *name,
+ struct dentry *parent)
+{
+ struct dentry *dentry = tracefs_start_creating(name, parent);
+ struct eventfs_inode *ei;
+ struct tracefs_inode *ti;
+ struct inode *inode;
+
+ if (IS_ERR(dentry))
+ return dentry;
+
+ ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+ if (!ei)
+ return ERR_PTR(-ENOMEM);
+ inode = tracefs_get_inode(dentry->d_sb);
+ if (unlikely(!inode)) {
+ kfree(ei);
+ tracefs_failed_creating(dentry);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ INIT_LIST_HEAD(&ei->e_top_files);
+
+ ti = get_tracefs(inode);
+ ti->flags |= TRACEFS_EVENT_INODE;
+ ti->private = ei;
+
+ inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
+ inode->i_op = &eventfs_root_dir_inode_operations;
+ inode->i_fop = &eventfs_file_operations;
+
+ /* directory inodes start off with i_nlink == 2 (for "." entry) */
+ inc_nlink(inode);
+ d_instantiate(dentry, inode);
+ inc_nlink(dentry->d_parent->d_inode);
+ fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
+ return tracefs_end_creating(dentry);
+}
+
+/**
+ * eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later
+ * @name: the name of the file to create.
+ * @parent: parent dentry for this dir.
+ *
+ * This function adds eventfs subsystem dir to list.
+ * And all these dirs are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
+ struct dentry *parent)
+{
+ struct tracefs_inode *ti_parent;
+ struct eventfs_inode *ei_parent;
+ struct eventfs_file *ef;
+
+ if (!parent)
+ return ERR_PTR(-EINVAL);
+
+ ti_parent = get_tracefs(parent->d_inode);
+ ei_parent = ti_parent->private;
+
+ ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
+ if (IS_ERR(ef))
+ return ef;
+
+ mutex_lock(&eventfs_mutex);
+ list_add_tail(&ef->list, &ei_parent->e_top_files);
+ mutex_unlock(&eventfs_mutex);
+ return ef;
+}
+
+/**
+ * eventfs_add_dir - add eventfs dir to list to create later
+ * @name: the name of the file to create.
+ * @ef_parent: parent eventfs_file for this dir.
+ *
+ * This function adds eventfs dir to list.
+ * And all these dirs are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+struct eventfs_file *eventfs_add_dir(const char *name,
+ struct eventfs_file *ef_parent)
+{
+ struct eventfs_file *ef;
+
+ if (!ef_parent)
+ return ERR_PTR(-EINVAL);
+
+ ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
+ if (IS_ERR(ef))
+ return ef;
+
+ mutex_lock(&eventfs_mutex);
+ list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
+ mutex_unlock(&eventfs_mutex);
+ return ef;
+}
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 7dfb7ebc1c3f..f0fd565d59ec 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -2,6 +2,10 @@
#ifndef _TRACEFS_INTERNAL_H
#define _TRACEFS_INTERNAL_H
+enum {
+ TRACEFS_EVENT_INODE = BIT(1),
+};
+
struct tracefs_inode {
unsigned long flags;
void *private;
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 99912445974c..432e5e6f7901 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -21,6 +21,17 @@ struct file_operations;
#ifdef CONFIG_TRACING
+struct eventfs_file;
+
+struct dentry *eventfs_create_events_dir(const char *name,
+ struct dentry *parent);
+
+struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
+ struct dentry *parent);
+
+struct eventfs_file *eventfs_add_dir(const char *name,
+ struct eventfs_file *ef_parent);
+
struct dentry *tracefs_create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *fops);
--
2.39.0
Up until now, /sys/kernel/tracing/events was no different than any other
part of tracefs. The files and directories within the events directory was
created when the tracefs was mounted, and also created for the instances in
/sys/kernel/tracing/instances/<instance>/events. Most of these files and
directories will never be referenced. Since there are thousands of these
files and directories they spend their time wasting precious memory
resources.
Move the "events" directory to the new eventfs. The eventfs will take the
meta data of the events that they represent and store that. When the files
in the events directory are referenced, the dentry and inodes to represent
them are then created. When the files are no longer referenced, they are
freed. This saves the precious memory resources that were wasted on these
seldom referenced dentries and inodes.
Signed-off-by: Ajay Kaher <[email protected]>
Co-developed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ching-lin Yu <[email protected]>
---
fs/tracefs/inode.c | 18 ++++++++++
include/linux/trace_events.h | 1 +
kernel/trace/trace.h | 2 +-
kernel/trace/trace_events.c | 65 ++++++++++++++++++------------------
4 files changed, 53 insertions(+), 33 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index d9273066f25f..bb6de89eb446 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -374,6 +374,23 @@ static const struct super_operations tracefs_super_operations = {
.show_options = tracefs_show_options,
};
+static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
+{
+ struct tracefs_inode *ti;
+
+ if (!dentry || !inode)
+ return;
+
+ ti = get_tracefs(inode);
+ if (ti && ti->flags & TRACEFS_EVENT_INODE)
+ eventfs_set_ef_status_free(dentry);
+ iput(inode);
+}
+
+static const struct dentry_operations tracefs_dentry_operations = {
+ .d_iput = tracefs_dentry_iput,
+};
+
static int trace_fill_super(struct super_block *sb, void *data, int silent)
{
static const struct tree_descr trace_files[] = {{""}};
@@ -396,6 +413,7 @@ static int trace_fill_super(struct super_block *sb, void *data, int silent)
goto fail;
sb->s_op = &tracefs_super_operations;
+ sb->s_d_op = &tracefs_dentry_operations;
tracefs_apply_options(sb, false);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 3930e676436c..c17623c78029 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -638,6 +638,7 @@ struct trace_event_file {
struct list_head list;
struct trace_event_call *event_call;
struct event_filter __rcu *filter;
+ struct eventfs_file *ef;
struct dentry *dir;
struct trace_array *tr;
struct trace_subsystem_dir *system;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e1edc2197fc8..956938357774 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1324,7 +1324,7 @@ struct trace_subsystem_dir {
struct list_head list;
struct event_subsystem *subsystem;
struct trace_array *tr;
- struct dentry *entry;
+ struct eventfs_file *ef;
int ref_count;
int nr_events;
};
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index a284171d5c74..85f9a99bb506 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -990,7 +990,7 @@ static void remove_subsystem(struct trace_subsystem_dir *dir)
return;
if (!--dir->nr_events) {
- tracefs_remove(dir->entry);
+ eventfs_remove(dir->ef);
list_del(&dir->list);
__put_system_dir(dir);
}
@@ -1011,7 +1011,7 @@ static void remove_event_file_dir(struct trace_event_file *file)
tracefs_remove(dir);
}
-
+ eventfs_remove(file->ef);
list_del(&file->list);
remove_subsystem(file->system);
free_event_filter(file->filter);
@@ -2297,13 +2297,13 @@ create_new_subsystem(const char *name)
return NULL;
}
-static struct dentry *
+static struct eventfs_file *
event_subsystem_dir(struct trace_array *tr, const char *name,
struct trace_event_file *file, struct dentry *parent)
{
struct event_subsystem *system, *iter;
struct trace_subsystem_dir *dir;
- struct dentry *entry;
+ int res;
/* First see if we did not already create this dir */
list_for_each_entry(dir, &tr->systems, list) {
@@ -2311,7 +2311,7 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
if (strcmp(system->name, name) == 0) {
dir->nr_events++;
file->system = dir;
- return dir->entry;
+ return dir->ef;
}
}
@@ -2335,8 +2335,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
} else
__get_system(system);
- dir->entry = tracefs_create_dir(name, parent);
- if (!dir->entry) {
+ dir->ef = eventfs_add_subsystem_dir(name, parent);
+ if (IS_ERR(dir->ef)) {
pr_warn("Failed to create system directory %s\n", name);
__put_system(system);
goto out_free;
@@ -2351,22 +2351,22 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
/* the ftrace system is special, do not create enable or filter files */
if (strcmp(name, "ftrace") != 0) {
- entry = tracefs_create_file("filter", TRACE_MODE_WRITE,
- dir->entry, dir,
+ res = eventfs_add_file("filter", TRACE_MODE_WRITE,
+ dir->ef, dir,
&ftrace_subsystem_filter_fops);
- if (!entry) {
+ if (res) {
kfree(system->filter);
system->filter = NULL;
pr_warn("Could not create tracefs '%s/filter' entry\n", name);
}
- trace_create_file("enable", TRACE_MODE_WRITE, dir->entry, dir,
+ eventfs_add_file("enable", TRACE_MODE_WRITE, dir->ef, dir,
&ftrace_system_enable_fops);
}
list_add(&dir->list, &tr->systems);
- return dir->entry;
+ return dir->ef;
out_free:
kfree(dir);
@@ -2419,8 +2419,8 @@ static int
event_create_dir(struct dentry *parent, struct trace_event_file *file)
{
struct trace_event_call *call = file->event_call;
+ struct eventfs_file *ef_subsystem = NULL;
struct trace_array *tr = file->tr;
- struct dentry *d_events;
const char *name;
int ret;
@@ -2432,24 +2432,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0))
return -ENODEV;
- d_events = event_subsystem_dir(tr, call->class->system, file, parent);
- if (!d_events)
+ ef_subsystem = event_subsystem_dir(tr, call->class->system, file, parent);
+ if (!ef_subsystem)
return -ENOMEM;
name = trace_event_name(call);
- file->dir = tracefs_create_dir(name, d_events);
- if (!file->dir) {
+ file->ef = eventfs_add_dir(name, ef_subsystem);
+ if (IS_ERR(file->ef)) {
pr_warn("Could not create tracefs '%s' directory\n", name);
return -1;
}
if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE))
- trace_create_file("enable", TRACE_MODE_WRITE, file->dir, file,
+ eventfs_add_file("enable", TRACE_MODE_WRITE, file->ef, file,
&ftrace_enable_fops);
#ifdef CONFIG_PERF_EVENTS
if (call->event.type && call->class->reg)
- trace_create_file("id", TRACE_MODE_READ, file->dir,
+ eventfs_add_file("id", TRACE_MODE_READ, file->ef,
(void *)(long)call->event.type,
&ftrace_event_id_fops);
#endif
@@ -2465,27 +2465,27 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
* triggers or filters.
*/
if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) {
- trace_create_file("filter", TRACE_MODE_WRITE, file->dir,
+ eventfs_add_file("filter", TRACE_MODE_WRITE, file->ef,
file, &ftrace_event_filter_fops);
- trace_create_file("trigger", TRACE_MODE_WRITE, file->dir,
+ eventfs_add_file("trigger", TRACE_MODE_WRITE, file->ef,
file, &event_trigger_fops);
}
#ifdef CONFIG_HIST_TRIGGERS
- trace_create_file("hist", TRACE_MODE_READ, file->dir, file,
+ eventfs_add_file("hist", TRACE_MODE_READ, file->ef, file,
&event_hist_fops);
#endif
#ifdef CONFIG_HIST_TRIGGERS_DEBUG
- trace_create_file("hist_debug", TRACE_MODE_READ, file->dir, file,
+ eventfs_add_file("hist_debug", TRACE_MODE_READ, file->ef, file,
&event_hist_debug_fops);
#endif
- trace_create_file("format", TRACE_MODE_READ, file->dir, call,
+ eventfs_add_file("format", TRACE_MODE_READ, file->ef, call,
&ftrace_event_format_fops);
#ifdef CONFIG_TRACE_EVENT_INJECT
if (call->event.type && call->class->reg)
- trace_create_file("inject", 0200, file->dir, file,
+ eventfs_add_file("inject", 0200, file->ef, file,
&event_inject_fops);
#endif
@@ -3638,21 +3638,22 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
{
struct dentry *d_events;
struct dentry *entry;
+ int error = 0;
entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent,
tr, &ftrace_set_event_fops);
if (!entry)
return -ENOMEM;
- d_events = tracefs_create_dir("events", parent);
- if (!d_events) {
+ d_events = eventfs_create_events_dir("events", parent);
+ if (IS_ERR(d_events)) {
pr_warn("Could not create tracefs 'events' directory\n");
return -ENOMEM;
}
- entry = trace_create_file("enable", TRACE_MODE_WRITE, d_events,
+ error = eventfs_add_events_file("enable", TRACE_MODE_WRITE, d_events,
tr, &ftrace_tr_enable_fops);
- if (!entry)
+ if (error)
return -ENOMEM;
/* There are not as crucial, just warn if they are not created */
@@ -3665,11 +3666,11 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
&ftrace_set_event_notrace_pid_fops);
/* ring buffer internal formats */
- trace_create_file("header_page", TRACE_MODE_READ, d_events,
+ eventfs_add_events_file("header_page", TRACE_MODE_READ, d_events,
ring_buffer_print_page_header,
&ftrace_show_header_fops);
- trace_create_file("header_event", TRACE_MODE_READ, d_events,
+ eventfs_add_events_file("header_event", TRACE_MODE_READ, d_events,
ring_buffer_print_entry_header,
&ftrace_show_header_fops);
@@ -3757,7 +3758,7 @@ int event_trace_del_tracer(struct trace_array *tr)
down_write(&trace_event_sem);
__trace_remove_event_dirs(tr);
- tracefs_remove(tr->event_dir);
+ eventfs_remove_events_dir(tr->event_dir);
up_write(&trace_event_sem);
tr->event_dir = NULL;
--
2.39.0