2015-12-08 09:51:31

by Roman Peniaev

[permalink] [raw]
Subject: [RFC PATCH 0/3] debugfs: implement 'debugfs_create_dir_with_tmpfiles()'

Hello.

Here is an attempt to solve annoying race, which exists between two operations
on debugfs entries: write (setting a request) and read (reading a response).

E.g. let's assume that we have some storage device, which can have thousands
of snapshots (yeah, plenty of them, thus it is ridicoulous to create a debugfs
entry for each), and each snapshot is controlled by the handle, which is a UUID
or any non-numeric character sequence (for numeric sequence this problem can be
solved by 'seek' operation). This device provides a debugfs entry 'snap_status',
which can be opened for reading and writing, where write - is an operation for
specifiying a request, and read - is an operation for getting a response back.

I.e. it is obvious, that to request a status of a snapshot you have to write a
UUID first of a snapshot and then read back a status response back, so the
sequence can be the following:

# echo $UUID > /sys/kernel/debug/storage/snap_status
# cat /sys/kernel/debug/storage/snap_status

Between those two operations a race exists, and if someone else comes and
requests status for another snapshot, the first requester will get incorrect
data.

An atomic request-set and response-read solution can be the following:

# cat /sys/kernel/debug/storage/snap_status/$UUID

Here debugfs creates non-existent temporary entry on demand with the $UUID
name and eventually calls file operations, which were passed to the
'debugfs_create_dir_with_tmpfiles()' function. Caller of that function can
control the correctness of the file name in 'i_fop->open' callback and can
return an error if temporary file name does not match some format.

Temporary file, which is created, will not appear in any lookups, further
linking is forbidden, corresponding dentry and inode will be freed when last
file descriptor is closed (see O_TMPFILE, with the only difference is that
debugfs temporary dentry has a name).

Of course this file creation on demand can be applied to many other cases,
where it is impossible to create as many debugfs entries as objects exist,
but atomicity of read-write can be required.

This atomicity can be achieved also by locking from userspace, but that approach
increases complexity and makes it hardly possible to invoke only few commands
from command line, like 'echo' or 'cat'.

So basically creating a temporary file on demand with a specified name is a
way to provide one additional parameter for an 'read' operation.

Probably, there is more elegant solution for that write-read race problem,
but I've not found any.

PS. I did not want to use configfs, because I have nothing to configure (what
I have described is not a configuration issue), and I do not like to keep
dentries in a system if userspace forgets to remove them.

Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]

Roman Pen (3):
debugfs: make create directory logic more generic
debugfs: implement 'debugfs_create_dir_with_tmpfiles()'
debugfs: update some bits of documentation

Documentation/filesystems/debugfs.txt | 25 ++++++
fs/debugfs/inode.c | 157 ++++++++++++++++++++++++++++++----
include/linux/debugfs.h | 12 +++
3 files changed, 179 insertions(+), 15 deletions(-)

--
2.6.2


2015-12-08 09:51:54

by Roman Peniaev

[permalink] [raw]
Subject: [RFC PATCH 1/3] debugfs: make create directory logic more generic

Now __create_dir() accepts inode operations and private data.
I will use this generic call in next path to create directory
with temporary files.

Signed-off-by: Roman Pen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
---
fs/debugfs/inode.c | 48 ++++++++++++++++++++++++++++--------------------
1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index b7fcc0d..f1c4f80 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -388,25 +388,9 @@ struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
}
EXPORT_SYMBOL_GPL(debugfs_create_file_size);

-/**
- * debugfs_create_dir - create a directory in the debugfs filesystem
- * @name: a pointer to a string containing the name of the directory to
- * create.
- * @parent: a pointer to the parent dentry for this file. This should be a
- * directory dentry if set. If this parameter is NULL, then the
- * directory will be created in the root of the debugfs filesystem.
- *
- * This function creates a directory in debugfs with the given name.
- *
- * This function will return a pointer to a dentry if it succeeds. This
- * pointer must be passed to the debugfs_remove() function when the file is
- * to be removed (no automatic cleanup happens if your module is unloaded,
- * you are responsible here.) If an error occurs, %NULL will be returned.
- *
- * If debugfs is not enabled in the kernel, the value -%ENODEV will be
- * returned.
- */
-struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
+static struct dentry *__create_dir(const char *name,
+ struct dentry *parent, void *data,
+ const struct inode_operations *i_op)
{
struct dentry *dentry = start_creating(name, parent);
struct inode *inode;
@@ -419,7 +403,8 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
return failed_creating(dentry);

inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
- inode->i_op = &simple_dir_inode_operations;
+ inode->i_op = i_op;
+ inode->i_private = data;
inode->i_fop = &simple_dir_operations;

/* directory inodes start off with i_nlink == 2 (for "." entry) */
@@ -429,6 +414,29 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
return end_creating(dentry);
}
+
+/**
+ * debugfs_create_dir - create a directory in the debugfs filesystem
+ * @name: a pointer to a string containing the name of the directory to
+ * create.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ * directory dentry if set. If this parameter is NULL, then the
+ * directory will be created in the root of the debugfs filesystem.
+ *
+ * This function creates a directory in debugfs with the given name.
+ *
+ * This function will return a pointer to a dentry if it succeeds. This
+ * pointer must be passed to the debugfs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.) If an error occurs, %NULL will be returned.
+ *
+ * If debugfs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.
+ */
+struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
+{
+ return __create_dir(name, parent, NULL, &simple_dir_inode_operations);
+}
EXPORT_SYMBOL_GPL(debugfs_create_dir);

/**
--
2.6.2

2015-12-08 09:51:51

by Roman Peniaev

[permalink] [raw]
Subject: [RFC PATCH 2/3] debugfs: implement 'debugfs_create_dir_with_tmpfiles()'

This new function creates a directory in debugfs with the given name with
possibility to create temporary files on demand. Any attempt to open
a non-existent file in that directory will create a temporary file,
wich will be deleted when the last file descriptor is closed. This
temporary file is very similar to opening a directory with O_TMPFILE,
with the difference that a resulting dentry has a name, but still is
unhashed, so is invisible to outer world and can never be reached via
any pathname.

Why it is needed? That will solve the race between writing (setting some
request information) and then reading back a response from some debugfs
entry.

E.g. let's assume that we have some storage device, which can have thousands
of snapshots, and each snapshot is controlled by the handle, which is a UUID
or any non-numeric character sequence. This device provides a debugfs entry
to get a status of a requested snapshot: 'snap_status'. It is obvious, that
to request a status of a snapshot you have to write a UUID first of a snapshot
and then read back the response with the status data, so the sequence is the
following:

# echo $UUID > /sys/kernel/debug/storage/snap_status
# cat /sys/kernel/debug/storage/snap_status

Between those two operations a race exists, and if someone else comes and
requests status for another snapshot, the first requester will get incorrect
data.

The atomic request-set and response-read solution can be the following:

# cat /sys/kernel/debug/storage/snap_status/$UUID

Here debugfs creates non-existent temporary entry with the $UUID name on
demand and eventually calls file operations, which were passed to the
'debugfs_create_dir_with_tmpfiles()' function. User of that function can
control the correctness of the file name in 'i_fop->open' callback and can
return an error if temporary file name does not match some criteria.

Created temporary file will not appear in any lookups, further linking is
forbidden, corresponding dentry and inode will be freed when last file
descriptor is closed.

Signed-off-by: Roman Pen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
---
fs/debugfs/inode.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/debugfs.h | 12 +++++
2 files changed, 131 insertions(+)

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index f1c4f80..4a67d14 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -170,6 +170,9 @@ static void debugfs_evict_inode(struct inode *inode)
clear_inode(inode);
if (S_ISLNK(inode->i_mode))
kfree(inode->i_link);
+ else if (S_ISDIR(inode->i_mode))
+ /* Only for directories with tmpfiles i_private is set */
+ kfree(inode->i_private);
}

static const struct super_operations debugfs_super_operations = {
@@ -242,8 +245,19 @@ static struct file_system_type debug_fs_type = {
};
MODULE_ALIAS_FS("debugfs");

+static inline const struct inode_operations *swap_inode_ops(
+ const struct inode_operations **dst,
+ const struct inode_operations *new)
+{
+ const struct inode_operations *ret = *dst;
+
+ *dst = new;
+ return ret;
+}
+
static struct dentry *start_creating(const char *name, struct dentry *parent)
{
+ const struct inode_operations *i_op;
struct dentry *dentry;
int error;

@@ -266,7 +280,13 @@ static struct dentry *start_creating(const char *name, struct dentry *parent)
parent = debugfs_mount->mnt_root;

mutex_lock(&d_inode(parent)->i_mutex);
+ /* We want to avoid creating temporary inodes while lookup,
+ * thus use simple inode ops
+ */
+ i_op = swap_inode_ops(&d_inode(parent)->i_op,
+ &simple_dir_inode_operations);
dentry = lookup_one_len(name, parent, strlen(name));
+ swap_inode_ops(&d_inode(parent)->i_op, i_op);
if (!IS_ERR(dentry) && d_really_is_positive(dentry)) {
dput(dentry);
dentry = ERR_PTR(-EEXIST);
@@ -439,6 +459,98 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
}
EXPORT_SYMBOL_GPL(debugfs_create_dir);

+struct tmp_priv {
+ const struct file_operations *fops;
+ void *data;
+ umode_t mode;
+};
+
+/**
+ * debugfs_tmp_lookup() - lookup for directory with temporary files
+ *
+ * This lookup always creates inode for any name and ties it with dentry,
+ * but dentry cannot be reached via any pathname, so when last referenced
+ * on the file is closed - everything will be deleted (see O_TMPFILE).
+ */
+struct dentry *debugfs_tmp_lookup(struct inode *dir, struct dentry *dentry,
+ unsigned int flags)
+{
+ struct inode *inode;
+ struct tmp_priv *p;
+
+ inode = debugfs_get_inode(dir->i_sb);
+ if (unlikely(!inode))
+ return ERR_PTR(-ENOMEM);
+
+ p = dir->i_private;
+ BUG_ON(p == NULL);
+
+ inode->i_mode = p->mode;
+ inode->i_fop = p->fops ? p->fops : &debugfs_file_operations;
+ inode->i_private = p->data;
+
+ inode_dec_link_count(inode);
+ d_instantiate(dentry, inode);
+
+ return NULL;
+}
+
+const struct inode_operations debugfs_dir_tmp_inode_operations = {
+ .lookup = debugfs_tmp_lookup,
+};
+
+/**
+ * debugfs_create_dir_with_tmpfiles() - create a directory in the debugfs
+ * filesystem, where temporary files can be created on demand.
+ *
+ * @name: a pointer to a string containing the name of the directory to
+ * create.
+ * @mode: the permission that a temporary file, which will be created on
+ * demand, should have.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ * directory dentry if set. If this parameter is NULL, then the
+ * directory will be created in the root of the debugfs filesystem.
+ * @data: a pointer to something that the caller will want to get to later
+ * on. The inode.i_private pointer will point to this value on
+ * the open() call of a temporart file, which will be created on demand.
+ * @fops: a pointer to a struct file_operations that should be used for a
+ * temporary file, which will be created on demand.
+ *
+ * This function creates a directory in debugfs with the given name with
+ * possibility to create temporary files on demand. Any attempt to open
+ * a non-existent file in that directory will create a temporary file,
+ * wich will be deleted when the last file descriptor is closed. This
+ * temporary file is very similar to opening a directory with O_TMPFILE,
+ * with the difference that a resulting dentry has a name, but still is
+ * unhashed, so is invisible to outer world and can never be reached via
+ * any pathname.
+ *
+ * This function will return a pointer to a dentry if it succeeds. This
+ * pointer must be passed to the debugfs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.) If an error occurs, %NULL will be returned.
+ *
+ * If debugfs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.
+ */
+struct dentry *debugfs_create_dir_with_tmpfiles(const char *name, umode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fops)
+{
+ struct tmp_priv *p;
+
+ p = kmalloc(sizeof(*p), GFP_NOFS);
+ if (unlikely(!p))
+ return NULL;
+
+ p->fops = fops ? fops : &debugfs_file_operations;
+ p->data = data;
+ p->mode = mode;
+
+ return __create_dir(name, parent, p, &debugfs_dir_tmp_inode_operations);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_dir_with_tmpfiles);
+
/**
* debugfs_create_automount - create automount point in the debugfs filesystem
* @name: a pointer to a string containing the name of the file to create.
@@ -677,6 +789,7 @@ struct dentry *debugfs_rename(struct dentry *old_dir, struct dentry *old_dentry,
{
int error;
struct dentry *dentry = NULL, *trap;
+ const struct inode_operations *i_op;
const char *old_name;

trap = lock_rename(new_dir, old_dir);
@@ -687,7 +800,13 @@ struct dentry *debugfs_rename(struct dentry *old_dir, struct dentry *old_dentry,
if (d_really_is_negative(old_dentry) || old_dentry == trap ||
d_mountpoint(old_dentry))
goto exit;
+ /* We want to avoid creating temporary inodes while lookup,
+ * thus use simple inode ops
+ */
+ i_op = swap_inode_ops(&d_inode(new_dir)->i_op,
+ &simple_dir_inode_operations);
dentry = lookup_one_len(new_name, new_dir, strlen(new_name));
+ swap_inode_ops(&d_inode(new_dir)->i_op, i_op);
/* Lookup failed, cyclic rename or target exists? */
if (IS_ERR(dentry) || dentry == trap || d_really_is_positive(dentry))
goto exit;
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index 19c066d..abf8c0d 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -57,6 +57,10 @@ struct dentry *debugfs_create_file_size(const char *name, umode_t mode,

struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);

+struct dentry *debugfs_create_dir_with_tmpfiles(const char *name, umode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fops);
+
struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
const char *dest);

@@ -155,6 +159,14 @@ static inline struct dentry *debugfs_create_dir(const char *name,
return ERR_PTR(-ENODEV);
}

+static inline struct dentry *debugfs_create_dir_with_tmpfiles(const char *name,
+ umode_t mode, struct dentry *parent,
+ void *data,
+ const struct file_operations *fops)
+{
+ return ERR_PTR(-ENODEV);
+}
+
static inline struct dentry *debugfs_create_symlink(const char *name,
struct dentry *parent,
const char *dest)
--
2.6.2

2015-12-08 09:51:50

by Roman Peniaev

[permalink] [raw]
Subject: [RFC PATCH 3/3] debugfs: update some bits of documentation

Include description of 'debugfs_create_dir_with_tmpfiles()' call.

Signed-off-by: Roman Pen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
---
Documentation/filesystems/debugfs.txt | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt
index 4f45f71..150fabf 100644
--- a/Documentation/filesystems/debugfs.txt
+++ b/Documentation/filesystems/debugfs.txt
@@ -36,6 +36,31 @@ wrong. If ERR_PTR(-ENODEV) is returned, that is an indication that the
kernel has been built without debugfs support and none of the functions
described below will work.

+Another way to create a directory where temporary files will be created
+on demand is:
+
+ struct dentry *debugfs_create_dir_with_tmpfiles(const char *name,
+ umode_t mode, struct dentry *parent, void *data,
+ const struct file_operations *fops);
+
+This function creates a directory in debugfs with the given name with
+possibility to create temporary files on demand. Any attempt to open
+a non-existent file in that directory will create a temporary file,
+wich will be deleted when the last file descriptor is closed. This
+temporary file is very similar to opening a directory with O_TMPFILE,
+with the difference that a resulting dentry has a name, but still is
+unhashed, so is invisible to outer world and can never be reached via
+any pathname.
+
+How those temporary files on demand can be used? This is a way to provide
+one additional parameter in a file name and specify an object name. E.g.:
+
+ # cat /sys/kernel/debug/my_objects/$UUID
+
+Where $UUID is a UUID of an object which should be requested. This $UUID
+file will never appear in lookup and will be deleted when 'cat' closes last
+file descriptor.
+
The most general way to create a file within a debugfs directory is with:

struct dentry *debugfs_create_file(const char *name, umode_t mode,
--
2.6.2

2015-12-09 00:47:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] debugfs: implement 'debugfs_create_dir_with_tmpfiles()'

On Tue, Dec 08, 2015 at 10:51:03AM +0100, Roman Pen wrote:
> Hello.
>
> Here is an attempt to solve annoying race, which exists between two operations
> on debugfs entries: write (setting a request) and read (reading a response).
>
> E.g. let's assume that we have some storage device, which can have thousands
> of snapshots (yeah, plenty of them, thus it is ridicoulous to create a debugfs
> entry for each), and each snapshot is controlled by the handle, which is a UUID
> or any non-numeric character sequence (for numeric sequence this problem can be
> solved by 'seek' operation). This device provides a debugfs entry 'snap_status',
> which can be opened for reading and writing, where write - is an operation for
> specifiying a request, and read - is an operation for getting a response back.
>
> I.e. it is obvious, that to request a status of a snapshot you have to write a
> UUID first of a snapshot and then read back a status response back, so the
> sequence can be the following:
>
> # echo $UUID > /sys/kernel/debug/storage/snap_status
> # cat /sys/kernel/debug/storage/snap_status
>
> Between those two operations a race exists, and if someone else comes and
> requests status for another snapshot, the first requester will get incorrect
> data.
>
> An atomic request-set and response-read solution can be the following:
>
> # cat /sys/kernel/debug/storage/snap_status/$UUID
>
> Here debugfs creates non-existent temporary entry on demand with the $UUID
> name and eventually calls file operations, which were passed to the
> 'debugfs_create_dir_with_tmpfiles()' function. Caller of that function can
> control the correctness of the file name in 'i_fop->open' callback and can
> return an error if temporary file name does not match some format.
>
> Temporary file, which is created, will not appear in any lookups, further
> linking is forbidden, corresponding dentry and inode will be freed when last
> file descriptor is closed (see O_TMPFILE, with the only difference is that
> debugfs temporary dentry has a name).
>
> Of course this file creation on demand can be applied to many other cases,
> where it is impossible to create as many debugfs entries as objects exist,
> but atomicity of read-write can be required.
>
> This atomicity can be achieved also by locking from userspace, but that approach
> increases complexity and makes it hardly possible to invoke only few commands
> from command line, like 'echo' or 'cat'.
>
> So basically creating a temporary file on demand with a specified name is a
> way to provide one additional parameter for an 'read' operation.
>
> Probably, there is more elegant solution for that write-read race problem,
> but I've not found any.
>
> PS. I did not want to use configfs, because I have nothing to configure (what
> I have described is not a configuration issue), and I do not like to keep
> dentries in a system if userspace forgets to remove them.

Do you have a patch series that depends on these new apis? I don't want
to add things to debugfs without any in-tree users if at all possible.

thanks,

greg k-h

2015-12-09 20:06:19

by Roman Peniaev

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] debugfs: implement 'debugfs_create_dir_with_tmpfiles()'

On Tue, Dec 8, 2015 at 12:49 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Tue, Dec 08, 2015 at 10:51:03AM +0100, Roman Pen wrote:
>> Hello.
>>
>> Here is an attempt to solve annoying race, which exists between two operations
>> on debugfs entries: write (setting a request) and read (reading a response).
>>
>> E.g. let's assume that we have some storage device, which can have thousands
>> of snapshots (yeah, plenty of them, thus it is ridicoulous to create a debugfs
>> entry for each), and each snapshot is controlled by the handle, which is a UUID
>> or any non-numeric character sequence (for numeric sequence this problem can be
>> solved by 'seek' operation). This device provides a debugfs entry 'snap_status',
>> which can be opened for reading and writing, where write - is an operation for
>> specifiying a request, and read - is an operation for getting a response back.
>>
[skipped]
>>
>> So basically creating a temporary file on demand with a specified name is a
>> way to provide one additional parameter for an 'read' operation.
>>
>> Probably, there is more elegant solution for that write-read race problem,
>> but I've not found any.
>>
>> PS. I did not want to use configfs, because I have nothing to configure (what
>> I have described is not a configuration issue), and I do not like to keep
>> dentries in a system if userspace forgets to remove them.
>
> Do you have a patch series that depends on these new apis? I don't want
> to add things to debugfs without any in-tree users if at all possible.

I can show only similar write/read usage, which I tried to avoid. I
did the grep and found
the following files which do exactly what I've described here:

linux/drivers/bluetooth/btmrvl_debufgfs.c
.read = btmrvl_hscfgcmd_read,
.write = btmrvl_hscfgcmd_write,

.read = btmrvl_pscmd_read,
.write = btmrvl_pscmd_write,

.read = btmrvl_hscmd_read,
.write = btmrvl_hscmd_write,

linux/drivers/mfd/ab8500-debugfs.c
.open = ab8500_bank_open,
.write = ab8500_bank_write,

.open = ab8500_address_open,
.write = ab8500_address_write,

.open = ab8500_val_open,
.write = ab8500_val_write,


linux/drivers/mmc/card/mmc_test.c
.open = mtf_test_open,
.read = seq_read,
.write = mtf_test_write,

linux/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c
.read = ixgbe_dbg_reg_ops_read,
.write = ixgbe_dbg_reg_ops_write,

.read = ixgbe_dbg_netdev_ops_read,
.write = ixgbe_dbg_netdev_ops_write,

linux/drivers/platform/olpc/olpc-ec.c
.write = ec_dbgfs_cmd_write,
.read = ec_dbgfs_cmd_read,

linux/kernel/time/test_udelay.c
.open = udelay_test_open,
.read = seq_read,
.write = udelay_test_write,


Of course, I could miss something, because plenty of callers with
similar meaning,
but in the list above everything boils down to setting request on
write() and getting
response back on read().

For example this simplest and representative test_udelay.c:

echo "100 10" > debugfs/udelay_test
cat debugfs/udelay_test

can be replaced with atomic sequence:
cat debugfs/udelay_test/"100 10"

And frankly I do not know does it make sense to switch these functions
to new API
and to break userspace expectations, but for sure they are the
candidates to behave
atomically.

--
Roman