2015-12-10 12:47:25

by Roman Peniaev

[permalink] [raw]
Subject: [RFC PATCH 0/3] debugfs: make __debugfs_remove wait for dentry release

Hello.

Recently I observed that all over the sources people expect that debugfs_remove()
should behave as a barrier, hence file operations won't be invoked after it is
called and it is safe to free remain memory.

But by default debugfs_remove() does not guarantee that dentry will be released,
and after this call fops->open/read/write can still be invoked if someone holds
the reference on dentry because of successfull lookup.

For example here is the grep output:

*** drivers/block/pktcdvd.c:
pkt_debugfs_dev_remove[489] debugfs_remove(pd->dfs_f_info);
pkt_debugfs_dev_remove[490] debugfs_remove(pd->dfs_d_root);

*** drivers/char/virtio_console.c:
unplug_port[1595] debugfs_remove(port->debugfs_file);

*** drivers/crypto/qat/qat_common/adf_cfg.c:
adf_cfg_dev_remove[187] debugfs_remove(dev_cfg_data->debug);

*** drivers/gpu/drm/drm_debugfs.c:
drm_debugfs_remove_files[203] debugfs_remove(tmp->dent);

.... and more and more and more ...

where people do the following sequence:

debugfs_remove(dev->debugfs_dentry);
kfree(dev);

and 'dev' pointer was passed to 'debugfs_create_file("my_dev", , dev, dev_fops)',
so any access to 'dev' in file operations can lead to usage-after-free.

In this patch __debugfs_remove() waits for last dentry release callback.

BUT! I am not sure that nobody tries to remove the dentry from it's own file
operation (dentry suicide). And if so - deadlock will happen.

Probably, dentry_remove_self() should be implemented for such cases, which is
similar to sysfs_remove_file_self(). But for now I do not want to add new
function which can be useless in the nearest future.

Also I did a fix for automount inodes and increased i_nlink references because
of WARNING at fs/inode.c:273 drop_nlink.

Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]

Roman Pen (3):
debugfs: fix automount inode i_nlink references
debugfs: put private data to i_private for automount inode
debugfs: make __debugfs_remove wait for dentry release

fs/debugfs/inode.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 51 insertions(+), 7 deletions(-)

--
2.6.2


2015-12-10 12:48:14

by Roman Peniaev

[permalink] [raw]
Subject: [RFC PATCH 1/3] debugfs: fix automount inode i_nlink references

Directory inodes should start off with i_nlink == 2 (for "." entry).
Of course the same rule should be applied to automount dentries for
child and parent inodes as well.

Also now automount dentry does fsnotify_mkdir.

Without this patch kernel complains when sees i_nlink == 0:

[ 86.288070] WARNING: CPU: 1 PID: 3616 at fs/inode.c:273 drop_nlink+0x3e/0x50()
[ 86.288461] Modules linked in: debugfs_example2(O-)
[ 86.288745] CPU: 1 PID: 3616 Comm: rmmod Tainted: G O 4.4.0-rc3-next-20151207+ #135
[ 86.289197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150617_082717-anatol 04/01/2014
[ 86.289696] ffffffff81be05c9 ffff8800b9e6fda0 ffffffff81352e2c 0000000000000000
[ 86.290110] ffff8800b9e6fdd8 ffffffff81065142 ffff8801399175e8 ffff8800bb78b240
[ 86.290507] ffff8801399175e8 ffff8800b73d7898 ffff8800b73d7840 ffff8800b9e6fde8
[ 86.290933] Call Trace:
[ 86.291080] [<ffffffff81352e2c>] dump_stack+0x4e/0x82
[ 86.291340] [<ffffffff81065142>] warn_slowpath_common+0x82/0xc0
[ 86.291640] [<ffffffff8106523a>] warn_slowpath_null+0x1a/0x20
[ 86.291932] [<ffffffff811ae62e>] drop_nlink+0x3e/0x50
[ 86.292208] [<ffffffff811ba35b>] simple_unlink+0x4b/0x60
[ 86.292481] [<ffffffff811ba3a7>] simple_rmdir+0x37/0x50
[ 86.292748] [<ffffffff812d9808>] __debugfs_remove.part.16+0xa8/0xd0
[ 86.293082] [<ffffffff812d9a0b>] debugfs_remove_recursive+0xdb/0x1c0
[ 86.293406] [<ffffffffa00004dd>] cleanup_module+0x2d/0x3b [debugfs_example2]
[ 86.293762] [<ffffffff810d959b>] SyS_delete_module+0x16b/0x220
[ 86.294077] [<ffffffff818ef857>] entry_SYSCALL_64_fastpath+0x12/0x6a
[ 86.294405] ---[ end trace c9fc53353fe14a36 ]---
[ 86.294639] ------------[ cut here ]------------
[ 86.294871] WARNING: CPU: 1 PID: 3616 at fs/inode.c:273 drop_nlink+0x3e/0x50()
[ 86.295249] Modules linked in: debugfs_example2(O-)
[ 86.295510] CPU: 1 PID: 3616 Comm: rmmod Tainted: G W O 4.4.0-rc3-next-20151207+ #135
[ 86.295943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150617_082717-anatol 04/01/2014
[ 86.296450] ffffffff81be05c9 ffff8800b9e6fda0 ffffffff81352e2c 0000000000000000
[ 86.296846] ffff8800b9e6fdd8 ffffffff81065142 ffff880139917810 ffff8800b73d7840
[ 86.297253] ffff880139917810 ffff8800b73d7898 ffff8800b73d7840 ffff8800b9e6fde8
[ 86.297650] Call Trace:
[ 86.297777] [<ffffffff81352e2c>] dump_stack+0x4e/0x82
[ 86.298043] [<ffffffff81065142>] warn_slowpath_common+0x82/0xc0
[ 86.298344] [<ffffffff8106523a>] warn_slowpath_null+0x1a/0x20
[ 86.298636] [<ffffffff811ae62e>] drop_nlink+0x3e/0x50
[ 86.298893] [<ffffffff811ba35b>] simple_unlink+0x4b/0x60
[ 86.299174] [<ffffffff811ba3a7>] simple_rmdir+0x37/0x50
[ 86.299442] [<ffffffff812d9808>] __debugfs_remove.part.16+0xa8/0xd0
[ 86.299760] [<ffffffff812d9aaf>] debugfs_remove_recursive+0x17f/0x1c0
[ 86.300095] [<ffffffffa00004dd>] cleanup_module+0x2d/0x3b [debugfs_example2]
[ 86.300453] [<ffffffff810d959b>] SyS_delete_module+0x16b/0x220
[ 86.300750] [<ffffffff818ef857>] entry_SYSCALL_64_fastpath+0x12/0x6a
[ 86.301198] ---[ end trace c9fc53353fe14a37 ]---

Signed-off-by: Roman Pen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
---
fs/debugfs/inode.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index b7fcc0d..7bd7e19 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -461,7 +461,11 @@ struct dentry *debugfs_create_automount(const char *name,
inode->i_flags |= S_AUTOMOUNT;
inode->i_private = data;
dentry->d_fsdata = (void *)f;
+ /* directory inodes start off with i_nlink == 2 (for "." entry) */
+ inc_nlink(inode);
d_instantiate(dentry, inode);
+ inc_nlink(d_inode(dentry->d_parent));
+ fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
return end_creating(dentry);
}
EXPORT_SYMBOL(debugfs_create_automount);
--
2.6.2

2015-12-10 12:47:27

by Roman Peniaev

[permalink] [raw]
Subject: [RFC PATCH 2/3] debugfs: put private data to i_private for automount inode

I will need dentry->d_fsdata in the next patch.
Keep 'd_fsdata' for debugfs needs.

Signed-off-by: Roman Pen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
---
fs/debugfs/inode.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 7bd7e19..a1d077a 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -170,6 +170,8 @@ static void debugfs_evict_inode(struct inode *inode)
clear_inode(inode);
if (S_ISLNK(inode->i_mode))
kfree(inode->i_link);
+ else if (S_ISDIR(inode->i_mode) && IS_AUTOMOUNT(inode))
+ kfree(inode->i_private);
}

static const struct super_operations debugfs_super_operations = {
@@ -179,11 +181,16 @@ static const struct super_operations debugfs_super_operations = {
.evict_inode = debugfs_evict_inode,
};

+struct automount_priv {
+ struct vfsmount *(*func)(void *);
+ void *data;
+};
+
static struct vfsmount *debugfs_automount(struct path *path)
{
- struct vfsmount *(*f)(void *);
- f = (struct vfsmount *(*)(void *))path->dentry->d_fsdata;
- return f(d_inode(path->dentry)->i_private);
+ struct automount_priv *p = d_inode(path->dentry)->i_private;
+
+ return p->func(p->data);
}

static const struct dentry_operations debugfs_dops = {
@@ -448,19 +455,28 @@ struct dentry *debugfs_create_automount(const char *name,
void *data)
{
struct dentry *dentry = start_creating(name, parent);
+ struct automount_priv *p;
struct inode *inode;

if (IS_ERR(dentry))
return NULL;

+ p = kmalloc(sizeof(*p), GFP_KERNEL);
+ if (unlikely(!p))
+ return failed_creating(dentry);
+
+ p->func = f;
+ p->data = data;
+
inode = debugfs_get_inode(dentry->d_sb);
- if (unlikely(!inode))
+ if (unlikely(!inode)) {
+ kfree(p);
return failed_creating(dentry);
+ }

inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
inode->i_flags |= S_AUTOMOUNT;
- inode->i_private = data;
- dentry->d_fsdata = (void *)f;
+ inode->i_private = p;
/* directory inodes start off with i_nlink == 2 (for "." entry) */
inc_nlink(inode);
d_instantiate(dentry, inode);
--
2.6.2

2015-12-10 12:47:56

by Roman Peniaev

[permalink] [raw]
Subject: [RFC PATCH 3/3] debugfs: make __debugfs_remove wait for dentry release

__debugfs_remove does not wait for dentry release, thus dentry can still be
alive and file operations can still be invoked after the function returns.

>From debugfs point of view this behaviour is definitely ok, but that can be
critical for users of debugfs and lead to usage-after-free: file operations
can be called after dentry is considered as removed.

Simple grep over the sources shows that dynamic debugfs file creation and
removal is exactly the case, and common usage is the following:

create_dev():
dev = kmalloc();
dev->debugfs_dentry = debugfs_create_file("my_dev", , dev, dev_fops);
^^^
!! pointer is passed to file
!! operations as private data

remove_dev(dev):
debugfs_remove(dev->debugfs_dentry);
kfree(dev);
^^^
!! memory is freed, but fops->open/read/write
!! can still be called and lead to usage-after-free

Here is quick grep output of the case described above:

*** drivers/block/pktcdvd.c:
pkt_debugfs_dev_remove[489] debugfs_remove(pd->dfs_f_info);
pkt_debugfs_dev_remove[490] debugfs_remove(pd->dfs_d_root);

*** drivers/char/virtio_console.c:
unplug_port[1595] debugfs_remove(port->debugfs_file);

*** drivers/crypto/qat/qat_common/adf_cfg.c:
adf_cfg_dev_remove[187] debugfs_remove(dev_cfg_data->debug);

*** drivers/gpu/drm/drm_debugfs.c:
drm_debugfs_remove_files[203] debugfs_remove(tmp->dent);

.... and more and more and more ...

In my grep output each third line is exactly this case: people expect that
debugfs_remove() is a barrier and file operations won't be invoked after it
(same behaviour as kobject_del(),kobject_put() tuple).

So in this patch debugfs_remove() waits for completion of final dentry release
callback.

BUT! I am not sure that nobody tries to remove the dentry from it's own file
operation (dentry suicide). And if so - deadlock will happen.

Probably, dentry_remove_self() should be implemented for such cases, which is
similar to sysfs_remove_file_self(). But for now I do not want to add new
function which can be useless in the nearest future.

Signed-off-by: Roman Pen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
---
fs/debugfs/inode.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index a1d077a..2525158 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -193,9 +193,23 @@ static struct vfsmount *debugfs_automount(struct path *path)
return p->func(p->data);
}

+static void debugfs_release(struct dentry *dentry)
+{
+ struct completion *comp;
+
+ /* Paired with __debugfs_remove */
+ smp_rmb();
+ comp = dentry->d_fsdata;
+ if (likely(comp)) {
+ dentry->d_fsdata = NULL;
+ complete(comp);
+ }
+}
+
static const struct dentry_operations debugfs_dops = {
.d_delete = always_delete_dentry,
.d_automount = debugfs_automount,
+ .d_release = debugfs_release,
};

static int debug_fill_super(struct super_block *sb, void *data, int silent)
@@ -542,14 +556,24 @@ static int __debugfs_remove(struct dentry *dentry, struct dentry *parent)
int ret = 0;

if (simple_positive(dentry)) {
+ struct completion comp;
+
+ init_completion(&comp);
dget(dentry);
if (d_is_dir(dentry))
ret = simple_rmdir(d_inode(parent), dentry);
else
simple_unlink(d_inode(parent), dentry);
- if (!ret)
+ if (likely(!ret)) {
d_delete(dentry);
+ dentry->d_fsdata = &comp;
+ /* Paired with debugfs_release callback */
+ smp_wmb();
+ }
dput(dentry);
+
+ if (likely(!ret))
+ wait_for_completion(&comp);
}
return ret;
}
--
2.6.2