2007-01-31 10:55:37

by Duncan Sands

[permalink] [raw]
Subject: remove_proc_entry and read_proc

Can read_proc still be executing when remove_proc_entry returns?

In my driver [*] I allocate some data and create a proc entry using
create_proc_entry. My read method reads from my allocated data. When
shutting down, I call remove_proc_entry and immediately free the data.
If some call to read_proc is still executing at this point then it will
be accessing freed memory. Can this happen? I've been rummaging around
in fs/proc to see what prevents it, but didn't find anything yet.

Thanks a lot,

Duncan.

[*] Actually it's the ATM layer that does all this (net/atm); my driver
uses that layer.


2007-01-31 18:43:08

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: remove_proc_entry and read_proc

On Wed, Jan 31, 2007 at 11:54:35AM +0100, Duncan Sands wrote:
> Can read_proc still be executing when remove_proc_entry returns?
>
> In my driver [*] I allocate some data and create a proc entry using
> create_proc_entry. My read method reads from my allocated data. When
> shutting down, I call remove_proc_entry and immediately free the data.
> If some call to read_proc is still executing at this point then it will
> be accessing freed memory. Can this happen? I've been rummaging around
> in fs/proc to see what prevents it, but didn't find anything yet.

This should be fixed by the following patch (in -mm currently):
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm3/broken-out/fix-rmmod-read-write-races-in-proc-entries.patch

Tell me if you're unsure it will.

2007-01-31 19:26:18

by Duncan Sands

[permalink] [raw]
Subject: Re: remove_proc_entry and read_proc

On Wednesday 31 January 2007 19:42:51 Alexey Dobriyan wrote:
> On Wed, Jan 31, 2007 at 11:54:35AM +0100, Duncan Sands wrote:
> > Can read_proc still be executing when remove_proc_entry returns?
> >
> > In my driver [*] I allocate some data and create a proc entry using
> > create_proc_entry. My read method reads from my allocated data. When
> > shutting down, I call remove_proc_entry and immediately free the data.
> > If some call to read_proc is still executing at this point then it will
> > be accessing freed memory. Can this happen? I've been rummaging around
> > in fs/proc to see what prevents it, but didn't find anything yet.
>
> This should be fixed by the following patch (in -mm currently):
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm3/broken-out/fix-rmmod-read-write-races-in-proc-entries.patch
>
> Tell me if you're unsure it will.

Excellent! But tell me,

+ atomic_inc(&dp->pde_users);
+ if (!dp->proc_fops)

don't you need a memory barrier between these two? Also a corresponding
one where proc_fops is set to NULL.


+ /*
+ * Stop accepting new readers/writers. If you're dynamically
+ * allocating ->proc_fops, save a pointer somewhere.
+ */
+ de->proc_fops = NULL;
+ /* Wait until all readers/writers are done. */
+ if (atomic_read(&de->pde_users) > 0) {
+ spin_unlock(&proc_subdir_lock);
+ msleep(1);
+ goto again;
+ }

I don't understand how this is supposed to work. Consider

CPU1 CPU2

atomic_inc(&dp->pde_users);
if (dp->proc_fops)
de->proc_fops = NULL;
use_proc_fops <= BOOM
if (atomic_read(&de->pde_users) > 0) {

what prevents dereference of a NULL proc_fops value?

Best wishes,

Duncan.

2007-02-01 10:37:32

by Duncan Sands

[permalink] [raw]
Subject: Re: remove_proc_entry and read_proc

> I don't understand how this is supposed to work. Consider
>
> CPU1 CPU2
>
> atomic_inc(&dp->pde_users);
> if (dp->proc_fops)
> de->proc_fops = NULL;
> use_proc_fops <= BOOM
> if (atomic_read(&de->pde_users) > 0) {
>
> what prevents dereference of a NULL proc_fops value?

I was wrong: what prevents it is that proc_fops isn't used at all in these
paths! However it is used in proc_get_inode thusly:

if (de->proc_fops)
inode->i_fop = de->proc_fops;

What if proc_fops is set to NULL between these two? Putting it into a
local variable should take care of this one, but since I'm not sure if
it's really needed I'll leave that to you!

Otherwise, here's a patch that adds memory barriers (maybe these can be
SMP memory barriers) since the atomic ops you are using do not imply
such barriers according to Documentation/atomic_ops.txt. The memory
barriers are needed, since you need to know that if you saw a non-NULL
proc_fops, then remove_proc_entry saw your increased pde_users count.
If the proc_fops write was re-ordered after the pde_users read, or the
proc_fops read was re-ordered before the pde_users write, then this may
not be true.

Also, I removed the early checks for NULL proc_fops. True, they avoid
an atomic operation and a memory barrier, however they only avoid them
in the error path, when -EIO is going to be returned, which is hardly a
fast path. In the common case, they represent a pointless check.

Ciao,

Duncan.

PS: Compiles, but otherwise untested.


Index: Linux/fs/proc/generic.c
===================================================================
--- Linux.orig/fs/proc/generic.c 2006-12-09 17:18:32.000000000 +0100
+++ Linux/fs/proc/generic.c 2007-02-01 10:54:36.000000000 +0100
@@ -19,6 +19,7 @@
#include <linux/idr.h>
#include <linux/namei.h>
#include <linux/bitops.h>
+#include <linux/delay.h>
#include <linux/spinlock.h>
#include <asm/uaccess.h>

@@ -76,6 +77,19 @@
if (!(page = (char*) __get_free_page(GFP_KERNEL)))
return -ENOMEM;

+ /*
+ * We are going to call into module's code via ->get_info or
+ * ->read_proc. Bump refcount so that remove_proc_entry() will
+ * wait for read to complete.
+ */
+ atomic_inc(&dp->pde_users);
+ mb();
+ if (!dp->proc_fops)
+ /*
+ * remove_proc_entry() marked PDE as "going away". Obey.
+ */
+ goto out_dec;
+
while ((nbytes > 0) && !eof) {
count = min_t(size_t, PROC_BLOCK_SIZE, nbytes);

@@ -195,6 +209,8 @@
buf += n;
retval += n;
}
+out_dec:
+ atomic_dec(&dp->pde_users);
free_page((unsigned long) page);
return retval;
}
@@ -205,14 +221,28 @@
{
struct inode *inode = file->f_path.dentry->d_inode;
struct proc_dir_entry * dp;
+ ssize_t rv;

dp = PDE(inode);

if (!dp->write_proc)
return -EIO;

- /* FIXME: does this routine need ppos? probably... */
- return dp->write_proc(file, buffer, count, dp->data);
+ rv = -EIO;
+ /*
+ * We are going to call into module's code via ->write_proc .
+ * Bump refcount so that module won't dissapear while ->write_proc
+ * sleeps in copy_from_user(). remove_proc_entry() will wait for
+ * write to complete.
+ */
+ atomic_inc(&dp->pde_users);
+ mb();
+ if (dp->proc_fops)
+ /* PDE is ready, refcount bumped, call into module. */
+ /* FIXME: does this routine need ppos? probably... */
+ rv = dp->write_proc(file, buffer, count, dp->data);
+ atomic_dec(&dp->pde_users);
+ return rv;
}


@@ -717,12 +747,26 @@
if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
goto out;
len = strlen(fn);
-
+again:
spin_lock(&proc_subdir_lock);
for (p = &parent->subdir; *p; p=&(*p)->next ) {
if (!proc_match(len, fn, *p))
continue;
de = *p;
+
+ /*
+ * Stop accepting new readers/writers. If you're dynamically
+ * allocating ->proc_fops, save a pointer somewhere.
+ */
+ de->proc_fops = NULL;
+ mb();
+ /* Wait until all readers/writers are done. */
+ if (atomic_read(&de->pde_users) > 0) {
+ spin_unlock(&proc_subdir_lock);
+ msleep(1);
+ goto again;
+ }
+
*p = de->next;
de->next = NULL;
if (S_ISDIR(de->mode))
Index: Linux/include/linux/proc_fs.h
===================================================================
--- Linux.orig/include/linux/proc_fs.h 2006-10-03 15:39:31.000000000 +0200
+++ Linux/include/linux/proc_fs.h 2007-02-01 10:42:07.000000000 +0100
@@ -56,6 +56,19 @@
gid_t gid;
loff_t size;
struct inode_operations * proc_iops;
+ /*
+ * NULL ->proc_fops means "PDE is going away RSN" or
+ * "PDE is just created". In either case ->get_info, ->read_proc,
+ * ->write_proc won't be called because it's too late or too early,
+ * respectively.
+ *
+ * Valid ->proc_fops means "use this file_operations" or
+ * "->data is setup, it's safe to call ->read_proc, ->write_proc which
+ * dereference it".
+ *
+ * If you're allocating ->proc_fops dynamically, save a pointer
+ * somewhere.
+ */
const struct file_operations * proc_fops;
get_info_t *get_info;
struct module *owner;
@@ -66,6 +79,8 @@
atomic_t count; /* use count */
int deleted; /* delete flag */
void *set;
+ atomic_t pde_users; /* number of readers + number of writers via
+ * ->read_proc, ->write_proc, ->get_info */
};

struct kcore_list {

2007-02-01 16:02:08

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: remove_proc_entry and read_proc

Duncan Sands wrote:
> On Wednesday 31 January 2007 19:42:51 Alexey Dobriyan wrote:
> > On Wed, Jan 31, 2007 at 11:54:35AM +0100, Duncan Sands wrote:
> > > Can read_proc still be executing when remove_proc_entry returns?
> > >
> > > In my driver [*] I allocate some data and create a proc entry using
> > > create_proc_entry. My read method reads from my allocated data. When
> > > shutting down, I call remove_proc_entry and immediately free the data.
> > > If some call to read_proc is still executing at this point then it will
> > > be accessing freed memory. Can this happen? I've been rummaging around
> > > in fs/proc to see what prevents it, but didn't find anything yet.
> >
> > This should be fixed by the following patch (in -mm currently):
> > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc6/2.6.20-rc6-mm3/broken-out/fix-rmmod-read-write-races-in-proc-entries.patch
> >
> > Tell me if you're unsure it will.
>
> Excellent! But tell me,
>
> + atomic_inc(&dp->pde_users);
> + if (!dp->proc_fops)
>
> don't you need a memory barrier between these two? Also a corresponding
> one where proc_fops is set to NULL.

I believe, barriers not needed, not now.

This scheme relies on the fact that remove_proc_entry() will be the only
place that will clear ->proc_fops and, once cleared, ->proc_fops will
never be resurrected. Clearing of ->proc_fops will eventually propagate
to CPU doing first check, thus preveting refcount bumps from this CPU.
What can be missed is some "rogue" readers or writers?. Big deal.

> + /*
> + * Stop accepting new readers/writers. If you're dynamically
> + * allocating ->proc_fops, save a pointer somewhere.
> + */
> + de->proc_fops = NULL;
> + /* Wait until all readers/writers are done. */
> + if (atomic_read(&de->pde_users) > 0) {
> + spin_unlock(&proc_subdir_lock);
> + msleep(1);
> + goto again;
> + }
>
> I don't understand how this is supposed to work. Consider
>
> CPU1 CPU2
>
> atomic_inc(&dp->pde_users);
> if (dp->proc_fops)
> de->proc_fops = NULL;
> use_proc_fops <= BOOM
> if (atomic_read(&de->pde_users) > 0) {
>
> what prevents dereference of a NULL proc_fops value?

? Sigh, modules should do removals of proc entries first. And I should check
for that.

2007-02-02 07:32:11

by Duncan Sands

[permalink] [raw]
Subject: Re: remove_proc_entry and read_proc

Hi Alexey,

> I believe, barriers not needed, not now.
>
> This scheme relies on the fact that remove_proc_entry() will be the only
> place that will clear ->proc_fops and, once cleared, ->proc_fops will
> never be resurrected. Clearing of ->proc_fops will eventually propagate
> to CPU doing first check, thus preveting refcount bumps from this CPU.
> What can be missed is some "rogue" readers or writers?. Big deal.

I don't understand you. Without memory barriers, remove_proc_entry will
most of the time, but not all of the time, wait for all readers and writers
to finish before exiting. Since the whole point of your patch was to ensure
that all readers and writers finish before remove_proc_entry exits, I don't
understand why you don't just put the memory barriers in and make it correct.

Also, I do consider it a big deal:

> ? Sigh, modules should do removals of proc entries first. And I should
> check for that.

Modules should of course call remove_proc_entry before exiting. However
right now, even with your patch, a read or write method can still be
running when remove_proc_entry returns [1], so could still be running when
the module is removed (if they sleep; I guess this applies mostly to
write methods). This is very bad - why not put in memory barriers and
fix it? Also, plenty of proc read and write methods access private data
that is allocated before calling create_proc_entry and freed after calling
remove_proc_entry. If a read or write method is still running after
remove_proc_entry returns, then it can access freed memory - very bad.

Ciao,

Duncan.

[1] proc_get_inode does a try_module_get, so it is possible that module
unloading is not a problem - not sure.

2007-02-05 11:32:27

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: remove_proc_entry and read_proc

On Fri, Feb 02, 2007 at 08:31:57AM +0100, Duncan Sands wrote:
> > I believe, barriers not needed, not now.

> > This scheme relies on the fact that remove_proc_entry() will be the only
> > place that will clear ->proc_fops and, once cleared, ->proc_fops will
> > never be resurrected. Clearing of ->proc_fops will eventually propagate
> > to CPU doing first check, thus preveting refcount bumps from this CPU.
> > What can be missed is some "rogue" readers or writers?. Big deal.
>
> I don't understand you. Without memory barriers, remove_proc_entry will
> most of the time, but not all of the time, wait for all readers and writers
> to finish before exiting. Since the whole point of your patch was to ensure
> that all readers and writers finish before remove_proc_entry exits, I don't
> understand why you don't just put the memory barriers in and make it correct.

Gee, thanks. I sat and wrote code side-by-side, and it looks like, even barriers
won't fix anything, because they don't affect other CPUs. There will be
new patch soon.

->proc_fops is valid ->proc_fops is valid
->pde_users is 0 ->pde_users is 0
------------------------------------------------------------


if (!pde->proc_fops)
goto out;

->proc_fops = NULL;
if (atomic_read(->pde_users) > 0)
goto again;

|
| atomic_inc(->pde_users);
|
|
|
V

> Also, I do consider it a big deal:
>
> > ? Sigh, modules should do removals of proc entries first. And I should
> > check for that.
>
> Modules should of course call remove_proc_entry before exiting. However
> right now, even with your patch, a read or write method can still be
> running when remove_proc_entry returns [1], so could still be running when
> the module is removed (if they sleep; I guess this applies mostly to
> write methods). This is very bad - why not put in memory barriers and
> fix it? Also, plenty of proc read and write methods access private data
> that is allocated before calling create_proc_entry and freed after calling
> remove_proc_entry. If a read or write method is still running after
> remove_proc_entry returns, then it can access freed memory - very bad.

> [1] proc_get_inode does a try_module_get, so it is possible that module
> unloading is not a problem - not sure.

Modules forget to set ->owner sometimes. Also, it's still racy, because
of the typical

pde = create_proc_entry();
/*
*
* ->owner is NULL here, effectively, PDE without ->owner.
*
*/
if (pde)
pde->owner = THIS_MODULE;

2007-02-05 12:05:24

by Duncan Sands

[permalink] [raw]
Subject: Re: remove_proc_entry and read_proc

> Gee, thanks. I sat and wrote code side-by-side, and it looks like, even barriers
> won't fix anything, because they don't affect other CPUs.

?! The whole point of memory barriers is that they affect other CPUs.
Maybe you are thinking of compiler barriers?

> ->proc_fops is valid ->proc_fops is valid
> ->pde_users is 0 ->pde_users is 0
> ------------------------------------------------------------
>
>
> if (!pde->proc_fops)
> goto out;
>
> ->proc_fops = NULL;
> if (atomic_read(->pde_users) > 0)
> goto again;
>
> |
> | atomic_inc(->pde_users);
> |
> |
> |
> V

The proc_fops check *before* the atomic_inc is indeed pointless (notice
how I removed it in the patch I sent?). It's the one after the atomic_inc
that prevents this race, but only if there is a memory barrier between the
atomic_inc and the check... because otherwise they could be reordered (i.e.
seen in reverse order by another CPU) giving the race.

> Modules forget to set ->owner sometimes. Also, it's still racy, because
> of the typical
>
> pde = create_proc_entry();
> /*
> *
> * ->owner is NULL here, effectively, PDE without ->owner.
> *
> */
> if (pde)
> pde->owner = THIS_MODULE;

As long as the module calls remove_proc_entry before being unloaded,
this should be ok.

Best wishes,

Duncan.