2000-11-16 21:31:05

by Dan Aloni

[permalink] [raw]
Subject: [PATCH (2.4)] atomic use count for proc_dir_entry


(against test11-pre5)

Makes procfs use an atomic use count for dir entries, to avoid using
the Big kernel lock. Axboe says it looks ok.

--- linux/fs/proc/inode.c Wed Jun 21 17:25:17 2000
+++ linux/fs/proc/inode.c Thu Nov 16 19:09:28 2000
@@ -25,7 +25,7 @@
struct proc_dir_entry * de_get(struct proc_dir_entry *de)
{
if (de)
- de->count++;
+ atomic_inc(&de->count);
return de;
}

@@ -35,20 +35,18 @@
void de_put(struct proc_dir_entry *de)
{
if (de) {
- lock_kernel(); /* FIXME: count should be atomic_t */
- if (!de->count) {
+ if (!atomic_read(&de->count)) {
printk("de_put: entry %s already free!\n", de->name);
return;
}

- if (!--de->count) {
+ if (atomic_dec_and_test(&de->count)) {
if (de->deleted) {
printk("de_put: deferred delete of %s\n",
de->name);
free_proc_entry(de);
}
}
- unlock_kernel();
}
}

@@ -139,7 +137,7 @@
#if 1
/* shouldn't ever happen */
if (de && de->deleted)
-printk("proc_iget: using deleted entry %s, count=%d\n", de->name, de->count);
+printk("proc_iget: using deleted entry %s, count=%d\n", de->name, atomic_read(&de->count));
#endif

inode = iget(sb, ino);
--- linux/fs/proc/generic.c Tue Nov 14 09:57:01 2000
+++ linux/fs/proc/generic.c Thu Nov 16 20:47:48 2000
@@ -215,7 +215,7 @@

static struct inode_operations proc_link_inode_operations = {
readlink: proc_readlink,
- follow_link: proc_follow_link
+ follow_link: proc_follow_link,
};

/*
@@ -574,11 +574,11 @@
proc_kill_inodes(de);
de->nlink = 0;
de->deleted = 1;
- if (!de->count)
+ if (!atomic_read(&de->count))
free_proc_entry(de);
else {
printk("remove_proc_entry: %s/%s busy, count=%d\n",
- parent->name, de->name, de->count);
+ parent->name, de->name, atomic_read(&de->count));
}
break;
}
--- linux/fs/proc/root.c Mon May 22 06:34:37 2000
+++ linux/fs/proc/root.c Thu Nov 16 19:22:43 2000
@@ -96,10 +96,12 @@
* This is the root "inode" in the /proc tree..
*/
struct proc_dir_entry proc_root = {
- PROC_ROOT_INO, 5, "/proc",
- S_IFDIR | S_IRUGO | S_IXUGO, 2, 0, 0,
- 0, &proc_root_inode_operations, &proc_root_operations,
- NULL, NULL,
- NULL,
- &proc_root, NULL
+ low_ino: PROC_ROOT_INO,
+ namelen: 5,
+ name: "/proc",
+ mode: S_IFDIR | S_IRUGO | S_IXUGO,
+ nlink: 2,
+ proc_iops: &proc_root_inode_operations,
+ proc_fops: &proc_root_operations,
+ parent: &proc_root,
};
--- linux/kernel/sysctl.c Tue Nov 14 09:57:49 2000
+++ linux/kernel/sysctl.c Thu Nov 16 19:35:02 2000
@@ -571,7 +571,7 @@
}

/* Don't unregister proc entries that are still being used.. */
- if (de->count)
+ if (atomic_read(&de->count))
continue;

table->de = NULL;
--- linux/include/linux/proc_fs.h Thu Nov 16 18:40:47 2000
+++ linux/include/linux/proc_fs.h Thu Nov 16 19:32:18 2000
@@ -67,7 +67,7 @@
void *data;
read_proc_t *read_proc;
write_proc_t *write_proc;
- unsigned int count; /* use count */
+ atomic_t count; /* use count */
int deleted; /* delete flag */
kdev_t rdev;
};
--- linux/drivers/pci/proc.c Tue Nov 14 09:56:55 2000
+++ linux/drivers/pci/proc.c Thu Nov 16 19:37:57 2000
@@ -279,7 +279,7 @@
struct proc_dir_entry *e;

if ((e = dev->procent)) {
- if (e->count)
+ if (atomic_read(&e->count))
return -EBUSY;
remove_proc_entry(e->name, dev->bus->procdir);
dev->procent = NULL;
--- linux/drivers/nubus/proc.c Sun Nov 28 01:27:48 1999
+++ linux/drivers/nubus/proc.c Thu Nov 16 22:21:34 2000
@@ -148,7 +148,7 @@
struct proc_dir_entry *e;

if ((e = dev->procdir)) {
- if (e->count)
+ if (atomic_read(&e->count))
return -EBUSY;
remove_proc_entry(e->name, proc_bus_nubus_dir);
dev->procdir = NULL;
--- linux/drivers/i2o/i2o_proc.c Tue Nov 14 09:56:52 2000
+++ linux/drivers/i2o/i2o_proc.c Thu Nov 16 22:44:44 2000
@@ -3237,7 +3237,7 @@
for(dev=pctrl->devices; dev; dev=dev->next)
i2o_proc_remove_device(dev);

- if(!pctrl->proc_entry->count)
+ if(!atomic_read(&pctrl->proc_entry->count))
{
sprintf(buff, "iop%d", pctrl->unit);

@@ -3257,7 +3257,7 @@

i2o_device_notify_off(dev, &i2o_proc_handler);
/* Would it be safe to remove _files_ even if they are in use? */
- if((de) && (!de->count))
+ if((de) && (!atomic_read(&de->count)))
{
i2o_proc_remove_entries(generic_dev_entries, de);
switch(dev->lct_data.class_id)
@@ -3334,7 +3334,7 @@
}
}

- if(!i2o_proc_dir_root->count)
+ if(!atomic_read(&i2o_proc_dir_root->count))
remove_proc_entry("i2o", 0);
else
return -1;

--
Dan Aloni
[email protected]


2000-11-16 21:44:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH (2.4)] atomic use count for proc_dir_entry



On Thu, 16 Nov 2000, Dan Aloni wrote:
>
> Makes procfs use an atomic use count for dir entries, to avoid using
> the Big kernel lock. Axboe says it looks ok.

There's a race there. Look at what happens if de_put() races with
remove_proc_entry(): we'd do free_proc_entry() twice. Not good.

Leave the kernel lock for now.

Linus

2000-11-16 22:01:39

by Dan Aloni

[permalink] [raw]
Subject: Re: [PATCH (2.4)] atomic use count for proc_dir_entry

On Thu, 16 Nov 2000, Linus Torvalds wrote:

> On Thu, 16 Nov 2000, Dan Aloni wrote:
> >
> > Makes procfs use an atomic use count for dir entries, to avoid using
> > the Big kernel lock. Axboe says it looks ok.
>
> There's a race there. Look at what happens if de_put() races with
> remove_proc_entry(): we'd do free_proc_entry() twice. Not good.
>
> Leave the kernel lock for now.

Is this particular kernel lock helps anyway? We could have been half way
through remove_proc_entry(), line 569, for example, while in the same time
another thread enters de_put when the use count is 1 and frees the entry
while the other thread is locked just before dereferencing the entry.

So, maybe a spinlock could be used here?

--
Dan Aloni
[email protected]

2000-11-17 05:51:46

by Jacob Luna Lundberg

[permalink] [raw]
Subject: Re: [PATCH (2.4)] atomic use count for proc_dir_entry


I'm not (yet) a kernel guru, so just point and laugh if I'm wrong, but...

On Thu, 16 Nov 2000, Dan Aloni wrote:
> - if (!--de->count) {
> + if (atomic_dec_and_test(&de->count)) {

Doesn't this reverse the sense of the test?

-Jacob

--

"My my, the cruelest lies are often told without a word.
My my, the kindest truths are often spoke, but never heard."

-Ben Folds Five, "The Last Polka"

2000-11-17 07:58:18

by Dan Aloni

[permalink] [raw]
Subject: Re: [PATCH (2.4)] atomic use count for proc_dir_entry

On Thu, 16 Nov 2000, Jacob Luna Lundberg wrote:

>
> I'm not (yet) a kernel guru, so just point and laugh if I'm wrong, but...
>
> On Thu, 16 Nov 2000, Dan Aloni wrote:
> > - if (!--de->count) {
> > + if (atomic_dec_and_test(&de->count)) {
>
> Doesn't this reverse the sense of the test?

If you are right, I guess put_files_struct() of kernel/exit.c would
have cleaned files_struct everytime someones called it.
Everywhere in the kernel, objects are freed when
atomic_dec_and_test() returns true.

--
Dan Aloni
[email protected]

2000-11-17 09:05:47

by Jacob Luna Lundberg

[permalink] [raw]
Subject: Re: [PATCH (2.4)] atomic use count for proc_dir_entry


On Fri, 17 Nov 2000, Dan Aloni wrote:
> If you are right, I guess put_files_struct() of kernel/exit.c would
> have cleaned files_struct everytime someones called it.
> Everywhere in the kernel, objects are freed when
> atomic_dec_and_test() returns true.

Indeed, after studying the asm in question I think I see how it ticks.
What is the reasoning behind reversing the result of the test instead of
returning the new value of the counter?

(Thanks for taking time to set me straight on this. :)

-Jacob

--

Why you say you no bunny rabbit when you have little powder-puff tail?
-- The Tasmanian Devil

2000-11-17 09:08:36

by Dan Aloni

[permalink] [raw]
Subject: Re: [PATCH (2.4)] atomic use count for proc_dir_entry

On Thu, 16 Nov 2000, Dan Aloni wrote:

> On Thu, 16 Nov 2000, Linus Torvalds wrote:
>
> > On Thu, 16 Nov 2000, Dan Aloni wrote:
> > >
> > > Makes procfs use an atomic use count for dir entries, to avoid using
> > > the Big kernel lock. Axboe says it looks ok.
> >
> > There's a race there. Look at what happens if de_put() races with
> > remove_proc_entry(): we'd do free_proc_entry() twice. Not good.
> >
> > Leave the kernel lock for now.
>
> Is this particular kernel lock helps anyway? We could have been half way
> through remove_proc_entry(), line 569, for example, while in the same time
> another thread enters de_put when the use count is 1 and frees the entry
> while the other thread is locked just before dereferencing the entry.

After reading the code more, I have realized we have a very narrowed race
condition, which can be eliminated if we do this (in remove_proc_entry()):

--- linux/fs/proc/generic.c Fri Nov 17 10:17:33 2000
+++ linux/fs/proc/generic.c Fri Nov 17 10:17:50 2000
@@ -573,10 +573,10 @@
(void *) proc_alloc_map);
proc_kill_inodes(de);
de->nlink = 0;
- de->deleted = 1;
if (!atomic_read(&de->count))
free_proc_entry(de);
else {
+ de->deleted = 1;
printk("remove_proc_entry: %s/%s busy, count=%d\n",
parent->name, de->name, atomic_read(&de->count));
}

Since de_put() only frees entries with deleted != 0, and deleted is only
set to 1 inside remove_proc_entry(), with this patch, there is no way
free_proc_entry() will be called from de_put(), while remove_proc_entry()
is about to free or is freeing the entry. After that 'de->delete = 1', the
proc entry is no longer in the proc entry tree, and a second call to
remove_proc_entry() won't find it at all, and leave it safe for
de_put() to handle.

--
Dan Aloni
[email protected]

2000-11-17 10:22:12

by Dan Aloni

[permalink] [raw]
Subject: Re: [PATCH (2.4)] atomic use count for proc_dir_entry

On Fri, 17 Nov 2000, Jacob Luna Lundberg wrote:

> On Fri, 17 Nov 2000, Dan Aloni wrote:
> > If you are right, I guess put_files_struct() of kernel/exit.c would
> > have cleaned files_struct everytime someones called it.
> > Everywhere in the kernel, objects are freed when
> > atomic_dec_and_test() returns true.
>
> Indeed, after studying the asm in question I think I see how it ticks.
> What is the reasoning behind reversing the result of the test instead of
> returning the new value of the counter?

Well, at 98% of the cases the code is in position where it 'ought to do
something' when the counter is 0, and do nothing when it's not, so instead
of not()'ing the counter value in the condition check, we just return the
value not()'ed already.

--
Dan Aloni
[email protected]

2000-11-17 10:55:08

by Ingo Oeser

[permalink] [raw]
Subject: Re: [PATCH (2.4)] atomic use count for proc_dir_entry

On Fri, Nov 17, 2000 at 12:35:03AM -0800, Jacob Luna Lundberg wrote:
> > atomic_dec_and_test() returns true.
>
> Indeed, after studying the asm in question I think I see how it ticks.
> What is the reasoning behind reversing the result of the test instead of
> returning the new value of the counter?

The full name of this operation is: "Decrement the value given
and test the result for equality with zero in one atomic operation".

So basically:

#define dec_and_test(i) ( (--i) ? 0 : 1)

but atomically.

This is implemented in hardware for some archs and a required
operation for proper and fast refcounting.

Regards

Ingo Oeser
--
To the systems programmer, users and applications
serve only to provide a test load.
<esc>:x

2000-11-17 10:54:07

by Francois romieu

[permalink] [raw]
Subject: Re: [PATCH (2.4)] atomic use count for proc_dir_entry

CPU A: assume de->count = 1 (in de_put)
fs/proc/inode.c::44 if (!--de->count) {
de->count = 0

CPU B: (in remove_proc_entry)
fs/proc/generic.c::577 if (!de->count)
fs/proc/generic.c::578 free_proc_entry(de);

CPU A: (in de_put)
fs/proc/inode.c::45 if (de->deleted) { <-- dereferencing kfreed pointer

What does protect us from the preceding if lock_kernel is thrown ?

--
Ueimor

2000-11-17 12:12:04

by Dan Aloni

[permalink] [raw]
Subject: Re: [PATCH (2.4)] atomic use count for proc_dir_entry

On Fri, 17 Nov 2000, Francois romieu wrote:

> CPU A: assume de->count = 1 (in de_put)
> fs/proc/inode.c::44 if (!--de->count) {
> de->count = 0
>
> CPU B: (in remove_proc_entry)
> fs/proc/generic.c::577 if (!de->count)
> fs/proc/generic.c::578 free_proc_entry(de);
>
> CPU A: (in de_put)
> fs/proc/inode.c::45 if (de->deleted) { <-- dereferencing kfreed pointer
>
> What does protect us from the preceding if lock_kernel is thrown ?

Ok, anyway, notice that in line 41 we return from de_put() without
unlock_kernel()'ing the kernel. It doesn't look good.

--
Dan Aloni
[email protected]