2004-09-16 01:29:49

by Paul Jackson

[permalink] [raw]
Subject: [Patch] cpusets: document proc status allowed fields

Document the /proc/<pid>/status fields added in an
earlier cpuset patch for Cpus_allowed and Mems_allowed.

Signed-off-by: Paul Jackson <[email protected]>

Index: 2.6.9-rc1-mm4/Documentation/cpusets.txt
===================================================================
--- 2.6.9-rc1-mm4.orig/Documentation/cpusets.txt 2004-09-08 15:09:56.000000000 -0700
+++ 2.6.9-rc1-mm4/Documentation/cpusets.txt 2004-09-12 00:29:01.000000000 -0700
@@ -151,6 +151,14 @@ Each task under /proc has an added file
the cpuset name, as the path relative to the root of the cpuset file
system.

+The /proc/<pid>/status file for each task has two added lines,
+displaying the tasks cpus_allowed (on which CPUs it may be scheduled)
+and mems_allowed (on which Memory Nodes it may obtain memory),
+in the format seen in the following example:
+
+ Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
+ Mems_allowed: ffffffff,ffffffff
+
Each cpuset is represented by a directory in the cpuset file system
containing the following files describing that cpuset:


--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373


2004-09-16 14:02:13

by Simon Derr

[permalink] [raw]
Subject: [Patch] cpusets: fix race in cpuset_add_file()



Hi,

This patch fixes a missing down()/up() pair in cpuset_add_file().

Without this patch, sometimes it is possible to have two duplicate
dentries for a single file of a cpuset, with one of them being invalid,
and thus the file is present but cannot be opened...

Something like:

# cd /dev/cpuset/foo
# ls
ls: cpus: No such file or directory



The patch also removes comments that
1/are now bogus with this fix applied
2/were not respected anyway



Signed-off-by: Simon Derr <[email protected]>

Index: mm4/kernel/cpuset.c
===================================================================
--- mm4.orig/kernel/cpuset.c 2004-09-13 09:43:02.000000000 +0200
+++ mm4/kernel/cpuset.c 2004-09-16 15:46:21.847401360 +0200
@@ -956,13 +956,12 @@ static int cpuset_create_dir(struct cpus
return error;
}

-/* MUST be called with dir->d_inode->i_sem held */
-
static int cpuset_add_file(struct dentry *dir, const struct cftype *cft)
{
struct dentry *dentry;
int error;

+ down(&dir->d_inode->i_sem);
dentry = cpuset_get_dentry(dir, cft->name);
if (!IS_ERR(dentry)) {
error = cpuset_create_file(dentry, 0644 | S_IFREG);
@@ -971,6 +970,7 @@ static int cpuset_add_file(struct dentry
dput(dentry);
} else
error = PTR_ERR(dentry);
+ up(&dir->d_inode->i_sem);
return error;
}

@@ -1162,7 +1162,6 @@ static struct cftype cft_notify_on_relea
.private = FILE_NOTIFY_ON_RELEASE,
};

-/* MUST be called with ->d_inode->i_sem held */
static int cpuset_populate_dir(struct dentry *cs_dentry)
{
int err;

2004-09-16 15:01:51

by Paul Jackson

[permalink] [raw]
Subject: Re: [Patch] cpusets: fix race in cpuset_add_file()

Color me confused - this cpuset_sem down/up should not be needed,
and should deadlock. In the call chain:

cpuset_mkdir -> cpuset_create -> cpuset_populate_dir -> cpuset_add_file

cpuset_create() already holds the cpuset_sem for the duration, and you're
adding another cpuset_sem down in cpuset_add_file(), which should deadlock.

If you are seeing the duplicate invalid cpuset entries, then must be
something else going on, unfortunately.

That, or I'm confused.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2004-09-16 15:41:37

by Simon Derr

[permalink] [raw]
Subject: Re: [Patch] cpusets: fix race in cpuset_add_file()

On Thu, 16 Sep 2004, Paul Jackson wrote:

> Color me confused - this cpuset_sem down/up should not be needed,
> and should deadlock. In the call chain:
>
> cpuset_mkdir -> cpuset_create -> cpuset_populate_dir -> cpuset_add_file
>
> cpuset_create() already holds the cpuset_sem for the duration, and you're
> adding another cpuset_sem down in cpuset_add_file(), which should deadlock.
>
> If you are seeing the duplicate invalid cpuset entries, then must be
> something else going on, unfortunately.
>
> That, or I'm confused.
Neither.
You've just read the patch too quickly:

+ down(&dir->d_inode->i_sem);

NOT down(&cpuset_sem);

However, your remark is welcome, since there is indeed a slight chance of
deadlock with my patch, but it needs 2 mkdirs racing.

imagine:

mkdir a/b mkdir a/b/c

sys_mkdir(): down(a->i_sem);
cpuset_create(): down(cpuset_sem);
sys_mkdir(): down(b->i_sem);
cpuset_add_file(): down(b->i_sem);
cpuset_create(): down(cpuset_sem);


-> deadlock.


So we should release cpuset_sem a bit earlier in cpuset_create(), before
calling cpuset_populate_dir().





This updated patch should be better (hopefully).

Signed-off-by: Simon Derr <[email protected]>

Index: mm4/kernel/cpuset.c
===================================================================
--- mm4.orig/kernel/cpuset.c 2004-09-13 09:43:02.000000000 +0200
+++ mm4/kernel/cpuset.c 2004-09-16 17:23:48.764321923 +0200
@@ -956,13 +956,12 @@ static int cpuset_create_dir(struct cpus
return error;
}

-/* MUST be called with dir->d_inode->i_sem held */
-
static int cpuset_add_file(struct dentry *dir, const struct cftype *cft)
{
struct dentry *dentry;
int error;

+ down(&dir->d_inode->i_sem);
dentry = cpuset_get_dentry(dir, cft->name);
if (!IS_ERR(dentry)) {
error = cpuset_create_file(dentry, 0644 | S_IFREG);
@@ -971,6 +970,7 @@ static int cpuset_add_file(struct dentry
dput(dentry);
} else
error = PTR_ERR(dentry);
+ up(&dir->d_inode->i_sem);
return error;
}

@@ -1162,7 +1162,6 @@ static struct cftype cft_notify_on_relea
.private = FILE_NOTIFY_ON_RELEASE,
};

-/* MUST be called with ->d_inode->i_sem held */
static int cpuset_populate_dir(struct dentry *cs_dentry)
{
int err;
@@ -1219,9 +1218,16 @@ static long cpuset_create(struct cpuset
err = cpuset_create_dir(cs, name, mode);
if (err < 0)
goto err;
+
+ /* release cpuset_sem before cpuset_populate_dir()
+ * because it will down() this new directory's i_sem
+ * and if we race with another mkdir,
+ * we might deadlock
+ */
+ up(&cpuset_sem);
+
err = cpuset_populate_dir(cs->dentry);
/* If err < 0, we have a half-filled directory - oh well ;) */
- up(&cpuset_sem);
return 0;
err:
list_del(&cs->sibling);

2004-09-16 16:08:48

by Paul Jackson

[permalink] [raw]
Subject: Re: [Patch] cpusets: fix race in cpuset_add_file()

Simon wrote:
> However, your remark is welcome, since there is indeed a slight chance of
> deadlock with my patch, but it needs 2 mkdirs racing.

Glad my confusions led to some good.

Let me think about this one a bit.

I had it in my brain that the cpuset_sem should be held across the
entire cpuset_populate_dir(), as if there would be a problem with
another task getting this lock while a directory was half populated.

But I don't know if that was a valid concern, or just a superstition on
my part. I'll stare at this a bit more, and get back to you, either
way.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2004-09-17 07:25:37

by Paul Jackson

[permalink] [raw]
Subject: Re: [Patch] cpusets: fix race in cpuset_add_file()

You can continue to ignore this patch, Andrew. I'm still thinking it
through with Simon.

Here's another possible way to skin this cat, Simon.

Instead of adding an inode lock, how about just extending the cpuset_sem
window. If we hold cpuset_sem for the entire cpuset_mkdir() operation,
then no other cpuset_mkdir can overlap, and there should be no
confused overlapping directory creations.

This reduces the risks of unrecognized A-B-A deadlocks, and it removes
the concern I have that dropping the cpuset_sem before we're done opens
the way for more inconsistencies.

This needs to be tested before it goes in - there is a non-zero risk
that I made a stupid mistake and it locks up or something.

Signed-off-by: Paul Jackson <[email protected]>

Index: 2.6.9-rc2-mm1/kernel/cpuset.c
===================================================================
--- 2.6.9-rc2-mm1.orig/kernel/cpuset.c 2004-09-16 23:46:01.000000000 -0700
+++ 2.6.9-rc2-mm1/kernel/cpuset.c 2004-09-17 00:19:02.000000000 -0700
@@ -1235,7 +1235,6 @@ static long cpuset_create(struct cpuset
if (!cs)
return -ENOMEM;

- down(&cpuset_sem);
cs->flags = 0;
if (notify_on_release(parent))
set_bit(CS_NOTIFY_ON_RELEASE, &cs->flags);
@@ -1256,22 +1255,23 @@ static long cpuset_create(struct cpuset
goto err;
err = cpuset_populate_dir(cs->dentry);
/* If err < 0, we have a half-filled directory - oh well ;) */
- up(&cpuset_sem);
return 0;
err:
list_del(&cs->sibling);
- up(&cpuset_sem);
kfree(cs);
return err;
}

static int cpuset_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{
- struct dentry *d_parent = dentry->d_parent;
- struct cpuset *c_parent = (struct cpuset *)d_parent->d_fsdata;
+ struct cpuset *c_parent;
+ int rc;

- /* the vfs holds inode->i_sem already */
- return cpuset_create(c_parent, dentry->d_name.name, mode | S_IFDIR);
+ down(&cpuset_sem);
+ c_parent = dentry->d_parent->d_fsdata;
+ rc = cpuset_create(c_parent, dentry->d_name.name, mode | S_IFDIR);
+ up(&cpuset_sem);
+ return rc;
}

static int cpuset_rmdir(struct inode *unused_dir, struct dentry *dentry)


--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2004-09-17 07:56:32

by Simon Derr

[permalink] [raw]
Subject: Re: [Patch] cpusets: fix race in cpuset_add_file()

On Fri, 17 Sep 2004, Paul Jackson wrote:

> You can continue to ignore this patch, Andrew. I'm still thinking it
> through with Simon.
>
> Here's another possible way to skin this cat, Simon.
>
> Instead of adding an inode lock, how about just extending the cpuset_sem
> window. If we hold cpuset_sem for the entire cpuset_mkdir() operation,
> then no other cpuset_mkdir can overlap, and there should be no
> confused overlapping directory creations.
>
no - the inode lock is necessary.

don't let my second mail with the deadlock example confuse you : the
original problem (i.e the problem my patch fixes) is a race between a
cpuset mkdir() and another operation in the newly created directory, for
instance just `ls'.

the race is:

mkdir a/b | ls a/b/cpus
|
cpuset_add_file(b, "cpus") | vfs_stat()
cpuset_get_dentry(b, "cpus") | user_path_walk()
lookup_hash(b, "cpus") | path_lookup()
cached_lookup(b, "cpus") | link_path_walk()
d_alloc(b, "cpus") | do_lookup()
| real_lookup()
| down(b->i_sem);
| d_lookup(b, "cpus");
| d_alloc(b, "cpus");



The result is that `ls' and `mkdir' both create a dentry for a/b/cpus, and
the dentry created by `ls' is bogus since it does not point to the cpuset
data.

The proper way to prevent this is to lock the i_sem of directory b.
This can be done in cpuset_populate_dir(), in cpuset_add_file(), or
cpuset_get_dentry().

The similar piece of code in sysfs does it in add_file().

If your are not convinced try the following script.
Without my patch it triggers the bug in a few seconds.

Simon.



#! /bin/bash

a()
{
name=$1
echo dir is /dev/cpuset/$name
while :; do
mkdir /dev/cpuset/$name
if ! test -r /dev/cpuset/$name/cpus; then
echo missing /dev/cpuset/$name/cpus
exit 1
fi
rmdir /dev/cpuset/$name
done
}

b()
{
name=$1
while :; do
test -r /dev/cpuset/$name/cpus
done
}

p=$$

b $p &
a $p

2004-09-17 09:12:19

by Paul Jackson

[permalink] [raw]
Subject: Re: [Patch] cpusets: fix race in cpuset_add_file()

Simon wrote:
> The result is that `ls' and `mkdir' both create a dentry for a/b/cpus

Ouch - persuasive - well presented - thanks.

On the flip side, I am not finding any firm basis in the concerns I had
that led me down the other path.

Give me a few hours to run this through a bit of unit testing on my
side, then I will likely endorse your patch.

I have two minor patches on my side:
1) add CONFIG_CPUSETS=y to sn2_defconfig, and
2) remove some more casts of (void *)d_fsdata.

If it's ok by you, and I don't have any more questions, I will send all
three along to Andrew, as a set, in a few hours.

Good work, Simon.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373