2005-05-26 08:25:52

by Paul Jackson

Subject: [PATCH 2.6.12-rc4] cpuset rmdir scheduling while atomic fix

Andrew,

This fixes a complaint that I am seeing while running
a particular stress test. Please push it along at your
convenience.

The cpuset kernel code can generate a "scheduling while atomic"
complaint from the cpuset_rmdir code. This complaint means
that we had to sleep while trying to get the cpuset_sem global
semaphore during the handling of a 'rmdir()' call to remove
a cpuset.

The existing code tries to take the global cpuset_sem semaphore
while holding a dentry spinlock. The fix is easy enough -
the code that requires cpuset_sem can be moved below the point
where the dentry spinlock is released.
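
To make the ordering problem concrete, here is a minimal userspace
sketch. It is not kernel code: every model_* name, the atomic_depth
counter, and the two rmdir_* functions are hypothetical stand-ins
invented for illustration, and the sketch simply assumes that
check_for_release() can reach a call that may sleep. It only shows
why the same call is a problem under a spinlock and harmless once
the spinlock has been released.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only: none of these are the real kernel primitives. */
static int atomic_depth;	/* > 0 while a (model) spinlock is held */
static int complaints;		/* count of "scheduling while atomic" reports */

static void model_spin_lock(void)
{
	atomic_depth++;
}

static void model_spin_unlock(void)
{
	assert(atomic_depth > 0);	/* unlock must pair with a lock */
	atomic_depth--;
}

/* Stand-in for any call that may sleep, e.g. taking a contended semaphore. */
static void model_might_sleep(const char *who)
{
	if (atomic_depth > 0) {
		complaints++;
		fprintf(stderr, "scheduling while atomic: %s\n", who);
	}
}

/* Hypothetical stand-in for check_for_release(): assumed able to sleep. */
static void model_check_for_release(void)
{
	model_might_sleep("check_for_release");
}

/* Ordering before the patch: the sleeping call runs under the spinlock. */
static int rmdir_before_patch(void)
{
	complaints = 0;
	model_spin_lock();
	model_check_for_release();
	model_spin_unlock();
	return complaints;
}

/* Ordering after the patch: the sleeping call runs after the unlock. */
static int rmdir_after_patch(void)
{
	complaints = 0;
	model_spin_lock();
	model_spin_unlock();
	model_check_for_release();
	return complaints;
}
```

Calling rmdir_before_patch() from a small main() records one
complaint in this model, while rmdir_after_patch() records none.
The real kernel detects the condition differently; the counter
here only mimics the idea.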

This bug is usually seen only when running stress tests or
loads that rapidly create, delete, and query cpusets.

The following fix has been tested using a current -linus git
kernel. Without the fix, my stress test generates a
"scheduling while atomic" complaint every few seconds. With the
fix, I've seen no more complaints in several hours of the same
stress test.

Signed-off-by: Paul Jackson <[email protected]>

Index: 2.6-cpuset_path_fix/kernel/cpuset.c
===================================================================
--- 2.6-cpuset_path_fix.orig/kernel/cpuset.c 2005-05-20 22:11:48.000000000 -0700
+++ 2.6-cpuset_path_fix/kernel/cpuset.c 2005-05-20 22:12:15.000000000 -0700
@@ -1320,11 +1320,11 @@ static int cpuset_rmdir(struct inode *un
 	parent = cs->parent;
 	set_bit(CS_REMOVED, &cs->flags);
 	list_del(&cs->sibling); /* delete my sibling from parent->children */
-	if (list_empty(&parent->children))
-		check_for_release(parent);
 	d = dget(cs->dentry);
 	cs->dentry = NULL;
 	spin_unlock(&d->d_lock);
+	if (list_empty(&parent->children))
+		check_for_release(parent);
 	cpuset_d_remove_dir(d);
 	dput(d);
 	up(&cpuset_sem);

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373


2005-05-26 12:33:12

by Dinakar Guniguntala

Subject: Re: [PATCH 2.6.12-rc4] cpuset rmdir scheduling while atomic fix

On Thu, May 26, 2005 at 01:25:16AM -0700, Paul Jackson wrote:
>
> The cpuset kernel code can generate a "scheduling while atomic"
> complaint from the cpuset_rmdir code. This complaint means
> that we had to sleep while trying to get the cpuset_sem global
> semaphore during the handling of a 'rmdir()' call to remove
> a cpuset.
>

Paul, this is the same problem that I reported earlier
and fixed as well.

See Message Id: [email protected] on Google Groups.

As far as I can see, this has already been fixed and is in
2.6.12-rc5-mm1.

-Dinakar

2005-05-26 14:25:05

by Dinakar Guniguntala

Subject: Re: [PATCH 2.6.12-rc4] cpuset rmdir scheduling while atomic fix

Earlier I wrote:
> Paul, This was the same problem that I had reported earlier
> and fixed as well
>
> See, Message Id: [email protected] on google groups
>
> As far as I can see this has already been fixed and is in
> 2.6.12-rc5-mm1

Well, not exactly the same problem as the one you are seeing, but
the fix is the same. It should be fixed in rc5-mm1. I can send
a patch against 2.6.12-rc5 if you want.

-Dinakar

2005-05-26 18:27:02

by Paul Jackson

Subject: Re: [PATCH 2.6.12-rc4] cpuset rmdir scheduling while atomic fix

Dinakar wrote:
> This was the same problem that I had reported earlier
> and fixed as well

You're right. I even signed off on your change.

Andrew,

Drop my "cpuset rmdir scheduling while atomic fix" patch.

Dinakar's dynamic-sched-domains-cpuset-changes.patch fixed it.

Dinakar's change wasn't quite the same as mine. It is actually the
better of the two - it reduces the dentry spinlock region, instead of
moving the cpuset_sem region outside the spinlock. I just ran my stress
test that shows this bug. Dinakar's fix is fine, as expected.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401