2008-07-28 02:53:57

by Li Zefan

[permalink] [raw]
Subject: [PATCH 1/3] cpuset: clean up cpuset hierarchy traversal code

Use cpuset.stack_list rather than kfifo, so we avoid memory allocation
for kfifo.

Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/cpuset.c | 21 ++++++++-------------
1 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 3624dc0..d0c57ac 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -54,7 +54,6 @@
#include <asm/uaccess.h>
#include <asm/atomic.h>
#include <linux/mutex.h>
-#include <linux/kfifo.h>
#include <linux/workqueue.h>
#include <linux/cgroup.h>

@@ -532,7 +531,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
* So the reverse nesting would risk an ABBA deadlock.
*
* The three key local variables below are:
- * q - a kfifo queue of cpuset pointers, used to implement a
+ * q - a linked-list queue of cpuset pointers, used to implement a
* top-down scan of all cpusets. This scan loads a pointer
* to each cpuset marked is_sched_load_balance into the
* array 'csa'. For our purposes, rebuilding the schedulers
@@ -567,7 +566,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)

void rebuild_sched_domains(void)
{
- struct kfifo *q; /* queue of cpusets to be scanned */
+ LIST_HEAD(q); /* queue of cpusets to be scanned*/
struct cpuset *cp; /* scans q */
struct cpuset **csa; /* array of all cpuset ptrs */
int csn; /* how many cpuset ptrs in csa so far */
@@ -577,7 +576,6 @@ void rebuild_sched_domains(void)
int ndoms; /* number of sched domains in result */
int nslot; /* next empty doms[] cpumask_t slot */

- q = NULL;
csa = NULL;
doms = NULL;
dattr = NULL;
@@ -597,20 +595,19 @@ void rebuild_sched_domains(void)
goto rebuild;
}

- q = kfifo_alloc(number_of_cpusets * sizeof(cp), GFP_KERNEL, NULL);
- if (IS_ERR(q))
- goto done;
csa = kmalloc(number_of_cpusets * sizeof(cp), GFP_KERNEL);
if (!csa)
goto done;
csn = 0;

- cp = &top_cpuset;
- __kfifo_put(q, (void *)&cp, sizeof(cp));
- while (__kfifo_get(q, (void *)&cp, sizeof(cp))) {
+ list_add(&top_cpuset.stack_list, &q);
+ while (!list_empty(&q)) {
struct cgroup *cont;
struct cpuset *child; /* scans child cpusets of cp */

+ cp = list_first_entry(&q, struct cpuset, stack_list);
+ list_del(q.next);
+
if (cpus_empty(cp->cpus_allowed))
continue;

@@ -619,7 +616,7 @@ void rebuild_sched_domains(void)

list_for_each_entry(cont, &cp->css.cgroup->children, sibling) {
child = cgroup_cs(cont);
- __kfifo_put(q, (void *)&child, sizeof(cp));
+ list_add_tail(&child->stack_list, &q);
}
}

@@ -702,8 +699,6 @@ rebuild:
put_online_cpus();

done:
- if (q && !IS_ERR(q))
- kfifo_free(q);
kfree(csa);
/* Don't kfree(doms) -- partition_sched_domains() does that. */
/* Don't kfree(dattr) -- partition_sched_domains() does that. */
--
1.5.4.rc3


2008-07-28 04:37:57

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpuset: clean up cpuset hierarchy traversal code

Li Zefan wrote:
> + list_add(&top_cpuset.stack_list, &q);

Can you figure out if we are holding any system wide lock, such as
cgroup_mutex, at this point? There is only one top_cpuset.stack_list
in the system, so I would think that we need to single thread its usage
somehow.

The hotplug hooks, such as common_cpu_mem_hotplug_unplug() that were
added sometime in the last few months (while the so called maintainer,
who's initials seem to be 'pj', was asleep ;) don't describe what
locking applies to them very well, and they call rebuild_sched_domains(),
where the above line of code lives.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-07-28 05:12:18

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpuset: clean up cpuset hierarchy traversal code

pj wrote:
> Can you figure out if we are holding any system wide lock, such as
> cgroup_mutex, at this point?

Nevermind ... this is in rebuild_sched_domains(), which must be
called with cgroup_mutex held. So we danged well better be holding
a system wide lock.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-07-28 06:01:53

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpuset: clean up cpuset hierarchy traversal code

Li Zefan wrote:
> - q = kfifo_alloc(number_of_cpusets * sizeof(cp), GFP_KERNEL, NULL);


The block comment for rebuild_sched_domains() states:

> ... May take callback_mutex during
> * call due to the kfifo_alloc() and kmalloc() calls.

I suspect that mention of kfifo_alloc() is no longer correct,
with your patch. If so, perhaps you could send a little additional
fix patch, to remove that mention from the comment.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-07-28 17:20:48

by Max Krasnyansky

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpuset: clean up cpuset hierarchy traversal code

Paul Jackson wrote:
> Li Zefan wrote:
>> - q = kfifo_alloc(number_of_cpusets * sizeof(cp), GFP_KERNEL, NULL);
>
>
> The block comment for rebuild_sched_domains() states:
>
>> ... May take callback_mutex during
>> * call due to the kfifo_alloc() and kmalloc() calls.
>
> I suspect that mention of kfifo_alloc() is no longer correct,
> with your patch. If so, perhaps you could send a little additional
> fix patch, to remove that mention from the comment.
>

Paul, please take a look at
cpuset: Rework sched domains and CPU hotplug handling
patch I sent out last week.
I'd appreciate if we applied that one first. It simplifies lock nesting
and rearranges the way sched domains are rebuilt. It is IMO a bit higher
priority than this patch because scheduler depends on the
rebuild_sched_domains() call and we have to call rebuild_sched_domains()
from cpu hotplug handlers.

Max

2008-07-29 01:26:52

by Li Zefan

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpuset: clean up cpuset hierarchy traversal code

> Paul, please take a look at
> cpuset: Rework sched domains and CPU hotplug handling
> patch I sent out last week.
> I'd appreciate if we applied that one first. It simplifies lock nesting
> and rearranges the way sched domains are rebuilt. It is IMO a bit higher
> priority than this patch because scheduler depends on the
> rebuild_sched_domains() call and we have to call rebuild_sched_domains()
> from cpu hotplug handlers.
>

I don't mind which one gets first. :)

2008-07-29 01:32:20

by Li Zefan

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpuset: clean up cpuset hierarchy traversal code

Paul Jackson wrote:
> Li Zefan wrote:
>> - q = kfifo_alloc(number_of_cpusets * sizeof(cp), GFP_KERNEL, NULL);
>
>
> The block comment for rebuild_sched_domains() states:
>
>> ... May take callback_mutex during
>> * call due to the kfifo_alloc() and kmalloc() calls.
>
> I suspect that mention of kfifo_alloc() is no longer correct,
> with your patch. If so, perhaps you could send a little additional
> fix patch, to remove that mention from the comment.
>

Yes, but I just noticed Max removes that comment completely in his patch,
so I guess I can leave it as it is.

2008-07-29 01:45:59

by Li Zefan

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpuset: clean up cpuset hierarchy traversal code

Paul Jackson wrote:
> pj wrote:
>> Can you figure out if we are holding any system wide lock, such as
>> cgroup_mutex, at this point?
>
> Nevermind ... this is in rebuild_sched_domains(), which must be
> called with cgroup_mutex held. So we danged well better be holding
> a system wide lock.
>

Yes, and even after Max's patch, the code playing with ->stack_list is
still protected by cgroup_mutex. :)

2008-07-29 13:04:26

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpuset: clean up cpuset hierarchy traversal code

Li Zefan wrote:
> Yes, but I just noticed Max removes that comment completely in his patch,
> so I guess I can leave it as it is.

Yeah - leave it as it is.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-07-30 10:13:22

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpuset: clean up cpuset hierarchy traversal code

Max wrote:
> Paul, please take a look at
> cpuset: Rework sched domains and CPU hotplug handling
> patch I sent out last week.

Ok - I've been trying to get to this for the last two days,
but my good employer has reorganized me twice so far this week ;).

If I can keep the same boss for more than eight hours,
I should find time to get to this today.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214