2005-10-24 00:19:20

by Paul Jackson

[permalink] [raw]
Subject: [PATCH] cpuset confine pdflush to its cpuset

This patch keeps pdflush daemons on the same cpuset as their
parent, the kthread daemon.

Some large NUMA configurations put as many kernel threads and as
much other classic Unix load as they can in what's called a
bootcpuset, keeping the rest of the system free for dedicated
jobs.

This effort is thwarted by pdflush, which dynamically destroys
and recreates pdflush daemons depending on load.

It's easy enough to force the originally created pdflush daemons
into the bootcpuset at system boot time. But pdflush threads
created later were allowed to run freely across the system,
because of this necessary line in kthread(), which starts them:

set_cpus_allowed(current, CPU_MASK_ALL);

By simply coding pdflush to start its threads with the
cpus_allowed restrictions of its cpuset (inherited from kthread,
its parent) we can ensure that dynamically created pdflush
threads are also kept in the bootcpuset.

On systems without cpusets, or without a bootcpuset implementation,
the following will have no effect, leaving pdflush free to run on
any CPU, as before.

Signed-off-by: Paul Jackson <[email protected]>

---

mm/pdflush.c | 13 +++++++++++++
1 files changed, 13 insertions(+)

--- 2.6.14-rc4-mm1-cpuset-patches.orig/mm/pdflush.c 2005-10-17 22:39:41.033879927 -0700
+++ 2.6.14-rc4-mm1-cpuset-patches/mm/pdflush.c 2005-10-23 17:17:03.720802617 -0700
@@ -20,6 +20,7 @@
 #include <linux/fs.h> // Needed by writeback.h
 #include <linux/writeback.h> // Prototypes pdflush_operation()
 #include <linux/kthread.h>
+#include <linux/cpuset.h>
 
 
 /*
@@ -170,12 +171,24 @@ static int __pdflush(struct pdflush_work
 static int pdflush(void *dummy)
 {
 	struct pdflush_work my_work;
+	cpumask_t cpus_allowed;
 
 	/*
 	 * pdflush can spend a lot of time doing encryption via dm-crypt. We
 	 * don't want to do that at keventd's priority.
 	 */
 	set_user_nice(current, 0);
+
+	/*
+	 * Some configs put our parent kthread in a limited cpuset,
+	 * which kthread() overrides, forcing cpus_allowed == CPU_MASK_ALL.
+	 * Our needs are more modest - cut back to our cpuset's cpus_allowed.
+	 * This is needed as pdflush's are dynamically created and destroyed.
+	 * The boottime pdflush's are easily placed w/o these 2 lines.
+	 */
+	cpus_allowed = cpuset_cpus_allowed(current);
+	set_cpus_allowed(current, cpus_allowed);
+
 	return __pdflush(&my_work);
 }
 

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373


2005-10-24 06:01:11

by Hirokazu Takahashi

[permalink] [raw]
Subject: Re: [PATCH] cpuset confine pdflush to its cpuset

Hi Paul,

I realized CPUSETS has another problem around pdflush.

One cpuset may have most of its pages dirty, while the others don't.
In this case, pdflush may not start, since the ratio of dirty pages
in the whole box may be below the watermark, which is defined globally.
This may make it hard to allocate pages from that cpuset
or from the nodes it depends on. This wouldn't be good for NUMA machines
without cpusets either.

Do you have any plans about it?

> This patch keeps pdflush daemons on the same cpuset as their
> parent, the kthread daemon.
>
> Some large NUMA configurations put as many kernel threads and as
> much other classic Unix load as they can in what's called a
> bootcpuset, keeping the rest of the system free for dedicated
> jobs.
>
> This effort is thwarted by pdflush, which dynamically destroys
> and recreates pdflush daemons depending on load.


Thanks,
Hirokazu Takahashi.

2005-10-24 06:32:58

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH] cpuset confine pdflush to its cpuset

Takahashi-san wrote:
> I realized CPUSETS has another problem around pdflush.

Excellent observation. I had not realized this.

Thank-you for pointing it out.

I don't have plans. Do you have any suggestions?

( Anyone know what the "pd" stands for in pdflush ?? )

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-10-24 06:41:51

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] cpuset confine pdflush to its cpuset

Paul Jackson <[email protected]> wrote:
>
> Takahashi-san wrote:
> > I realized CPUSETS has another problem around pdflush.
>
> Excellent observation. I had not realized this.
>
> Thank-you for pointing it out.
>
> I don't have plans. Do you have any suggestions?

Per-zone dirty thresholds (quite messy), per-zone writeback (horrific,
linear searches or data structure proliferation everywhere).

Let's see a (serious) workload/testcase first, hey? vmscan.c writeback off
the LRU is a bit slow, but we should be able to make it suffice.

> ( Anyone know what the "pd" stands for in pdflush ?? )

"page dirty"? It's what bdflush became when writeback went from
being block-based to being page-based.

2005-10-24 06:49:29

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH] cpuset confine pdflush to its cpuset

Andrew wrote:
> Let's see a (serious) workload/testcase first, hey?

A reasonable request.


> > ( Anyone know what the "pd" stands for in pdflush ?? )
>
> "page dirty"? It's what bdflush became ...

Ah - thanks.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-10-24 07:21:39

by Hirokazu Takahashi

[permalink] [raw]
Subject: Re: [PATCH] cpuset confine pdflush to its cpuset

Hi Paul,

> Andrew wrote:
> > Let's see a (serious) workload/testcase first, hey?
>
> A reasonable request.

Can you do this?
I think you probably have access to a large NUMA machine.

> > > ( Anyone know what the "pd" stands for in pdflush ?? )
> >
> > "page dirty"? It's what bdflush became ...
>
> Ah - thanks.

2005-10-24 07:37:27

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH] cpuset confine pdflush to its cpuset

Takahashi-san replied to pj:
> > A reasonable request.
>
> Can you do this?
> I think you probably have access to a large NUMA machine.

In theory, yes. I certainly have access to large NUMA machines.

However, it is likely not a priority for me. My focus is on work that
will benefit workloads that do not depend on pdflush (beyond wanting to
be sure that pdflush is -not- running in a cpuset containing a dedicated
job).

That seems to keep me busy enough (and keep my employer paying me),
so I might never get to this problem. I might, but the odds are
not good.

Sorry.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401