2008-12-31 19:10:25

by Peter W. Morreale

Subject: [PATCH 0/2] fix pdflush races and enhancement v2

Update of pdflush series, changes since last post:

o Change upper bound patch to include update of last_empty_jifs to throttle
thread creation to one per second.

o Change new sysctls to use CTL_UNNUMBERED per Andrew Morton

o Use proc_dointvec_minmax and tie the bounds of the min and max thread
counts to each other's current value. Otherwise the possible combinations
of actual threads make my head hurt.

o Update Documentation/sysctl/vm.txt with the new entries


Peter W Morreale (2):
Add /proc controls for pdflush threads
Fix pdflush thread creation upper bound.


Documentation/sysctl/vm.txt | 28 ++++++++++++++++++++++++++
include/linux/writeback.h | 2 ++
kernel/sysctl.c | 25 +++++++++++++++++++++--
mm/pdflush.c | 47 ++++++++++++++++++++++++++++++++++---------
4 files changed, 90 insertions(+), 12 deletions(-)


Best,
-PWM


2008-12-31 19:10:59

by Peter W. Morreale

Subject: [PATCH 2/2] Add /proc controls for pdflush threads

This patch adds /proc entries to give the admin the ability to
control the minimum and maximum number of pdflush threads. This allows
finer control of pdflush on both large and small machines.

The rationale is simply that one size does not fit all. Admins on large
and/or small systems may want to tune the min/max pdflush thread count
to best suit their needs. Right now the min/max is hardcoded to 2/8.
While probably a fair estimate for smaller machines, large machines with
large numbers of CPUs and large numbers of filesystems/block devices may
benefit from larger numbers of threads working on different block
devices.

Even if the background flushing algorithm is radically changed, it is
still likely that multiple threads will be involved, and admins would
still want finer control of the min/max rather than having to
recompile the kernel.

The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
'/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.

The minimum value for nr_pdflush_threads_min is 1 and the maximum value is the
current value of nr_pdflush_threads_max. This minimum is required because
additional threads are only created from within a running pdflush thread.

The minimum value for nr_pdflush_threads_max is the current value of
nr_pdflush_threads_min and the maximum value can be 1000.

Documentation/sysctl/vm.txt is also updated.
---

Signed-off-by: Peter W Morreale <[email protected]>

Documentation/sysctl/vm.txt | 28 ++++++++++++++++++++++++++++
include/linux/writeback.h | 2 ++
kernel/sysctl.c | 25 +++++++++++++++++++++++--
mm/pdflush.c | 19 ++++++++++++++-----
4 files changed, 67 insertions(+), 7 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index d79eeda..c2a257a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -22,6 +22,8 @@ Currently, these files are in /proc/sys/vm:
- dirty_background_ratio
- dirty_expire_centisecs
- dirty_writeback_centisecs
+- nr_pdflush_threads_min
+- nr_pdflush_threads_max
- highmem_is_dirtyable (only if CONFIG_HIGHMEM set)
- max_map_count
- min_free_kbytes
@@ -50,6 +52,32 @@ See Documentation/filesystems/proc.txt

==============================================================

+nr_pdflush_threads_min
+
+This value controls the minimum number of pdflush threads.
+
+At boot time, the kernel will create and maintain 'nr_pdflush_threads_min'
+threads for the kernel's lifetime.
+
+The default value is 2. The minimum value you can specify is 1, and
+the maximum value is the current setting of 'nr_pdflush_threads_max'.
+
+See 'nr_pdflush_threads_max' below for more information.
+
+==============================================================
+
+nr_pdflush_threads_max
+
+This value controls the maximum number of pdflush threads that can be
+created. The pdflush algorithm will create a new pdflush thread (up to
+this maximum) if no pdflush threads have been available for >= 1 second.
+
+The default value is 8. The minimum value you can specify is the
+current value of 'nr_pdflush_threads_min' and the
+maximum is 1000.
+
+==============================================================
+
overcommit_memory:

This value contains a flag that enables memory overcommitment.
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 12b15c5..ee566a0 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -150,6 +150,8 @@ void writeback_set_ratelimit(void);
/* pdflush.c */
extern int nr_pdflush_threads; /* Global so it can be exported to sysctl
read-only. */
+extern int nr_pdflush_threads_max; /* Global so it can be exported to sysctl */
+extern int nr_pdflush_threads_min; /* Global so it can be exported to sysctl */


#endif /* WRITEBACK_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 50ec088..96ba6b3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -88,9 +88,8 @@ extern int rcutorture_runnable;
#endif /* #ifdef CONFIG_RCU_TORTURE_TEST */

/* Constants used for minimum and maximum */
-#if defined(CONFIG_HIGHMEM) || defined(CONFIG_DETECT_SOFTLOCKUP)
static int one = 1;
-#endif
+static int one_thousand = 1000;

#ifdef CONFIG_DETECT_SOFTLOCKUP
static int sixty = 60;
@@ -948,6 +947,28 @@ static struct ctl_table vm_table[] = {
.proc_handler = &proc_dointvec,
},
{
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nr_pdflush_threads_min",
+ .data = &nr_pdflush_threads_min,
+ .maxlen = sizeof nr_pdflush_threads_min,
+ .mode = 0644 /* read-write */,
+ .proc_handler = &proc_dointvec_minmax,
+ .strategy = &sysctl_intvec,
+ .extra1 = &one,
+ .extra2 = &nr_pdflush_threads_max,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nr_pdflush_threads_max",
+ .data = &nr_pdflush_threads_max,
+ .maxlen = sizeof nr_pdflush_threads_max,
+ .mode = 0644 /* read-write */,
+ .proc_handler = &proc_dointvec_minmax,
+ .strategy = &sysctl_intvec,
+ .extra1 = &nr_pdflush_threads_min,
+ .extra2 = &one_thousand,
+ },
+ {
.ctl_name = VM_SWAPPINESS,
.procname = "swappiness",
.data = &vm_swappiness,
diff --git a/mm/pdflush.c b/mm/pdflush.c
index 80745c7..1409649 100644
--- a/mm/pdflush.c
+++ b/mm/pdflush.c
@@ -58,6 +58,14 @@ static DEFINE_SPINLOCK(pdflush_lock);
int nr_pdflush_threads = 0;

/*
+ * The max/min number of pdflush threads. R/W by sysctl at
+ * /proc/sys/vm/nr_pdflush_threads_max
+ */
+int nr_pdflush_threads_max = MAX_PDFLUSH_THREADS;
+int nr_pdflush_threads_min = MIN_PDFLUSH_THREADS;
+
+
+/*
* The time at which the pdflush thread pool last went empty
*/
static unsigned long last_empty_jifs;
@@ -68,7 +76,7 @@ static unsigned long last_empty_jifs;
* Thread pool management algorithm:
*
* - The minimum and maximum number of pdflush instances are bound
- * by MIN_PDFLUSH_THREADS and MAX_PDFLUSH_THREADS.
+ * by nr_pdflush_threads_min and nr_pdflush_threads_max.
*
* - If there have been no idle pdflush instances for 1 second, create
* a new one.
@@ -135,7 +143,8 @@ static int __pdflush(struct pdflush_work *my_work)
*/
if (time_after(jiffies, last_empty_jifs + 1 * HZ)) {
if (list_empty(&pdflush_list)) {
- if (nr_pdflush_threads < MAX_PDFLUSH_THREADS) {
+ if (nr_pdflush_threads <
+ nr_pdflush_threads_max) {
last_empty_jifs = jiffies;
nr_pdflush_threads++;
spin_unlock_irq(&pdflush_lock);
@@ -153,7 +162,7 @@ static int __pdflush(struct pdflush_work *my_work)
*/
if (list_empty(&pdflush_list))
continue;
- if (nr_pdflush_threads <= MIN_PDFLUSH_THREADS)
+ if (nr_pdflush_threads <= nr_pdflush_threads_min)
continue;
pdf = list_entry(pdflush_list.prev, struct pdflush_work, list);
if (time_after(jiffies, pdf->when_i_went_to_sleep + 1 * HZ)) {
@@ -249,9 +258,9 @@ static int __init pdflush_init(void)
* Pre-set nr_pdflush_threads... If we fail to create,
* the count will be decremented.
*/
- nr_pdflush_threads = MIN_PDFLUSH_THREADS;
+ nr_pdflush_threads = nr_pdflush_threads_min;

- for (i = 0; i < MIN_PDFLUSH_THREADS; i++)
+ for (i = 0; i < nr_pdflush_threads_min; i++)
start_one_pdflush_thread();
return 0;
}

2008-12-31 19:10:41

by Peter W. Morreale

Subject: [PATCH 1/2] Fix pdflush thread creation upper bound.

This patch fixes a race on creating pdflush threads. Without the patch,
it is possible to create more than MAX_PDFLUSH_THREADS threads, and this
has been observed in practice on IO loaded SMP machines.

The fix involves moving the lock around to protect the check against the
thread count and correctly dealing with thread creation failure.

This fix also _mostly_ repairs a race condition in how quickly the
threads are created. The original intent was to create a pdflush thread
(up to the max allowed) every second. Without this patch it is possible
to create NCPUS pdflush threads concurrently. The 'mostly' caveat is
because an assumption is made that thread creation will be successful.
If we fail to create the thread, the miss is not considered fatal; we
will try again in 1 second.
---

Signed-off-by: Peter W Morreale <[email protected]>

mm/pdflush.c | 32 +++++++++++++++++++++++++-------
1 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/mm/pdflush.c b/mm/pdflush.c
index 0cbe0c6..80745c7 100644
--- a/mm/pdflush.c
+++ b/mm/pdflush.c
@@ -98,7 +98,6 @@ static int __pdflush(struct pdflush_work *my_work)
INIT_LIST_HEAD(&my_work->list);

spin_lock_irq(&pdflush_lock);
- nr_pdflush_threads++;
for ( ; ; ) {
struct pdflush_work *pdf;

@@ -126,20 +125,26 @@ static int __pdflush(struct pdflush_work *my_work)

(*my_work->fn)(my_work->arg0);

+ spin_lock_irq(&pdflush_lock);
+
/*
* Thread creation: For how long have there been zero
- * available threads?
+ * available threads?
+ *
+ * To throttle creation, we reset last_empty_jifs.
*/
if (time_after(jiffies, last_empty_jifs + 1 * HZ)) {
- /* unlocked list_empty() test is OK here */
if (list_empty(&pdflush_list)) {
- /* unlocked test is OK here */
- if (nr_pdflush_threads < MAX_PDFLUSH_THREADS)
+ if (nr_pdflush_threads < MAX_PDFLUSH_THREADS) {
+ last_empty_jifs = jiffies;
+ nr_pdflush_threads++;
+ spin_unlock_irq(&pdflush_lock);
start_one_pdflush_thread();
+ spin_lock_irq(&pdflush_lock);
+ }
}
}

- spin_lock_irq(&pdflush_lock);
my_work->fn = NULL;

/*
@@ -226,13 +231,26 @@ int pdflush_operation(void (*fn)(unsigned long), unsigned long arg0)

static void start_one_pdflush_thread(void)
{
- kthread_run(pdflush, NULL, "pdflush");
+ struct task_struct *k;
+
+ k = kthread_run(pdflush, NULL, "pdflush");
+ if (unlikely(IS_ERR(k))) {
+ spin_lock_irq(&pdflush_lock);
+ nr_pdflush_threads--;
+ spin_unlock_irq(&pdflush_lock);
+ }
}

static int __init pdflush_init(void)
{
int i;

+ /*
+ * Pre-set nr_pdflush_threads... If we fail to create,
+ * the count will be decremented.
+ */
+ nr_pdflush_threads = MIN_PDFLUSH_THREADS;
+
for (i = 0; i < MIN_PDFLUSH_THREADS; i++)
start_one_pdflush_thread();
return 0;

2008-12-31 21:35:43

by Rik van Riel

Subject: Re: [PATCH 2/2] Add /proc controls for pdflush threads

Peter W Morreale wrote:
> This patch adds /proc entries to give the admin the ability to
> control the minimum and maximum number of pdflush threads. This allows
> finer control of pdflush on both large and small machines.

> Signed-off-by: Peter W Morreale <[email protected]>

Reviewed-by: Rik van Riel <[email protected]>

--
All rights reversed.

2008-12-31 21:30:31

by Rik van Riel

Subject: Re: [PATCH 1/2] Fix pdflush thread creation upper bound.

Peter W Morreale wrote:
> This patch fixes a race on creating pdflush threads. Without the patch,
> it is possible to create more than MAX_PDFLUSH_THREADS threads, and this
> has been observed in practice on IO loaded SMP machines.

>
> Signed-off-by: Peter W Morreale <[email protected]>

Reviewed-by: Rik van Riel <[email protected]>

--
All rights reversed.