PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.
E.g. architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero.
The futex implementation assumes that tids fit into the FUTEX_TID_MASK.
This limits the number of allowable threads.
version 2:
* use div64_u64
* check against FUTEX_TID_MASK
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
kernel/fork.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index cf65139..1449923 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -74,6 +74,7 @@
#include <linux/uprobes.h>
#include <linux/aio.h>
#include <linux/compiler.h>
+#include <linux/math64.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -255,6 +256,8 @@ void __init __weak arch_task_cache_init(void) { }
void __init fork_init(unsigned long mempages)
{
+ u64 temp;
+
#ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
#ifndef ARCH_MIN_TASKALIGN
#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
@@ -273,7 +276,16 @@ void __init fork_init(unsigned long mempages)
* value: the thread structures can take up at most half
* of memory.
*/
- max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
+ temp = div64_u64((u64) mempages * (u64) PAGE_SIZE,
+ (u64) THREAD_SIZE * 8UL);
+
+ /*
+ * The futex code assumes that tids fit into the FUTEX_TID_MASK.
+ */
+ if (temp < FUTEX_TID_MASK)
+ max_threads = temp;
+ else
+ max_threads = FUTEX_TID_MASK;
/*
* we need to allow at least 20 threads to boot a system
--
2.1.4
On Tue, 17 Feb 2015 20:01:38 +0100 Heinrich Schuchardt <[email protected]> wrote:
> PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
> THREAD_SIZE.
>
> E.g. architecture hexagon may have page size 1M and thread size 4096.
>
> This would lead to a division by zero.
>
> The futex implementation assumes that tids fit into the FUTEX_TID_MASK.
> This limits the number of allowable threads.
>
> ...
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -74,6 +74,7 @@
> #include <linux/uprobes.h>
> #include <linux/aio.h>
> #include <linux/compiler.h>
> +#include <linux/math64.h>
>
> #include <asm/pgtable.h>
> #include <asm/pgalloc.h>
> @@ -255,6 +256,8 @@ void __init __weak arch_task_cache_init(void) { }
>
> void __init fork_init(unsigned long mempages)
> {
> + u64 temp;
That's a really poor name. We should always try to make names
meaningful. Here, something like "threads" would be better.
> #ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
> #ifndef ARCH_MIN_TASKALIGN
> #define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
> @@ -273,7 +276,16 @@ void __init fork_init(unsigned long mempages)
> * value: the thread structures can take up at most half
> * of memory.
> */
> - max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
> + temp = div64_u64((u64) mempages * (u64) PAGE_SIZE,
> + (u64) THREAD_SIZE * 8UL);
> +
> + /*
> + * The futex code assumes that tids fit into the FUTEX_TID_MASK.
> + */
> + if (temp < FUTEX_TID_MASK)
> + max_threads = temp;
> + else
> + max_threads = FUTEX_TID_MASK;
Seems rather complicated. How about
max_threads = mempages / (8 * THREAD_SIZE);
max_threads *= PAGE_SIZE;
max_threads = min(max_threads, FUTEX_TID_MASK);
And while we're there, I do think the comments need a refresh. What
does "the thread structures can take up at most half of memory" mean?
And what's the reasoning behind that "8"? I suggest we just delete all
that and make a new attempt at explaining why the code is this way.
On Tue, Feb 17, 2015 at 03:15:11PM -0800, Andrew Morton wrote:
> On Tue, 17 Feb 2015 20:01:38 +0100 Heinrich Schuchardt <[email protected]> wrote:
>
> > PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
> > THREAD_SIZE.
> >
> > E.g. architecture hexagon may have page size 1M and thread size 4096.
> >
> > This would lead to a division by zero.
> >
> > The futex implementation assumes that tids fit into the FUTEX_TID_MASK.
> > This limits the number of allowable threads.
> >
> > ...
> >
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -74,6 +74,7 @@
> > #include <linux/uprobes.h>
> > #include <linux/aio.h>
> > #include <linux/compiler.h>
> > +#include <linux/math64.h>
> >
> > #include <asm/pgtable.h>
> > #include <asm/pgalloc.h>
> > @@ -255,6 +256,8 @@ void __init __weak arch_task_cache_init(void) { }
> >
> > void __init fork_init(unsigned long mempages)
> > {
> > + u64 temp;
>
> That's a really poor name. We should always try to make names
> meaningful. Here, something like "threads" would be better.
>
> > #ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
> > #ifndef ARCH_MIN_TASKALIGN
> > #define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
> > @@ -273,7 +276,16 @@ void __init fork_init(unsigned long mempages)
> > * value: the thread structures can take up at most half
> > * of memory.
> > */
> > - max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
> > + temp = div64_u64((u64) mempages * (u64) PAGE_SIZE,
> > + (u64) THREAD_SIZE * 8UL);
> > +
> > + /*
> > + * The futex code assumes that tids fit into the FUTEX_TID_MASK.
> > + */
> > + if (temp < FUTEX_TID_MASK)
> > + max_threads = temp;
> > + else
> > + max_threads = FUTEX_TID_MASK;
>
> Seems rather complicated. How about
>
> max_threads = mempages / (8 * THREAD_SIZE);
Apparently it is possible that
mempages < 8 * THREAD_SIZE
which would result in max_threads == 0.
How about the following ?
max_threads = mempages / DIV_ROUND_UP(8 * THREAD_SIZE, PAGE_SIZE);
Guenter
Hello Andrew,
thank you for your comments. Unfortunately there is no solution with
32-bit calculus. Please, see my answers below.
As fork_init is only called once there should be not performance issue
in using 64-bit calculus.
I think that my patch did not cover all problems connected to max_threads.
I just had a look at the memory hotplugging code.
Shouldn't max_threads and init_task.signal->rlim[RLIMIT_NPROC] be
recalculated after adding or removing memory?
This could be done in a hotplug callback.
max_threads can be set by writing to /proc/sys/kernel/threads-max.
Shouldn't the value be checked by the same routine and shouldn't
init_task.signal->rlim[RLIMIT_NPROC] be updated?
Best regards
Heinrich
On 18.02.2015 00:15, Andrew Morton wrote:
> On Tue, 17 Feb 2015 20:01:38 +0100 Heinrich Schuchardt <[email protected]> wrote:
>
>> PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
>> THREAD_SIZE.
>>
>> E.g. architecture hexagon may have page size 1M and thread size 4096.
>>
>> This would lead to a division by zero.
>>
>> The futex implementation assumes that tids fit into the FUTEX_TID_MASK.
>> This limits the number of allowable threads.
>>
>> ...
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -74,6 +74,7 @@
>> #include <linux/uprobes.h>
>> #include <linux/aio.h>
>> #include <linux/compiler.h>
>> +#include <linux/math64.h>
>>
>> #include <asm/pgtable.h>
>> #include <asm/pgalloc.h>
>> @@ -255,6 +256,8 @@ void __init __weak arch_task_cache_init(void) { }
>>
>> void __init fork_init(unsigned long mempages)
>> {
>> + u64 temp;
>
> That's a really poor name. We should always try to make names
> meaningful. Here, something like "threads" would be better.
ok.
>
>> #ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
>> #ifndef ARCH_MIN_TASKALIGN
>> #define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
>> @@ -273,7 +276,16 @@ void __init fork_init(unsigned long mempages)
>> * value: the thread structures can take up at most half
>> * of memory.
>> */
>> - max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
>> + temp = div64_u64((u64) mempages * (u64) PAGE_SIZE,
>> + (u64) THREAD_SIZE * 8UL);
>> +
>> + /*
>> + * The futex code assumes that tids fit into the FUTEX_TID_MASK.
>> + */
>> + if (temp < FUTEX_TID_MASK)
>> + max_threads = temp;
>> + else
>> + max_threads = FUTEX_TID_MASK;
>
> Seems rather complicated. How about
>
> max_threads = mempages / (8 * THREAD_SIZE);
If 8 * THREAD_SIZE > mempages this gives 0.
> max_threads *= PAGE_SIZE;
If mempages / (8 * THREAD_SIZE) * PAGE_SIZE > INT_MAX
an overflow occurs (e.g. total memory = 96TB, THREAD_SIZE = 4kB).
> max_threads = min(max_threads, FUTEX_TID_MASK);
>
> And while we're there, I do think the comments need a refresh. What
> does "the thread structures can take up at most half of memory" mean?
> And what's the reasoning behind that "8"? I suggest we just delete all
> that and make a new attempt at explaining why the code is this way.
ok.
>
>
On 18.02.2015 20:38, Guenter Roeck wrote:
> On Tue, Feb 17, 2015 at 03:15:11PM -0800, Andrew Morton wrote:
>> On Tue, 17 Feb 2015 20:01:38 +0100 Heinrich Schuchardt <[email protected]> wrote:
>>
>>> PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
>>> THREAD_SIZE.
>>>
>>> E.g. architecture hexagon may have page size 1M and thread size 4096.
>>>
>>> This would lead to a division by zero.
>>>
>>> The futex implementation assumes that tids fit into the FUTEX_TID_MASK.
>>> This limits the number of allowable threads.
>>>
>>> ...
>>>
>>> --- a/kernel/fork.c
>>> +++ b/kernel/fork.c
>>> @@ -74,6 +74,7 @@
>>> #include <linux/uprobes.h>
>>> #include <linux/aio.h>
>>> #include <linux/compiler.h>
>>> +#include <linux/math64.h>
>>>
>>> #include <asm/pgtable.h>
>>> #include <asm/pgalloc.h>
>>> @@ -255,6 +256,8 @@ void __init __weak arch_task_cache_init(void) { }
>>>
>>> void __init fork_init(unsigned long mempages)
>>> {
>>> + u64 temp;
>>
>> That's a really poor name. We should always try to make names
>> meaningful. Here, something like "threads" would be better.
>>
>>> #ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
>>> #ifndef ARCH_MIN_TASKALIGN
>>> #define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
>>> @@ -273,7 +276,16 @@ void __init fork_init(unsigned long mempages)
>>> * value: the thread structures can take up at most half
>>> * of memory.
>>> */
>>> - max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
>>> + temp = div64_u64((u64) mempages * (u64) PAGE_SIZE,
>>> + (u64) THREAD_SIZE * 8UL);
>>> +
>>> + /*
>>> + * The futex code assumes that tids fit into the FUTEX_TID_MASK.
>>> + */
>>> + if (temp < FUTEX_TID_MASK)
>>> + max_threads = temp;
>>> + else
>>> + max_threads = FUTEX_TID_MASK;
>>
>> Seems rather complicated. How about
>>
>> max_threads = mempages / (8 * THREAD_SIZE);
>
> Apparently it is possible that
> mempages < 8 * THREAD_SIZE
> which would result in max_threads == 0.
>
> How about the following ?
>
> max_threads = mempages / DIV_ROUND_UP(8 * THREAD_SIZE, PAGE_SIZE);
For PAGE_SIZE 2MB and THREAD_SIZE 4kB this is wrong by factor 64.
Best regards
Heinrich
On Wed, Feb 18, 2015 at 08:50:21PM +0100, Heinrich Schuchardt wrote:
> On 18.02.2015 20:38, Guenter Roeck wrote:
> > On Tue, Feb 17, 2015 at 03:15:11PM -0800, Andrew Morton wrote:
> >> On Tue, 17 Feb 2015 20:01:38 +0100 Heinrich Schuchardt <[email protected]> wrote:
> >>
> >>> PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
> >>> THREAD_SIZE.
> >>>
> >>> E.g. architecture hexagon may have page size 1M and thread size 4096.
> >>>
> >>> This would lead to a division by zero.
> >>>
> >>> The futex implementation assumes that tids fit into the FUTEX_TID_MASK.
> >>> This limits the number of allowable threads.
> >>>
> >>> ...
> >>>
> >>> --- a/kernel/fork.c
> >>> +++ b/kernel/fork.c
> >>> @@ -74,6 +74,7 @@
> >>> #include <linux/uprobes.h>
> >>> #include <linux/aio.h>
> >>> #include <linux/compiler.h>
> >>> +#include <linux/math64.h>
> >>>
> >>> #include <asm/pgtable.h>
> >>> #include <asm/pgalloc.h>
> >>> @@ -255,6 +256,8 @@ void __init __weak arch_task_cache_init(void) { }
> >>>
> >>> void __init fork_init(unsigned long mempages)
> >>> {
> >>> + u64 temp;
> >>
> >> That's a really poor name. We should always try to make names
> >> meaningful. Here, something like "threads" would be better.
> >>
> >>> #ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
> >>> #ifndef ARCH_MIN_TASKALIGN
> >>> #define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
> >>> @@ -273,7 +276,16 @@ void __init fork_init(unsigned long mempages)
> >>> * value: the thread structures can take up at most half
> >>> * of memory.
> >>> */
> >>> - max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
> >>> + temp = div64_u64((u64) mempages * (u64) PAGE_SIZE,
> >>> + (u64) THREAD_SIZE * 8UL);
> >>> +
> >>> + /*
> >>> + * The futex code assumes that tids fit into the FUTEX_TID_MASK.
> >>> + */
> >>> + if (temp < FUTEX_TID_MASK)
> >>> + max_threads = temp;
> >>> + else
> >>> + max_threads = FUTEX_TID_MASK;
> >>
> >> Seems rather complicated. How about
> >>
> >> max_threads = mempages / (8 * THREAD_SIZE);
> >
> > Apparently it is possible that
> > mempages < 8 * THREAD_SIZE
> > which would result in max_threads == 0.
> >
> > How about the following ?
> >
> > max_threads = mempages / DIV_ROUND_UP(8 * THREAD_SIZE, PAGE_SIZE);
>
> For PAGE_SIZE 2MB and THREAD_SIZE 4kB this is wrong by factor 64.
>
That really depends on the definition of "wrong", doesn't it ?
It would ensure that max_threads is never larger than mempages,
which is more defensive.
One could also use
max_threads = mempages / (max(1, 8 * THREAD_SIZE / PAGE_SIZE));
to solve the original problem without introducing 64 bit operations
and without affecting the result for more common architectures.
Guenter
On Wed, 18 Feb 2015 20:47:48 +0100 Heinrich Schuchardt <[email protected]> wrote:
> Hello Andrew,
>
> thank you for your comments. Unfortunately there is no solution with
> 32-bit calculus.
How would something along the lines of
if (PAGE_SIZE < THREAD_SIZE)
...
else
...
look?
> Please, see my answers below.
>
> As fork_init is only called once there should be not performance issue
> in using 64-bit calculus.
Sure, it's not a big issue. But please do address the code comments
and no "temp"!
> I think that my patch did not cover all problems connected to max_threads.
>
> I just had a look at the memory hotplugging code.
> Shouldn't max_threads and init_task.signal->rlim[RLIMIT_NPROC] be
> recalculated after adding or removing memory?
> This could be done in a hotplug callback.
That sounds right. We've fixed some of these inaccuracies but there
will be many more remaining. Searching for things like "mempages" and
"nr_free_buffer_pages" shows them up. mem hotplug is an ongoing thing ;)
> max_threads can be set by writing to /proc/sys/kernel/threads-max.
> Shouldn't the value be checked by the same routine
Probably.
> and shouldn't
> init_task.signal->rlim[RLIMIT_NPROC] be updated?
Harder. By this time the system has all these processes which have
inherited their rlimits from init.
In fork_init a division by zero may occur. This is addressed in the
first patch.
Furthermore max_threads is checked against FUTEX_TID_MASK._
The calculation of max_threads is moved to a separate function.
The second patch addresses max_threads being set by writing to
/proc/sys/kernel/threads-max. The same limits are applied as
in fork_init.
The third patch uses a callback function to updated max_thread when
a memory hotplug event occurs.
New in version 3:
Determination of max_threads moved to separate function.
Handling of /proc/sys/kernel/threads-max
Handling of memory hotplugging.
Heinrich Schuchardt (3):
kernel/fork.c: avoid division by zero
kernel/sysctl.c: threads-max observe limits
kernel/fork.c: memory hotplug updates max_threads
kernel/fork.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++--------
kernel/sysctl.c | 7 ++--
2 files changed, 100 insertions(+), 19 deletions(-)
--
2.1.4
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.
E.g. architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.
With 32-bit calculus there is no solution which delivers valid results
for all possible combinations of the parameters.
The code is only called once.
Hence a 64-bit calculation can be used as solution.
The calculation of max_threads is moved to a separate function.
This allows future patches to use the same logic, e.g. when
- max_threads is set by writing to /proc/sys/kernel/threads-max
- when adding or removing memory.
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
kernel/fork.c | 59 ++++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 42 insertions(+), 17 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 4dc2dda..69c30cd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -88,6 +88,16 @@
#include <trace/events/task.h>
/*
+ * Minimum number of threads to boot the kernel
+ */
+#define MIN_THREADS 20
+
+/*
+ * Maximum number of threads
+ */
+#define MAX_THREADS FUTEX_TID_MASK
+
+/*
* Protected counters by write_lock_irq(&tasklist_lock)
*/
unsigned long total_forks; /* Handle normal Linux uptimes. */
@@ -253,6 +263,37 @@ EXPORT_SYMBOL_GPL(__put_task_struct);
void __init __weak arch_task_cache_init(void) { }
+/*
+ * set_max_threads tries to set default limit to the suggested value.
+ */
+static void set_max_threads(unsigned int max_threads_suggested)
+{
+ u64 threads;
+
+ /*
+ * The number of threads shall be limited such that the thread
+ * structures may only consume a small part of the available memory.
+ */
+ threads = div64_u64((u64) totalram_pages * (u64) PAGE_SIZE,
+ (u64) THREAD_SIZE * 8UL);
+
+ if (threads > max_threads_suggested)
+ threads = max_threads_suggested;
+
+ if (threads > MAX_THREADS)
+ threads = MAX_THREADS;
+
+ if (threads < MIN_THREADS)
+ threads = MIN_THREADS;
+
+ max_threads = (int) threads;
+
+ init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = (int) threads / 2;
+ init_task.signal->rlim[RLIMIT_NPROC].rlim_max = (int) threads / 2;
+ init_task.signal->rlim[RLIMIT_SIGPENDING] =
+ init_task.signal->rlim[RLIMIT_NPROC];
+}
+
void __init fork_init(unsigned long mempages)
{
#ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
@@ -268,23 +309,7 @@ void __init fork_init(unsigned long mempages)
/* do the arch specific task caches init */
arch_task_cache_init();
- /*
- * The default maximum number of threads is set to a safe
- * value: the thread structures can take up at most half
- * of memory.
- */
- max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
-
- /*
- * we need to allow at least 20 threads to boot a system
- */
- if (max_threads < 20)
- max_threads = 20;
-
- init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
- init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
- init_task.signal->rlim[RLIMIT_SIGPENDING] =
- init_task.signal->rlim[RLIMIT_NPROC];
+ set_max_threads(UINT_MAX);
}
int __weak arch_dup_task_struct(struct task_struct *dst,
--
2.1.4
Users can change the maximum number of threads by writing to
/proc/sys/kernel/threads-max.
With the patch the value entered is checked according to the same
limits that apply when fork_init is called.
Furthermore the limits of the init process are adjusted.
This will not update limits of any other existing processes.
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
include/linux/sysctl.h | 3 +++
kernel/fork.c | 24 ++++++++++++++++++++++++
kernel/sysctl.c | 6 ++----
3 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index b7361f8..795d5fe 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -212,4 +212,7 @@ static inline void setup_sysctl_set(struct ctl_table_set *p,
#endif /* CONFIG_SYSCTL */
+int sysctl_max_threads(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+
#endif /* _LINUX_SYSCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 69c30cd..38ffce5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -74,6 +74,7 @@
#include <linux/uprobes.h>
#include <linux/aio.h>
#include <linux/compiler.h>
+#include <linux/sysctl.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -2024,3 +2025,26 @@ int unshare_files(struct files_struct **displaced)
task_unlock(task);
return 0;
}
+
+int sysctl_max_threads(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table t;
+ int ret;
+ int threads = max_threads;
+ int min = MIN_THREADS;
+ int max = MAX_THREADS;
+
+ t = *table;
+ t.data = &threads;
+ t.extra1 = &min;
+ t.extra2 = &max;
+
+ ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
+ if (ret || !write)
+ return ret;
+
+ set_max_threads(threads);
+
+ return 0;
+}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 137c7f6..2e195ae 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -92,11 +92,9 @@
#include <linux/nmi.h>
#endif
-
#if defined(CONFIG_SYSCTL)
/* External variables not in a header file. */
-extern int max_threads;
extern int suid_dumpable;
#ifdef CONFIG_COREDUMP
extern int core_uses_pid;
@@ -709,10 +707,10 @@ static struct ctl_table kern_table[] = {
#endif
{
.procname = "threads-max",
- .data = &max_threads,
+ .data = NULL,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = sysctl_max_threads,
},
{
.procname = "random",
--
2.1.4
Memory may be added or removed while the system is online.
With the patch the value of max_threads is updated accordingly.
The limits of the init process are also updated.
This does not include updating limits of other running processes.
Tested with commands like
echo 5000 > /proc/sys/kernel/threads-max
echo 0 > /sys/devices/system/memory/memory7/online
cat /proc/sys/kernel/threads-max
echo 1 > /sys/devices/system/memory/memory7/online
cat /proc/sys/kernel/threads-max
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
kernel/fork.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index 38ffce5..7125be3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -75,6 +75,8 @@
#include <linux/aio.h>
#include <linux/compiler.h>
#include <linux/sysctl.h>
+#include <linux/memory.h>
+#include <linux/notifier.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -295,6 +297,31 @@ static void set_max_threads(unsigned int max_threads_suggested)
init_task.signal->rlim[RLIMIT_NPROC];
}
+/*
+ * Callback function called for memory hotplug events.
+ */
+static int memory_hotplug_callback(struct notifier_block *self,
+ unsigned long action, void *arg)
+{
+ switch (action) {
+ case MEM_ONLINE:
+ /*
+ * If memory was added, try to maximize the number of allowed
+ * threads.
+ */
+ set_max_threads(UINT_MAX);
+ break;
+ case MEM_OFFLINE:
+ /*
+ * If memory was removed, try to keep current value.
+ */
+ set_max_threads(max_threads);
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
void __init fork_init(unsigned long mempages)
{
#ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
@@ -311,6 +338,8 @@ void __init fork_init(unsigned long mempages)
arch_task_cache_init();
set_max_threads(UINT_MAX);
+
+ hotplug_memory_notifier(memory_hotplug_callback, 0);
}
int __weak arch_dup_task_struct(struct task_struct *dst,
--
2.1.4
* Heinrich Schuchardt <[email protected]> wrote:
> PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
> THREAD_SIZE.
>
> E.g. architecture hexagon may have page size 1M and thread size 4096.
> This would lead to a division by zero in the calculation of max_threads.
>
> With 32-bit calculus there is no solution which delivers valid results
> for all possible combinations of the parameters.
> The code is only called once.
> Hence a 64-bit calculation can be used as solution.
>
> The calculation of max_threads is moved to a separate function.
> This allows future patches to use the same logic, e.g. when
> - max_threads is set by writing to /proc/sys/kernel/threads-max
> - when adding or removing memory.
>
> Signed-off-by: Heinrich Schuchardt <[email protected]>
> ---
> kernel/fork.c | 59 ++++++++++++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 42 insertions(+), 17 deletions(-)
So it would be nice to split this patch into two parts:
first do a patch that splits out a version of
set_max_threads() that matches current behavior (including
the bug).
Then a second patch that fixes the bug.
This would make it much easier to review.
Thanks,
Ingo
In fork_init a division by zero may occur.
In the first patch the calculation of max_threads is moved from fork_init
to a new separate function.
The incorrect calculation of max threads is addressed in the
second patch.
Furthermore max_threads is checked against FUTEX_TID_MASK.
The third patch addresses max_threads being set by writing to
/proc/sys/kernel/threads-max. The same limits are applied as
in fork_init.
The fourth patch uses a callback function to update max_threads when
a memory hotplug event occurs.
New in version 4:
Separate of refactoring and correction into separate patches
(as requested by Ingo Molnar)
Remove redundant argument of fork_init
New in version 3:
Determination of max_threads moved to separate function.
Handling of /proc/sys/kernel/threads-max
Handling of memory hotplugging.
Heinrich Schuchardt (4):
kernel/fork.c: new function for max_threads
kernel/fork.c: avoid division by zero
kernel/sysctl.c: threads-max observe limits
kernel/fork.c: memory hotplug updates max_threads
include/linux/sysctl.h | 3 ++
init/main.c | 4 +-
kernel/fork.c | 112 +++++++++++++++++++++++++++++++++++++++++--------
kernel/sysctl.c | 6 +--
4 files changed, 102 insertions(+), 23 deletions(-)
--
2.1.4
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.
E.g. architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.
With this patch the buggy code is moved to a separate function
set_max_threads. The error is not fixed.
After fixing the problem in a separate patch the new function can be
reused to adjust max_threads after adding or removing memory.
Argument mempages of function fork_init() is removed as totalram_pages
is an exported symbol.
The creation of separate patches for refactoring to a new function
and for fixing the logic was requested by Ingo Molnar.
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
init/main.c | 4 ++--
kernel/fork.c | 39 ++++++++++++++++++++++++---------------
2 files changed, 26 insertions(+), 17 deletions(-)
diff --git a/init/main.c b/init/main.c
index 61b99376..21394ec 100644
--- a/init/main.c
+++ b/init/main.c
@@ -94,7 +94,7 @@
static int kernel_init(void *);
extern void init_IRQ(void);
-extern void fork_init(unsigned long);
+extern void fork_init(void);
extern void radix_tree_init(void);
#ifndef CONFIG_DEBUG_RODATA
static inline void mark_rodata_ro(void) { }
@@ -655,7 +655,7 @@ asmlinkage __visible void __init start_kernel(void)
#endif
thread_info_cache_init();
cred_init();
- fork_init(totalram_pages);
+ fork_init();
proc_caches_init();
buffer_init();
key_init();
diff --git a/kernel/fork.c b/kernel/fork.c
index 4dc2dda..3a423b2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -253,27 +253,18 @@ EXPORT_SYMBOL_GPL(__put_task_struct);
void __init __weak arch_task_cache_init(void) { }
-void __init fork_init(unsigned long mempages)
+/*
+ * set_max_threads
+ * The argument is ignored.
+ */
+static void set_max_threads(unsigned int max_threads_suggested)
{
-#ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
-#ifndef ARCH_MIN_TASKALIGN
-#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
-#endif
- /* create a slab on which task_structs can be allocated */
- task_struct_cachep =
- kmem_cache_create("task_struct", sizeof(struct task_struct),
- ARCH_MIN_TASKALIGN, SLAB_PANIC | SLAB_NOTRACK, NULL);
-#endif
-
- /* do the arch specific task caches init */
- arch_task_cache_init();
-
/*
* The default maximum number of threads is set to a safe
* value: the thread structures can take up at most half
* of memory.
*/
- max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
+ max_threads = totalram_pages / (8 * THREAD_SIZE / PAGE_SIZE);
/*
* we need to allow at least 20 threads to boot a system
@@ -287,6 +278,24 @@ void __init fork_init(unsigned long mempages)
init_task.signal->rlim[RLIMIT_NPROC];
}
+void __init fork_init(void)
+{
+#ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
+#ifndef ARCH_MIN_TASKALIGN
+#define ARCH_MIN_TASKALIGN L1_CACHE_BYTES
+#endif
+ /* create a slab on which task_structs can be allocated */
+ task_struct_cachep =
+ kmem_cache_create("task_struct", sizeof(struct task_struct),
+ ARCH_MIN_TASKALIGN, SLAB_PANIC | SLAB_NOTRACK, NULL);
+#endif
+
+ /* do the arch specific task caches init */
+ arch_task_cache_init();
+
+ set_max_threads(UINT_MAX);
+}
+
int __weak arch_dup_task_struct(struct task_struct *dst,
struct task_struct *src)
{
--
2.1.4
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.
E.g. architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.
With 32-bit calculus there is no solution which delivers valid results
for all possible combinations of the parameters.
The code is only called once.
Hence a 64-bit calculation can be used as solution.
The calculation of max_threads is moved to a separate function.
This allows future patches to use the same logic, e.g. when
- max_threads is set by writing to /proc/sys/kernel/threads-max
- when adding or removing memory.
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
kernel/fork.c | 42 +++++++++++++++++++++++++++++-------------
1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 3a423b2..4d68e1e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -88,6 +88,16 @@
#include <trace/events/task.h>
/*
+ * Minimum number of threads to boot the kernel
+ */
+#define MIN_THREADS 20
+
+/*
+ * Maximum number of threads
+ */
+#define MAX_THREADS FUTEX_TID_MASK
+
+/*
* Protected counters by write_lock_irq(&tasklist_lock)
*/
unsigned long total_forks; /* Handle normal Linux uptimes. */
@@ -254,26 +264,32 @@ EXPORT_SYMBOL_GPL(__put_task_struct);
void __init __weak arch_task_cache_init(void) { }
/*
- * set_max_threads
- * The argument is ignored.
+ * set_max_threads tries to set the default limit to the suggested value.
*/
static void set_max_threads(unsigned int max_threads_suggested)
{
- /*
- * The default maximum number of threads is set to a safe
- * value: the thread structures can take up at most half
- * of memory.
- */
- max_threads = totalram_pages / (8 * THREAD_SIZE / PAGE_SIZE);
+ u64 threads;
/*
- * we need to allow at least 20 threads to boot a system
+ * The number of threads shall be limited such that the thread
+ * structures may only consume a small part of the available memory.
*/
- if (max_threads < 20)
- max_threads = 20;
+ threads = div64_u64((u64) totalram_pages * (u64) PAGE_SIZE,
+ (u64) THREAD_SIZE * 8UL);
+
+ if (threads > max_threads_suggested)
+ threads = max_threads_suggested;
+
+ if (threads > MAX_THREADS)
+ threads = MAX_THREADS;
+
+ if (threads < MIN_THREADS)
+ threads = MIN_THREADS;
+
+ max_threads = (int) threads;
- init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
- init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
+ init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = (int) threads / 2;
+ init_task.signal->rlim[RLIMIT_NPROC].rlim_max = (int) threads / 2;
init_task.signal->rlim[RLIMIT_SIGPENDING] =
init_task.signal->rlim[RLIMIT_NPROC];
}
--
2.1.4
Users can change the maximum number of threads by writing to
/proc/sys/kernel/threads-max.
With the patch the value entered is checked according to the same
limits that apply when fork_init is called.
Furthermore the limits of the init process are adjusted.
This will not update limits of any other existing processes.
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
include/linux/sysctl.h | 3 +++
kernel/fork.c | 24 ++++++++++++++++++++++++
kernel/sysctl.c | 6 ++----
3 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index b7361f8..795d5fe 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -212,4 +212,7 @@ static inline void setup_sysctl_set(struct ctl_table_set *p,
#endif /* CONFIG_SYSCTL */
+int sysctl_max_threads(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+
#endif /* _LINUX_SYSCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 4d68e1e..33b084e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -74,6 +74,7 @@
#include <linux/uprobes.h>
#include <linux/aio.h>
#include <linux/compiler.h>
+#include <linux/sysctl.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -2024,3 +2025,26 @@ int unshare_files(struct files_struct **displaced)
task_unlock(task);
return 0;
}
+
+int sysctl_max_threads(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table t;
+ int ret;
+ int threads = max_threads;
+ int min = MIN_THREADS;
+ int max = MAX_THREADS;
+
+ t = *table;
+ t.data = &threads;
+ t.extra1 = &min;
+ t.extra2 = &max;
+
+ ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
+ if (ret || !write)
+ return ret;
+
+ set_max_threads(threads);
+
+ return 0;
+}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 137c7f6..2e195ae 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -92,11 +92,9 @@
#include <linux/nmi.h>
#endif
-
#if defined(CONFIG_SYSCTL)
/* External variables not in a header file. */
-extern int max_threads;
extern int suid_dumpable;
#ifdef CONFIG_COREDUMP
extern int core_uses_pid;
@@ -709,10 +707,10 @@ static struct ctl_table kern_table[] = {
#endif
{
.procname = "threads-max",
- .data = &max_threads,
+ .data = NULL,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = sysctl_max_threads,
},
{
.procname = "random",
--
2.1.4
Memory may be added or removed while the system is online.
With the patch the value of max_threads is updated accordingly.
The limits of the init process are also updated.
This does not include updating limits of other running processes.
Tested with commands like
echo 5000 > /proc/sys/kernel/threads-max
echo 0 > /sys/devices/system/memory/memory7/online
cat /proc/sys/kernel/threads-max
echo 1 > /sys/devices/system/memory/memory7/online
cat /proc/sys/kernel/threads-max
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
kernel/fork.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index 33b084e..7fd7c0a9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -75,6 +75,8 @@
#include <linux/aio.h>
#include <linux/compiler.h>
#include <linux/sysctl.h>
+#include <linux/memory.h>
+#include <linux/notifier.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -295,6 +297,31 @@ static void set_max_threads(unsigned int max_threads_suggested)
init_task.signal->rlim[RLIMIT_NPROC];
}
+/*
+ * Callback function called for memory hotplug events.
+ */
+static int memory_hotplug_callback(struct notifier_block *self,
+ unsigned long action, void *arg)
+{
+ switch (action) {
+ case MEM_ONLINE:
+ /*
+ * If memory was added, try to maximize the number of allowed
+ * threads.
+ */
+ set_max_threads(UINT_MAX);
+ break;
+ case MEM_OFFLINE:
+ /*
+ * If memory was removed, try to keep current value.
+ */
+ set_max_threads(max_threads);
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
void __init fork_init(void)
{
#ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
@@ -311,6 +338,8 @@ void __init fork_init(void)
arch_task_cache_init();
set_max_threads(UINT_MAX);
+
+ hotplug_memory_notifier(memory_hotplug_callback, 0);
}
int __weak arch_dup_task_struct(struct task_struct *dst,
--
2.1.4
On 02/23, Heinrich Schuchardt wrote:
>
> +static int memory_hotplug_callback(struct notifier_block *self,
> + unsigned long action, void *arg)
> +{
> + switch (action) {
> + case MEM_ONLINE:
> + /*
> + * If memory was added, try to maximize the number of allowed
> + * threads.
> + */
> + set_max_threads(UINT_MAX);
> + break;
> + case MEM_OFFLINE:
> + /*
> + * If memory was removed, try to keep current value.
> + */
> + set_max_threads(max_threads);
> + break;
> + }
can't understand... set_max_threads() added by 1/4 ignore its argument.
Why does it need "int max_threads_suggested" then?
And it changes the swapper/0's rlimits. This is pointless after we fork
/sbin/init.
It seems to me these patches need some cleanups. Plus I am not sure the
kernel should update max_threads automatically, we have the "threads-max"
sysctl.
Oleg.
On 02/23, Oleg Nesterov wrote:
>
> On 02/23, Heinrich Schuchardt wrote:
> >
> > +static int memory_hotplug_callback(struct notifier_block *self,
> > + unsigned long action, void *arg)
> > +{
> > + switch (action) {
> > + case MEM_ONLINE:
> > + /*
> > + * If memory was added, try to maximize the number of allowed
> > + * threads.
> > + */
> > + set_max_threads(UINT_MAX);
> > + break;
> > + case MEM_OFFLINE:
> > + /*
> > + * If memory was removed, try to keep current value.
> > + */
> > + set_max_threads(max_threads);
> > + break;
> > + }
>
> can't understand... set_max_threads() added by 1/4 ignore its argument.
> Why does it need "int max_threads_suggested" then?
OOPS sorry, missed 2/4 ;)
> And it changes the swapper/0's rlimits. This is pointless after we fork
> /sbin/init.
>
> It seems to me these patches need some cleanups. Plus I am not sure the
> kernel should update max_threads automatically, we have the "threads-max"
> sysctl.
still true, or I am tottaly confused.
Oleg.
On Mon, Feb 23, 2015 at 09:14:35PM +0100, Heinrich Schuchardt wrote:
> With 32-bit calculus there is no solution which delivers valid results
> for all possible combinations of the parameters.
So I can't help but be bugged by this (my problem I know); calculus is
the mathematics of change; its differential equations and integrals.
What we do here is modular arithmetic on a commutative ring.
Please use calculation as you do here:
> Hence a 64-bit calculation can be used as solution.
On 23.02.2015 21:54, Oleg Nesterov wrote:
> On 02/23, Oleg Nesterov wrote:
>>
>> On 02/23, Heinrich Schuchardt wrote:
>>>
>>> +static int memory_hotplug_callback(struct notifier_block *self,
>>> + unsigned long action, void *arg)
>>> +{
>>> + switch (action) {
>>> + case MEM_ONLINE:
>>> + /*
>>> + * If memory was added, try to maximize the number of allowed
>>> + * threads.
>>> + */
>>> + set_max_threads(UINT_MAX);
>>> + break;
>>> + case MEM_OFFLINE:
>>> + /*
>>> + * If memory was removed, try to keep current value.
>>> + */
>>> + set_max_threads(max_threads);
>>> + break;
>>> + }
>>
>> can't understand... set_max_threads() added by 1/4 ignore its argument.
>> Why does it need "int max_threads_suggested" then?
>
> OOPS sorry, missed 2/4 ;)
>
>> And it changes the swapper/0's rlimits. This is pointless after we fork
>> /sbin/init.
So should writing to /proc/sys/max_threads update the limits of all
processes?
>>
>> It seems to me these patches need some cleanups. Plus I am not sure the
>> kernel should update max_threads automatically, we have the "threads-max"
>> sysctl.
The idea in the original version of fork_init is that max_threads should
be chosen such that the memory needed to store the meta-information of
max_threads threads should only be 1/8th of the total memory.
Somebody adding or removing memory will not necessarily update
/proc/sys/kernel/threads-max.
This means that if I remove 90 % of the memory I get to a situation
where max_threads allows so many threads to be created that the
meta-information occupies all memory.
With patch 4/4 max_threads is automatically reduced in this case.
Best regards
Heinrich
On 23.02.2015 22:10, Peter Zijlstra wrote:
> On Mon, Feb 23, 2015 at 09:14:35PM +0100, Heinrich Schuchardt wrote:
>> With 32-bit calculus there is no solution which delivers valid results
>> for all possible combinations of the parameters.
>
> So I can't help but be bugged by this (my problem I know); calculus is
> the mathematics of change; its differential equations and integrals.
Yes, I used the wrong word. Thanks.
>
> What we do here is modular arithmetic on a commutative ring.
>
> Please use calculation as you do here:
>
>> Hence a 64-bit calculation can be used as solution.
>
>
Is it only the text of the commit message that needs change?
Best regards
Heinrich
On 02/23, Heinrich Schuchardt wrote:
>
> On 23.02.2015 21:54, Oleg Nesterov wrote:
> >
> >> And it changes the swapper/0's rlimits. This is pointless after we fork
> >> /sbin/init.
>
> So should writing to /proc/sys/max_threads update the limits of all
> processes?
Why?
No, I think it should not touch rlimits at all.
> >> It seems to me these patches need some cleanups. Plus I am not sure the
> >> kernel should update max_threads automatically, we have the "threads-max"
> >> sysctl.
>
> The idea in the original version of fork_init is that max_threads should
> be chosen such that the memory needed to store the meta-information of
> max_threads threads should only be 1/8th of the total memory.
>
> Somebody adding or removing memory will not necessarily update
> /proc/sys/kernel/threads-max.
>
> This means that if I remove 90 % of the memory I get to a situation
> where max_threads allows so many threads to be created that the
> meta-information occupies all memory.
>
> With patch 4/4 max_threads is automatically reduced in this case.
I understand. But I think that if you remove 90 % of the memory you can
also update /proc/sys/kernel/threads-max.
And, suppose that admin specially limited max_threads, then you add more
memory. Should the kernel bump the limit silently?
And if hotplug should update max_threads, why it doesn't update, say,
files_stat.max_files?
IOW, I do not think that kernel should control max_threads after boot.
But I won't really argue. Just this looks a bit strange to me.
Oleg.
* Heinrich Schuchardt <[email protected]> wrote:
> PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
> THREAD_SIZE.
>
> E.g. architecture hexagon may have page size 1M and thread size 4096.
> This would lead to a division by zero in the calculation of max_threads.
>
> With 32-bit calculus there is no solution which delivers valid results
> for all possible combinations of the parameters.
> The code is only called once.
> Hence a 64-bit calculation can be used as solution.
>
> The calculation of max_threads is moved to a separate function.
Please keep the 'move to a separate function' patch in a
separate patch and don't mix it up with the fix, as I
suggested it in my previous review.
Thanks,
Ingo
In fork_init a division by zero may occur.
In the first patch the calculation of max_threads is moved from fork_init
to a new separate function.
The incorrect calculation of max threads is addressed in the
second patch.
Furthermore max_threads is checked against FUTEX_TID_MASK.
The third patch addresses max_threads being set by writing to
/proc/sys/kernel/threads-max. The same limits are applied as
in fork_init.
New in version 5:
Do not update limits of the init process
Do not update max_threads on memory hotplug events
Corrections to commit messages
New in version 4:
Separation of refactoring and correction into separate patches
(as requested by Ingo Molnar)
Remove redundant argument of fork_init
New in version 3:
Determination of max_threads moved to separate function.
Handling of /proc/sys/kernel/threads-max
Handling of memory hotplugging.
New in version 2:
Use div64_u64 for 64-bit division.
Thank you for your helpful feedback:
Andrew Morton <[email protected]>
Guenter Roeck <[email protected]>
Ingo Molnar <[email protected]>
Oleg Nesterov <[email protected]>
Peter Zijlstra <[email protected]>
Vladimir Davydov <[email protected]>
Heinrich Schuchardt (3):
kernel/fork.c: new function for max_threads
kernel/fork.c: avoid division by zero
kernel/sysctl.c: threads-max observe limits
include/linux/sysctl.h | 3 ++
init/main.c | 4 +--
kernel/fork.c | 75 +++++++++++++++++++++++++++++++++++++++++---------
kernel/sysctl.c | 6 ++--
4 files changed, 69 insertions(+), 19 deletions(-)
--
2.1.4
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.
E.g. architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.
With this patch the buggy code is moved to a separate function
set_max_threads. The error is not fixed.
After fixing the problem in a separate patch the new function can be
reused to adjust max_threads after adding or removing memory.
Argument mempages of function fork_init() is removed as totalram_pages
is an exported symbol.
The creation of separate patches for refactoring to a new function
and for fixing the logic was suggested by Ingo Molnar.
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
init/main.c | 4 ++--
kernel/fork.c | 35 ++++++++++++++++++++++-------------
2 files changed, 24 insertions(+), 15 deletions(-)
diff --git a/init/main.c b/init/main.c
index 61b99376..21394ec 100644
--- a/init/main.c
+++ b/init/main.c
@@ -94,7 +94,7 @@
static int kernel_init(void *);
extern void init_IRQ(void);
-extern void fork_init(unsigned long);
+extern void fork_init(void);
extern void radix_tree_init(void);
#ifndef CONFIG_DEBUG_RODATA
static inline void mark_rodata_ro(void) { }
@@ -655,7 +655,7 @@ asmlinkage __visible void __init start_kernel(void)
#endif
thread_info_cache_init();
cred_init();
- fork_init(totalram_pages);
+ fork_init();
proc_caches_init();
buffer_init();
key_init();
diff --git a/kernel/fork.c b/kernel/fork.c
index 4dc2dda..460b044 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -253,7 +253,27 @@ EXPORT_SYMBOL_GPL(__put_task_struct);
void __init __weak arch_task_cache_init(void) { }
-void __init fork_init(unsigned long mempages)
+/*
+ * set_max_threads
+ * The argument is ignored.
+ */
+static void set_max_threads(unsigned int max_threads_suggested)
+{
+ /*
+ * The default maximum number of threads is set to a safe
+ * value: the thread structures can take up at most half
+ * of memory.
+ */
+ max_threads = totalram_pages / (8 * THREAD_SIZE / PAGE_SIZE);
+
+ /*
+ * we need to allow at least 20 threads to boot a system
+ */
+ if (max_threads < 20)
+ max_threads = 20;
+}
+
+void __init fork_init(void)
{
#ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
#ifndef ARCH_MIN_TASKALIGN
@@ -268,18 +288,7 @@ void __init fork_init(unsigned long mempages)
/* do the arch specific task caches init */
arch_task_cache_init();
- /*
- * The default maximum number of threads is set to a safe
- * value: the thread structures can take up at most half
- * of memory.
- */
- max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
-
- /*
- * we need to allow at least 20 threads to boot a system
- */
- if (max_threads < 20)
- max_threads = 20;
+ set_max_threads(UINT_MAX);
init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
--
2.1.4
PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.
E.g. architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.
With 32-bit calculation there is no solution which delivers valid results
for all possible combinations of the parameters.
The code is only called once.
Hence a 64-bit calculation can be used as solution.
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
kernel/fork.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 460b044..880c78d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -88,6 +88,16 @@
#include <trace/events/task.h>
/*
+ * Minimum number of threads to boot the kernel
+ */
+#define MIN_THREADS 20
+
+/*
+ * Maximum number of threads
+ */
+#define MAX_THREADS FUTEX_TID_MASK
+
+/*
* Protected counters by write_lock_irq(&tasklist_lock)
*/
unsigned long total_forks; /* Handle normal Linux uptimes. */
@@ -254,23 +264,29 @@ EXPORT_SYMBOL_GPL(__put_task_struct);
void __init __weak arch_task_cache_init(void) { }
/*
- * set_max_threads
- * The argument is ignored.
+ * set_max_threads tries to set the default limit to the suggested value.
*/
static void set_max_threads(unsigned int max_threads_suggested)
{
- /*
- * The default maximum number of threads is set to a safe
- * value: the thread structures can take up at most half
- * of memory.
- */
- max_threads = totalram_pages / (8 * THREAD_SIZE / PAGE_SIZE);
+ u64 threads;
/*
- * we need to allow at least 20 threads to boot a system
+ * The number of threads shall be limited such that the thread
+ * structures may only consume a small part of the available memory.
*/
- if (max_threads < 20)
- max_threads = 20;
+ threads = div64_u64((u64) totalram_pages * (u64) PAGE_SIZE,
+ (u64) THREAD_SIZE * 8UL);
+
+ if (threads > max_threads_suggested)
+ threads = max_threads_suggested;
+
+ if (threads > MAX_THREADS)
+ threads = MAX_THREADS;
+
+ if (threads < MIN_THREADS)
+ threads = MIN_THREADS;
+
+ max_threads = (int) threads;
}
void __init fork_init(void)
--
2.1.4
Users can change the maximum number of threads by writing to
/proc/sys/kernel/threads-max.
With the patch the value entered is checked against the same
limits that apply when fork_init is called.
Signed-off-by: Heinrich Schuchardt <[email protected]>
---
include/linux/sysctl.h | 3 +++
kernel/fork.c | 24 ++++++++++++++++++++++++
kernel/sysctl.c | 6 ++----
3 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index b7361f8..795d5fe 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -212,4 +212,7 @@ static inline void setup_sysctl_set(struct ctl_table_set *p,
#endif /* CONFIG_SYSCTL */
+int sysctl_max_threads(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+
#endif /* _LINUX_SYSCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 880c78d..52a07c7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -74,6 +74,7 @@
#include <linux/uprobes.h>
#include <linux/aio.h>
#include <linux/compiler.h>
+#include <linux/sysctl.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -2024,3 +2025,26 @@ int unshare_files(struct files_struct **displaced)
task_unlock(task);
return 0;
}
+
+int sysctl_max_threads(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table t;
+ int ret;
+ int threads = max_threads;
+ int min = MIN_THREADS;
+ int max = MAX_THREADS;
+
+ t = *table;
+ t.data = &threads;
+ t.extra1 = &min;
+ t.extra2 = &max;
+
+ ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
+ if (ret || !write)
+ return ret;
+
+ set_max_threads(threads);
+
+ return 0;
+}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 137c7f6..2e195ae 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -92,11 +92,9 @@
#include <linux/nmi.h>
#endif
-
#if defined(CONFIG_SYSCTL)
/* External variables not in a header file. */
-extern int max_threads;
extern int suid_dumpable;
#ifdef CONFIG_COREDUMP
extern int core_uses_pid;
@@ -709,10 +707,10 @@ static struct ctl_table kern_table[] = {
#endif
{
.procname = "threads-max",
- .data = &max_threads,
+ .data = NULL,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = sysctl_max_threads,
},
{
.procname = "random",
--
2.1.4
On Tue, 24 Feb 2015, Heinrich Schuchardt wrote:
> diff --git a/init/main.c b/init/main.c
> index 61b99376..21394ec 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -94,7 +94,7 @@
> static int kernel_init(void *);
>
> extern void init_IRQ(void);
> -extern void fork_init(unsigned long);
> +extern void fork_init(void);
> extern void radix_tree_init(void);
> #ifndef CONFIG_DEBUG_RODATA
> static inline void mark_rodata_ro(void) { }
> @@ -655,7 +655,7 @@ asmlinkage __visible void __init start_kernel(void)
> #endif
> thread_info_cache_init();
> cred_init();
> - fork_init(totalram_pages);
> + fork_init();
> proc_caches_init();
> buffer_init();
> key_init();
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 4dc2dda..460b044 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -253,7 +253,27 @@ EXPORT_SYMBOL_GPL(__put_task_struct);
>
> void __init __weak arch_task_cache_init(void) { }
>
> -void __init fork_init(unsigned long mempages)
> +/*
> + * set_max_threads
> + * The argument is ignored.
> + */
> +static void set_max_threads(unsigned int max_threads_suggested)
> +{
> + /*
> + * The default maximum number of threads is set to a safe
> + * value: the thread structures can take up at most half
> + * of memory.
> + */
> + max_threads = totalram_pages / (8 * THREAD_SIZE / PAGE_SIZE);
> +
> + /*
> + * we need to allow at least 20 threads to boot a system
> + */
> + if (max_threads < 20)
> + max_threads = 20;
> +}
> +
> +void __init fork_init(void)
> {
> #ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
> #ifndef ARCH_MIN_TASKALIGN
> @@ -268,18 +288,7 @@ void __init fork_init(unsigned long mempages)
> /* do the arch specific task caches init */
> arch_task_cache_init();
>
> - /*
> - * The default maximum number of threads is set to a safe
> - * value: the thread structures can take up at most half
> - * of memory.
> - */
> - max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
> -
> - /*
> - * we need to allow at least 20 threads to boot a system
> - */
> - if (max_threads < 20)
> - max_threads = 20;
> + set_max_threads(UINT_MAX);
>
> init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
> init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
I'm afraid I don't understand this. The intent of the patch is to
separate the max_threads logic into a new function, correct? If that's
true, then I don't understand why UINT_MAX is being introduced into this
path and passed to the new function when it is ignored.
I think it would be better to simply keep passing mempages to fork_init()
and then pass it to set_max_threads() where max_threads actually gets set
using the argument passed. At least, the code would then match the intent
of the patch.
On Tue, 24 Feb 2015, Heinrich Schuchardt wrote:
> PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
> THREAD_SIZE.
>
> E.g. architecture hexagon may have page size 1M and thread size 4096.
> This would lead to a division by zero in the calculation of max_threads.
>
This should only appear in one patch in the series, and I think this is
the appropriate patch for that to happen.
> With 32-bit calculation there is no solution which delivers valid results
> for all possible combinations of the parameters.
> The code is only called once.
> Hence a 64-bit calculation can be used as solution.
>
> Signed-off-by: Heinrich Schuchardt <[email protected]>
> ---
> kernel/fork.c | 38 +++++++++++++++++++++++++++-----------
> 1 file changed, 27 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 460b044..880c78d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -88,6 +88,16 @@
> #include <trace/events/task.h>
>
> /*
> + * Minimum number of threads to boot the kernel
> + */
> +#define MIN_THREADS 20
> +
> +/*
> + * Maximum number of threads
> + */
> +#define MAX_THREADS FUTEX_TID_MASK
> +
> +/*
> * Protected counters by write_lock_irq(&tasklist_lock)
> */
> unsigned long total_forks; /* Handle normal Linux uptimes. */
> @@ -254,23 +264,29 @@ EXPORT_SYMBOL_GPL(__put_task_struct);
> void __init __weak arch_task_cache_init(void) { }
>
> /*
> - * set_max_threads
> - * The argument is ignored.
> + * set_max_threads tries to set the default limit to the suggested value.
I'm not sure that is true. At this point in the patch series,
set_max_threads() is only called with UINT_MAX and that's not what the
implementation tries to set max_threads to.
> */
> static void set_max_threads(unsigned int max_threads_suggested)
> {
> - /*
> - * The default maximum number of threads is set to a safe
> - * value: the thread structures can take up at most half
> - * of memory.
> - */
> - max_threads = totalram_pages / (8 * THREAD_SIZE / PAGE_SIZE);
> + u64 threads;
>
> /*
> - * we need to allow at least 20 threads to boot a system
> + * The number of threads shall be limited such that the thread
> + * structures may only consume a small part of the available memory.
> */
> - if (max_threads < 20)
> - max_threads = 20;
> + threads = div64_u64((u64) totalram_pages * (u64) PAGE_SIZE,
> + (u64) THREAD_SIZE * 8UL);
The artithmetic works here, but I'm wondering what guarantee is in place
to ensure that totalram_pages * PAGE_SIZE does not overflow on 64-bit
arches?
> +
> + if (threads > max_threads_suggested)
> + threads = max_threads_suggested;
Ok, so now the function argument just shows an implementation issue.
You're capping threads at UINT_MAX because you need max_threads to be an
int and relying on the caller to enforce that. set_max_threads() should
be handling all of these details itself.
> +
> + if (threads > MAX_THREADS)
> + threads = MAX_THREADS;
> +
> + if (threads < MIN_THREADS)
> + threads = MIN_THREADS;
> +
> + max_threads = (int) threads;
> }
>
> void __init fork_init(void)
On Tue, 24 Feb 2015, Heinrich Schuchardt wrote:
> Users can change the maximum number of threads by writing to
> /proc/sys/kernel/threads-max.
>
> With the patch the value entered is checked against the same
> limits that apply when fork_init is called.
>
Correct me if I'm wrong, but this is a change in functionality (without
update to Documentation/sysctl/kernel.txt which describes threads-max)
since it does not allow the value to be lowered as before from the
calculation involving totalram_pages. The value passed to
set_max_threads() only caps the value.
> Signed-off-by: Heinrich Schuchardt <[email protected]>
> ---
> include/linux/sysctl.h | 3 +++
> kernel/fork.c | 24 ++++++++++++++++++++++++
> kernel/sysctl.c | 6 ++----
> 3 files changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index b7361f8..795d5fe 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -212,4 +212,7 @@ static inline void setup_sysctl_set(struct ctl_table_set *p,
>
> #endif /* CONFIG_SYSCTL */
>
> +int sysctl_max_threads(struct ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos);
> +
> #endif /* _LINUX_SYSCTL_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 880c78d..52a07c7 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -74,6 +74,7 @@
> #include <linux/uprobes.h>
> #include <linux/aio.h>
> #include <linux/compiler.h>
> +#include <linux/sysctl.h>
>
> #include <asm/pgtable.h>
> #include <asm/pgalloc.h>
> @@ -2024,3 +2025,26 @@ int unshare_files(struct files_struct **displaced)
> task_unlock(task);
> return 0;
> }
> +
> +int sysctl_max_threads(struct ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> + struct ctl_table t;
> + int ret;
> + int threads = max_threads;
> + int min = MIN_THREADS;
> + int max = MAX_THREADS;
> +
> + t = *table;
> + t.data = &threads;
> + t.extra1 = &min;
> + t.extra2 = &max;
> +
> + ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
> + if (ret || !write)
> + return ret;
> +
> + set_max_threads(threads);
> +
> + return 0;
> +}
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 137c7f6..2e195ae 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -92,11 +92,9 @@
> #include <linux/nmi.h>
> #endif
>
> -
> #if defined(CONFIG_SYSCTL)
>
> /* External variables not in a header file. */
> -extern int max_threads;
> extern int suid_dumpable;
> #ifdef CONFIG_COREDUMP
> extern int core_uses_pid;
> @@ -709,10 +707,10 @@ static struct ctl_table kern_table[] = {
> #endif
> {
> .procname = "threads-max",
> - .data = &max_threads,
> + .data = NULL,
> .maxlen = sizeof(int),
> .mode = 0644,
> - .proc_handler = proc_dointvec,
> + .proc_handler = sysctl_max_threads,
> },
> {
> .procname = "random",
On 24.02.2015 22:03, David Rientjes wrote:
> On Tue, 24 Feb 2015, Heinrich Schuchardt wrote:
>
>> diff --git a/init/main.c b/init/main.c
>> index 61b99376..21394ec 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -94,7 +94,7 @@
>> static int kernel_init(void *);
>>
>> extern void init_IRQ(void);
>> -extern void fork_init(unsigned long);
>> +extern void fork_init(void);
>> extern void radix_tree_init(void);
>> #ifndef CONFIG_DEBUG_RODATA
>> static inline void mark_rodata_ro(void) { }
>> @@ -655,7 +655,7 @@ asmlinkage __visible void __init start_kernel(void)
>> #endif
>> thread_info_cache_init();
>> cred_init();
>> - fork_init(totalram_pages);
>> + fork_init();
>> proc_caches_init();
>> buffer_init();
>> key_init();
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 4dc2dda..460b044 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -253,7 +253,27 @@ EXPORT_SYMBOL_GPL(__put_task_struct);
>>
>> void __init __weak arch_task_cache_init(void) { }
>>
>> -void __init fork_init(unsigned long mempages)
>> +/*
>> + * set_max_threads
>> + * The argument is ignored.
>> + */
>> +static void set_max_threads(unsigned int max_threads_suggested)
>> +{
>> + /*
>> + * The default maximum number of threads is set to a safe
>> + * value: the thread structures can take up at most half
>> + * of memory.
>> + */
>> + max_threads = totalram_pages / (8 * THREAD_SIZE / PAGE_SIZE);
>> +
>> + /*
>> + * we need to allow at least 20 threads to boot a system
>> + */
>> + if (max_threads < 20)
>> + max_threads = 20;
>> +}
>> +
>> +void __init fork_init(void)
>> {
>> #ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
>> #ifndef ARCH_MIN_TASKALIGN
>> @@ -268,18 +288,7 @@ void __init fork_init(unsigned long mempages)
>> /* do the arch specific task caches init */
>> arch_task_cache_init();
>>
>> - /*
>> - * The default maximum number of threads is set to a safe
>> - * value: the thread structures can take up at most half
>> - * of memory.
>> - */
>> - max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
>> -
>> - /*
>> - * we need to allow at least 20 threads to boot a system
>> - */
>> - if (max_threads < 20)
>> - max_threads = 20;
>> + set_max_threads(UINT_MAX);
>>
>> init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
>> init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
>
> I'm afraid I don't understand this. The intent of the patch is to
> separate the max_threads logic into a new function, correct? If that's
> true, then I don't understand why UINT_MAX is being introduced into this
> path and passed to the new function when it is ignored.
>
> I think it would be better to simply keep passing mempages to fork_init()
> and then pass it to set_max_threads() where max_threads actually gets set
> using the argument passed. At least, the code would then match the intent
> of the patch.
>
Please, read patch 2/3 which provides support for the argument,
and patch 3/3 that finally needs it.
Best regards
Heinrich
On 24.02.2015 22:17, David Rientjes wrote:
> On Tue, 24 Feb 2015, Heinrich Schuchardt wrote:
>
>> Users can change the maximum number of threads by writing to
>> /proc/sys/kernel/threads-max.
>>
>> With the patch the value entered is checked against the same
>> limits that apply when fork_init is called.
>>
>
> Correct me if I'm wrong, but this is a change in functionality (without
> update to Documentation/sysctl/kernel.txt which describes threads-max)
> since it does not allow the value to be lowered as before from the
> calculation involving totalram_pages. The value passed to
> set_max_threads() only caps the value.
threads_max is set to the value the user inputs if it is inside the
interval [20, FUTEX_TID_MASK] and it is capped by the value calculated
from totalram_pages.
So lowering and raising is still possible (inside the limits).
>
>> Signed-off-by: Heinrich Schuchardt <[email protected]>
>> ---
>> include/linux/sysctl.h | 3 +++
>> kernel/fork.c | 24 ++++++++++++++++++++++++
>> kernel/sysctl.c | 6 ++----
>> 3 files changed, 29 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
>> index b7361f8..795d5fe 100644
>> --- a/include/linux/sysctl.h
>> +++ b/include/linux/sysctl.h
>> @@ -212,4 +212,7 @@ static inline void setup_sysctl_set(struct ctl_table_set *p,
>>
>> #endif /* CONFIG_SYSCTL */
>>
>> +int sysctl_max_threads(struct ctl_table *table, int write,
>> + void __user *buffer, size_t *lenp, loff_t *ppos);
>> +
>> #endif /* _LINUX_SYSCTL_H */
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 880c78d..52a07c7 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -74,6 +74,7 @@
>> #include <linux/uprobes.h>
>> #include <linux/aio.h>
>> #include <linux/compiler.h>
>> +#include <linux/sysctl.h>
>>
>> #include <asm/pgtable.h>
>> #include <asm/pgalloc.h>
>> @@ -2024,3 +2025,26 @@ int unshare_files(struct files_struct **displaced)
>> task_unlock(task);
>> return 0;
>> }
>> +
>> +int sysctl_max_threads(struct ctl_table *table, int write,
>> + void __user *buffer, size_t *lenp, loff_t *ppos)
>> +{
>> + struct ctl_table t;
>> + int ret;
>> + int threads = max_threads;
>> + int min = MIN_THREADS;
>> + int max = MAX_THREADS;
>> +
>> + t = *table;
>> + t.data = &threads;
>> + t.extra1 = &min;
>> + t.extra2 = &max;
>> +
>> + ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
>> + if (ret || !write)
>> + return ret;
>> +
>> + set_max_threads(threads);
>> +
>> + return 0;
>> +}
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index 137c7f6..2e195ae 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -92,11 +92,9 @@
>> #include <linux/nmi.h>
>> #endif
>>
>> -
>> #if defined(CONFIG_SYSCTL)
>>
>> /* External variables not in a header file. */
>> -extern int max_threads;
>> extern int suid_dumpable;
>> #ifdef CONFIG_COREDUMP
>> extern int core_uses_pid;
>> @@ -709,10 +707,10 @@ static struct ctl_table kern_table[] = {
>> #endif
>> {
>> .procname = "threads-max",
>> - .data = &max_threads,
>> + .data = NULL,
>> .maxlen = sizeof(int),
>> .mode = 0644,
>> - .proc_handler = proc_dointvec,
>> + .proc_handler = sysctl_max_threads,
>> },
>> {
>> .procname = "random",
>
Best regards
Heinrich
On Tue, 24 Feb 2015, Heinrich Schuchardt wrote:
> > I'm afraid I don't understand this. The intent of the patch is to
> > separate the max_threads logic into a new function, correct? If that's
> > true, then I don't understand why UINT_MAX is being introduced into this
> > path and passed to the new function when it is ignored.
> >
> > I think it would be better to simply keep passing mempages to fork_init()
> > and then pass it to set_max_threads() where max_threads actually gets set
> > using the argument passed. At least, the code would then match the intent
> > of the patch.
> >
> Please, read patch 2/3 which provides support for the argument,
> and patch 3/3 that finally needs it.
>
The problem is with the structure of your patchset. You want three
patches. There's one bugfix patch, a preparation patch, and a feature
patch. The bugfix patch should come first so that it can be applied,
possibly, to stable kernels and doesn't depend on unnecessary preparation
patches for features.
1/3: change the implementation of fork_init(), with commentary, to avoid
the divide by zero on certain arches, enforce the limits, and deal with
variable types to prevent overflow. This is the most urgent patch and
fixes a bug.
2/3: simply extract the fixed fork_init() implementation into a new
set_max_threads() in preparation to use it for threads-max, (hint:
UINT_MAX and ignored arguments should not appear in this patch),
3/3: use the new set_max_threads() implementation for threads-max with an
update to the documentation.
On Tue, 24 Feb 2015, Heinrich Schuchardt wrote:
> >> Users can change the maximum number of threads by writing to
> >> /proc/sys/kernel/threads-max.
> >>
> >> With the patch the value entered is checked against the same
> >> limits that apply when fork_init is called.
> >>
> >
> > Correct me if I'm wrong, but this is a change in functionality (without
> > update to Documentation/sysctl/kernel.txt which describes threads-max)
> > since it does not allow the value to be lowered as before from the
> > calculation involving totalram_pages. The value passed to
> > set_max_threads() only caps the value.
>
> threads_max is set to the value the user inputs if it is inside the
> interval [20, FUTEX_TID_MASK] and it is capped by the value calculated
> from totalram_pages.
>
> So lowering and raising is still possible (inside the limits).
>
I believe this information should be added to the documentation cited
above which mentions threads-max since users will otherwise be unfamiliar
with the limits imposed.
On 24.02.2015 23:16, David Rientjes wrote:
> On Tue, 24 Feb 2015, Heinrich Schuchardt wrote:
>
>>> I'm afraid I don't understand this. The intent of the patch is to
>>> separate the max_threads logic into a new function, correct? If that's
>>> true, then I don't understand why UINT_MAX is being introduced into this
>>> path and passed to the new function when it is ignored.
>>>
>>> I think it would be better to simply keep passing mempages to fork_init()
>>> and then pass it to set_max_threads() where max_threads actually gets set
>>> using the argument passed. At least, the code would then match the intent
>>> of the patch.
>>>
>> Please, read patch 2/3 which provides support for the argument,
>> and patch 3/3 that finally needs it.
>>
>
> The problem is with the structure of your patchset. You want three
> patches. There's one bugfix patch, a preparation patch, and a feature
> patch. The bugfix patch should come first so that it can be applied,
> possibly, to stable kernels and doesn't depend on unnecessary preparation
> patches for features.
>
> 1/3: change the implementation of fork_init(), with commentary, to avoid
> the divide by zero on certain arches, enforce the limits, and deal with
> variable types to prevent overflow. This is the most urgent patch and
> fixes a bug.
>
> 2/3: simply extract the fixed fork_init() implementation into a new
> set_max_threads() in preparation to use it for threads-max, (hint:
> UINT_MAX and ignored arguments should not appear in this patch),
>
> 3/3: use the new set_max_threads() implementation for threads-max with an
> update to the documentation.
>
Hello Ingo,
the current structure of the patch set is based on your suggestion in
https://lkml.org/lkml/2015/2/22/22
Would you agree with the sequence of patches proposed by David?
Best regards
Heinrich
* David Rientjes <[email protected]> wrote:
> The problem is with the structure of your patchset. You
> want three patches. There's one bugfix patch, a
> preparation patch, and a feature patch. The bugfix patch
> should come first so that it can be applied, possibly, to
> stable kernels and doesn't depend on unnecessary
> preparation patches for features.
>
> 1/3: change the implementation of fork_init(), with
> commentary, to avoid the divide by zero on certain
> arches, enforce the limits, and deal with variable types
> to prevent overflow. This is the most urgent patch and
> fixes a bug.
>
> 2/3: simply extract the fixed fork_init() implementation
> into a new set_max_threads() in preparation to use it for
> threads-max, (hint: UINT_MAX and ignored arguments should
> not appear in this patch),
>
> 3/3: use the new set_max_threads() implementation for
> threads-max with an update to the documentation.
I disagree strongly: it's better to first do low-risk
cleanups, then do any fix and other changes.
This approach has four advantages:
- it makes the bug fix more apparent, in the context of
an already cleaned up code.
- it strengthens the basic principle that 'substantial
work should be done on clean code'.
- on the off chance that the bugfix introduces problems
_upstream_, it's much easier to revert in a late -rc
kernel, than to first revert the cleanups. This
happens in practice occasionally, so it's not a
theoretical concern.
- the _backport_ to the -stable kernel will be more
robust as well, because under your proposed structure,
what gets tested upstream is the fix+cleanup, while the
backport will only be the fix, which does not get
tested by upstream in isolation. If there's any
(unintended) side effect of the cleanup that happens to
be an improvement, then we'll break -stable!
It is true that this makes backports a tiny bit more
involved (2 cherry-picks instead of just one), but -stable
kernels can backport preparatory patches just fine, and
it's actually an advantage to have -stable kernel code
match the upstream version as much as possible.
Thanks,
Ingo
On 24.02.2015 23:20, David Rientjes wrote:
> On Tue, 24 Feb 2015, Heinrich Schuchardt wrote:
>
>>>> Users can change the maximum number of threads by writing to
>>>> /proc/sys/kernel/threads-max.
>>>>
>>>> With the patch the value entered is checked against the same
>>>> limits that apply when fork_init is called.
>>>>
>>>
>>> Correct me if I'm wrong, but this is a change in functionality (without
>>> update to Documentation/sysctl/kernel.txt which describes threads-max)
>>> since it does not allow the value to be lowered as before from the
>>> calculation involving totalram_pages. The value passed to
>>> set_max_threads() only caps the value.
>>
>> threads_max is set to the value the user inputs if it is inside the
>> interval [20, FUTEX_TID_MASK] and it is capped by the value calculated
>> from totalram_pages.
>>
>> So lowering and raising is still possible (inside the limits).
>>
>
> I believe this information should be added to the documentation cited
> above which mentions threads-max since users will otherwise be unfamiliar
> with the limits imposed.
Hello David,
I guess the documentation fix should be put into a separate patch (of
the same patch series) as Jonathan Corbet uses a separate git tree to
consolidate documentation changes. Is that ok with you?
Best regards
Heinrich
On 25.02.2015 11:17, Ingo Molnar wrote:
>
> * David Rientjes <[email protected]> wrote:
>
>> The problem is with the structure of your patchset. You
>> want three patches. There's one bugfix patch, a
>> preparation patch, and a feature patch. The bugfix patch
>> should come first so that it can be applied, possibly, to
>> stable kernels and doesn't depend on unnecessary
>> preparation patches for features.
>>
>> 1/3: change the implementation of fork_init(), with
>> commentary, to avoid the divide by zero on certain
>> arches, enforce the limits, and deal with variable types
>> to prevent overflow. This is the most urgent patch and
>> fixes a bug.
>>
>> 2/3: simply extract the fixed fork_init() implementation
>> into a new set_max_threads() in preparation to use it for
>> threads-max, (hint: UINT_MAX and ignored arguments should
>> not appear in this patch),
>>
>> 3/3: use the new set_max_threads() implementation for
>> threads-max with an update to the documentation.
>
> I disagree strongly: it's better to first do low-risk
> cleanups, then do any fix and other changes.
>
> This approach has four advantages:
>
> - it makes the bug fix more apparent, in the context of
> an already cleaned up code.
>
> - it strengthens the basic principle that 'substantial
> work should be done on clean code'.
>
> - on the off chance that the bugfix introduces problems
> _upstream_, it's much easier to revert in a late -rc
> kernel, than to first revert the cleanups. This
> happens in practice occasionally, so it's not a
> theoretical concern.
>
> - the _backport_ to the -stable kernel will be more
> robust as well, because under your proposed structure,
> what gets tested upstream is the fix+cleanup, while the
> backport will only be the fix, which does not get
> tested by upstream in isolation. If there's any
> (unintended) side effect of the cleanup that happens to
> be an improvement, then we'll break -stable!
>
> It is true that this makes backports a tiny bit more
> involved (2 cherry-picks instead of just one), but -stable
> kernels can backport preparatory patches just fine, and
> it's actually an advantage to have -stable kernel code
> match the upstream version as much as possible.
>
> Thanks,
>
> Ingo
> --
Hello David,
to my understanding the biggest no-go for you was that patch 1/3
introduces a function argument that is not needed until patch 3/3.
Could you accept the sequence proposed by Ingo if I leave away the
argument max_threads_suggested in patch 1 and 2 and only introduce it in
patch 3?
So patch 1/4 will refactor the code that may result in division by zero
to new function static void set_max_threads(void). Furthermore it will
change the interface of fork_init to void __init fork_init(void).
Patch 2/4 keeps the interface static void set_max_threads(void) but
corrects the coding using div64_u64.
Patch 3/4 changes the interface to
static void set_max_threads(unsigned int max_threads_suggested),
calls this function in fork_init with value UINT_MAX and calls it
when threads-max is set with the user value.
Patch 4/4 adds the necessary information about theads-max in
Documentation/sysctl/kernel.txt.
I would not like to put this in patch 3/4 as Jonathan Corbet uses a
separate git tree.
Best regards
Heinrich
On Wed, 25 Feb 2015, Heinrich Schuchardt wrote:
> > I believe this information should be added to the documentation cited
> > above which mentions threads-max since users will otherwise be unfamiliar
> > with the limits imposed.
>
> Hello David,
>
> I guess the documentation fix should be put into a separate patch (of
> the same patch series) as Jonathan Corbet uses a separate git tree to
> consolidate documentation changes. Is that ok with you?
>
Yeah, the tunable doesn't seem to have any documentation at the moment so
if that could be changed by adding a complete description of the behavior
and limits, including what happens when someone writes something outside
the limits, as well as any corner cases, it would be very helpful.
On Wed, 25 Feb 2015, Heinrich Schuchardt wrote:
> > I disagree strongly: it's better to first do low-risk
> > cleanups, then do any fix and other changes.
> >
> > This approach has four advantages:
> >
> > - it makes the bug fix more apparent, in the context of
> > an already cleaned up code.
> >
> > - it strengthens the basic principle that 'substantial
> > work should be done on clean code'.
> >
> > - on the off chance that the bugfix introduces problems
> > _upstream_, it's much easier to revert in a late -rc
> > kernel, than to first revert the cleanups. This
> > happens in practice occasionally, so it's not a
> > theoretical concern.
> >
> > - the _backport_ to the -stable kernel will be more
> > robust as well, because under your proposed structure,
> > what gets tested upstream is the fix+cleanup, while the
> > backport will only be the fix, which does not get
> > tested by upstream in isolation. If there's any
> > (unintended) side effect of the cleanup that happens to
> > be an improvement, then we'll break -stable!
> >
> > It is true that this makes backports a tiny bit more
> > involved (2 cherry-picks instead of just one), but -stable
> > kernels can backport preparatory patches just fine, and
> > it's actually an advantage to have -stable kernel code
> > match the upstream version as much as possible.
> >
> > Thanks,
> >
> > Ingo
> > --
>
> Hello David,
>
> to my understanding the biggest no-go for you was that patch 1/3
> introduces a function argument that is not needed until patch 3/3.
>
Your series is adding an unused argument in patch 1 and passes in UINT_MAX
for no clear reason. Patch 2 then starts to use the argument because the
new helper function needs to cast to u64 and you want to prevent overflow,
so the UINT_MAX becomes apparent but should rather be part of the
implementation of the function that does the casting.
Futhermore, the most significant part of the review still hasn't been
addressed: the fact that doing (u64)totalram_pages * (u64)PAGE_SIZE can
overflow your own numerator, so the calculation will need to be changed
itself.
> Could you accept the sequence proposed by Ingo if I leave away the
> argument max_threads_suggested in patch 1 and 2 and only introduce it in
> patch 3?
>
I'm not the person who will be merging these, so feel free to follow the
guidance of whoever will be doing that. I think if there was inspection
of the series that it's really two things proposed together: a
divide-by-zero bugfix and changing threads-max to respect limits.
I understand there is depdenencies on the same code between those two
things so proposing them as part of the same patchset is good, but I would
have proposed the current divide-by-zero bugfix first which can be done
solely in fork_init() and then change threads-max afterwards.