2010-04-21 01:40:50

by Mike Travis

Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max

Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max
From: Hedi Berriche <[email protected]>

On a system with a substantial number of processors, the early default pid_max
of 32k will not be enough. On a system with 1664 CPUs, 25163 processes are
started before the login prompt. It's estimated that with 2048 CPUs we will pass
the 32k limit. With 4096, we'll reach that limit very early during the boot cycle,
and processes would stall waiting for an available pid.

This provides a kernel start parameter to increase the early maximum number of
pids available. It does not change any of the defaults.

Signed-off-by: Hedi Berriche <[email protected]>
Signed-off-by: Mike Travis <[email protected]>
Signed-off-by: Robin Holt <[email protected]>

---
Documentation/kernel-parameters.txt | 11 +++++++++++
kernel/pid.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+)

--- linux-2.6.32.orig/Documentation/kernel-parameters.txt
+++ linux-2.6.32/Documentation/kernel-parameters.txt
@@ -1327,6 +1327,17 @@ and is between 256 and 4096 characters.
max_luns= [SCSI] Maximum number of LUNs to probe.
Should be between 1 and 2^32-1.

+ max_pid=nn[KMG] [KNL] Maximum number of PIDs to use. On a system
+ with a large number of processors, the default
+ pid_max may not be sufficient to allow the system
+ to boot. The allowed range is pid_max_min to
+ pid_max_max (configuration dependent).
+ See kernel/pid.c and include/linux/threads.h for
+ specific values. A value that is too small may
+ cause the system to fail to boot, so it is
+ ignored. If the value is too large, the largest
+ allowed value is used instead.
+
max_report_luns=
[SCSI] Maximum number of LUNs received.
Should be between 1 and 16384.
--- linux-2.6.32.orig/kernel/pid.c
+++ linux-2.6.32/kernel/pid.c
@@ -53,6 +53,36 @@ int pid_max_max = PID_MAX_LIMIT;
#define BITS_PER_PAGE (PAGE_SIZE*8)
#define BITS_PER_PAGE_MASK (BITS_PER_PAGE-1)

+static int __init set_pid_max(char *str)
+{
+	u64 maxp;
+
+	if (!str)
+		return -EINVAL;
+
+	maxp = memparse(str, &str);
+
+	if (maxp < pid_max_min) {
+		pr_warning(
+			"pid_max smaller than minimum allowed value (%d)\n",
+			pid_max_min);
+		return -EINVAL;
+	}
+	if (maxp > pid_max_max) {
+		pr_warning(
+			"pid_max larger than maximum allowed value, using %d\n",
+			pid_max_max);
+		pid_max = pid_max_max;
+	} else {
+		pid_max = maxp;
+		pr_info("pid_max set to %d\n", pid_max);
+	}
+
+	return 0;
+}
+
+early_param("pid_max", set_pid_max);
+
static inline int mk_pid(struct pid_namespace *pid_ns,
struct pidmap *map, int off)
{


2010-04-21 01:52:57

by Mike Travis

Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

[Sorry, the previous patch I sent was an incorrect version. The arg specified
in the Documentation file was wrong.]

Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max
From: Hedi Berriche <[email protected]>

On a system with a substantial number of processors, the early default pid_max
of 32k will not be enough. On a system with 1664 CPUs, 25163 processes are
started before the login prompt. It's estimated that with 2048 CPUs we will pass
the 32k limit. With 4096, we'll reach that limit very early during the boot cycle,
and processes would stall waiting for an available pid.

This provides a kernel start parameter to increase the early maximum number of
pids available. It does not change any of the defaults.

Signed-off-by: Hedi Berriche <[email protected]>
Signed-off-by: Mike Travis <[email protected]>
Signed-off-by: Robin Holt <[email protected]>

---
Documentation/kernel-parameters.txt | 11 +++++++++++
kernel/pid.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+)

--- linux-2.6.32.orig/Documentation/kernel-parameters.txt
+++ linux-2.6.32/Documentation/kernel-parameters.txt
@@ -2033,6 +2033,17 @@ and is between 256 and 4096 characters.
pg. [PARIDE]
See Documentation/blockdev/paride.txt.

+ pid_max=nn[KMG] [KNL] Maximum number of PIDs to use. On a system
+ with a large number of processors, the default
+ pid_max may not be sufficient to allow the system
+ to boot. The allowed range is pid_max_min to
+ pid_max_max (configuration dependent).
+ See kernel/pid.c and include/linux/threads.h for
+ specific values. A value that is too small may
+ cause the system to fail to boot, so it is
+ ignored. If the value is too large, the largest
+ allowed value is used instead.
+
pirq= [SMP,APIC] Manual mp-table setup
See Documentation/x86/i386/IO-APIC.txt.

--- linux-2.6.32.orig/kernel/pid.c
+++ linux-2.6.32/kernel/pid.c
@@ -53,6 +53,36 @@ int pid_max_max = PID_MAX_LIMIT;
#define BITS_PER_PAGE (PAGE_SIZE*8)
#define BITS_PER_PAGE_MASK (BITS_PER_PAGE-1)

+static int __init set_pid_max(char *str)
+{
+	u64 maxp;
+
+	if (!str)
+		return -EINVAL;
+
+	maxp = memparse(str, &str);
+
+	if (maxp < pid_max_min) {
+		pr_warning(
+			"pid_max smaller than minimum allowed value (%d)\n",
+			pid_max_min);
+		return -EINVAL;
+	}
+	if (maxp > pid_max_max) {
+		pr_warning(
+			"pid_max larger than maximum allowed value, using %d\n",
+			pid_max_max);
+		pid_max = pid_max_max;
+	} else {
+		pid_max = maxp;
+		pr_info("pid_max set to %d\n", pid_max);
+	}
+
+	return 0;
+}
+
+early_param("pid_max", set_pid_max);
+
static inline int mk_pid(struct pid_namespace *pid_ns,
struct pidmap *map, int off)
{
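
For readers who want to try the parsing and clamping behaviour outside the
kernel, here is a minimal userspace sketch. It is not the patch itself:
parse_size() is a hypothetical stand-in for memparse(), and the DEMO_* limits
are assumed values (RESERVED_PIDS + 1 and the 64-bit PID_MAX_LIMIT of a
2.6.32-era tree), so check kernel/pid.c and include/linux/threads.h for a
given configuration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed limits for illustration only; the real ones are config dependent. */
#define DEMO_PID_MAX_DEFAULT 0x8000            /* 32768 */
#define DEMO_PID_MAX_MIN     301               /* assumed RESERVED_PIDS + 1 */
#define DEMO_PID_MAX_MAX     (4 * 1024 * 1024) /* assumed 64-bit PID_MAX_LIMIT */

static int demo_pid_max = DEMO_PID_MAX_DEFAULT;

/* Hypothetical stand-in for memparse(): a number with an optional K/M/G suffix. */
static uint64_t parse_size(const char *s)
{
	char *end;
	uint64_t v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g': v <<= 30; break;
	case 'M': case 'm': v <<= 20; break;
	case 'K': case 'k': v <<= 10; break;
	default: break;
	}
	return v;
}

/* Same policy as the patch: reject values that are too small, clamp large ones. */
static int demo_set_pid_max(const char *str)
{
	uint64_t maxp;

	if (!str)
		return -EINVAL;

	maxp = parse_size(str);
	if (maxp < DEMO_PID_MAX_MIN)
		return -EINVAL;	/* value ignored, previous pid_max kept */

	if (maxp > DEMO_PID_MAX_MAX)
		demo_pid_max = DEMO_PID_MAX_MAX;
	else
		demo_pid_max = (int)maxp;
	return 0;
}
```

Under these assumptions, demo_set_pid_max("128k") sets the value to 131072,
"100" is rejected with -EINVAL and leaves the previous value alone, and "1G"
is clamped to the assumed maximum.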

2010-04-21 09:19:46

by Alan

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

> of 32k will not be enough. A system with 1664 CPU's, there are 25163 processes
> started before the login prompt. It's estimated that with 2048 CPU's we will pass

Is that perhaps the bug not the 32K limit ? and does Tejun's work on work
queue sanity help avoid the need for this ?

2010-04-21 16:59:39

by Hedi Berriche

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
| > of 32k will not be enough. A system with 1664 CPU's, there are 25163 processes
| > started before the login prompt. It's estimated that with 2048 CPU's we will pass
|
| Is that perhaps the bug not the 32K limit?

Doubt it: I just checked on an *idle* 1664-CPU system and I can see 26844
tasks, all but a few being kernel threads.

Worst case scenario, i.e. a 4096-CPU system (+ typically thousands of disks), will
most certainly be a pain to boot, if it ever manages to, when pid_max is set to 32K.

Cheers,
Hedi.
--
Be careful of reading health books, you might die of a misprint.
-- Mark Twain

2010-04-21 17:19:18

by Rik van Riel

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On 04/21/2010 12:59 PM, Hedi Berriche wrote:
> On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
> |> of 32k will not be enough. A system with 1664 CPU's, there are 25163 processes
> |> started before the login prompt. It's estimated that with 2048 CPU's we will pass
> |
> | Is that perhaps the bug not the 32K limit?
>
> Doubt it: I just checked on an *idle* 1664 CPUs system and I can see 26844
> tasks, all but few being kernel threads.

That is 15 kernel threads per CPU.

Reducing the number of kernel threads sounds like a
useful thing to do.

2010-04-21 17:54:28

by Mike Travis

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2



Rik van Riel wrote:
> On 04/21/2010 12:59 PM, Hedi Berriche wrote:
>> On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
>> |> of 32k will not be enough. A system with 1664 CPU's, there are 25163 processes
>> |> started before the login prompt. It's estimated that with 2048 CPU's we will pass
>> |
>> | Is that perhaps the bug not the 32K limit?
>>
>> Doubt it: I just checked on an *idle* 1664 CPUs system and I can see 26844
>> tasks, all but few being kernel threads.
>
> That is 15 kernel threads per CPU.
>
> Reducing the number of kernel threads sounds like a
> useful thing to do.

I'm doing more research but all the udev modprobes seem to spawn
quite a few tasks. And even though they go away, when the pid
pool is limited, I'm guessing many of them are waiting.

On the last test I did yesterday, the pid # was up in the 77000
range at the login prompt (I started the 1664 cpu system with
pid_max=128k).

2010-04-21 17:54:38

by Alan

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Wed, 21 Apr 2010 17:59:34 +0100
Hedi Berriche <[email protected]> wrote:

> On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
> | > of 32k will not be enough. A system with 1664 CPU's, there are 25163 processes
> | > started before the login prompt. It's estimated that with 2048 CPU's we will pass
> |
> | Is that perhaps the bug not the 32K limit?
>
> Doubt it: I just checked on an *idle* 1664 CPUs system and I can see 26844
> tasks, all but few being kernel threads.

So why have we got 26844 tasks? Isn't that a rather more relevant
question?

And as I asked before - how does Tejun's work on sanitizing work queues
affect this ?

Alan

2010-04-21 19:12:20

by Hedi Berriche

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Wed, Apr 21, 2010 at 18:54 Alan Cox wrote:
| Hedi Berriche <[email protected]> wrote:
|
| > I just checked on an *idle* 1664 CPUs system and I can see 26844 tasks, all
| > but few being kernel threads.
|
| So why have we got 26844 tasks. Isn't that a rather more relevant
| question.

OK, here's a rough breakdown of the tasks

104 kswapd
1664 aio
1664 ata
1664 crypto
1664 events
1664 ib_cm
1664 kintegrityd
1664 kondemand
1664 ksoftirqd
1664 kstop
1664 migration
1664 rpciod
1664 scsi_tgtd
1664 xfsconvertd
1664 xfsdatad
1664 xfslogd

that's 25064, omitting the rest as its contribution to the overall total is
negligible.
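
The arithmetic behind that total can be checked with a small sketch. It
assumes, as the list suggests, that each of the 15 per-CPU thread types has
exactly one instance per CPU and keeps scaling linearly; the extrapolation to
other CPU counts is an estimate, not a measurement, and the DEMO_* names are
made up for the example.

```c
/* Assumed constants for illustration; only the 32k default comes from the thread. */
#define DEMO_PID_MAX_DEFAULT 32768	/* the "32k" early default */
#define DEMO_PER_CPU_KINDS   15		/* per-CPU thread types in the breakdown */

/* Per-CPU kernel threads only; kswapd and user space tasks are left out. */
static long demo_percpu_threads(long ncpus)
{
	return DEMO_PER_CPU_KINDS * ncpus;
}

/* Nonzero if the per-CPU kernel threads alone would use up the default. */
static int demo_exceeds_default(long ncpus)
{
	return demo_percpu_threads(ncpus) >= DEMO_PID_MAX_DEFAULT;
}
```

For 1664 CPUs this gives 15 * 1664 = 24960, plus the 104 kswapd threads, which
is the 25064 quoted above. For 2048 CPUs the per-CPU threads come to 30720,
leaving very little headroom under 32768, and for 4096 CPUs they come to 61440,
well past the default before any user space task is started.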

[[

Let's also not forget all those ephemeral user space tasks (udev and the like)
that will be spawned at boot time on even larger systems with even more
thousands of disks. Arguably one might consider hacking the initrd and similar to
work around the problem and set pid_max as soon as /proc becomes available, but
it's a bit of a PITA.

]]

| And as I asked before - how does Tejun's work on sanitizing work queues
| affect this ?

I'm not familiar with the work in question so I (we) will have to look it up
and see whether it's relevant to what we're seeing here. It does sound
like it might help, to a certain extent at least.

That said, while I am genuinely interested in spending time on this and digging
further to see whether something has been or can be done to keep under control the
number of tasks required to comfortably boot a system of this size, I think that
in the meantime the boot parameter approach is useful in the sense that it addresses
the immediate problem of being able to boot such systems *without* any risk of
breaking the code or altering the default behaviour.

Cheers,
Hedi.
--
Be careful of reading health books, you might die of a misprint.
-- Mark Twain

2010-04-21 19:30:20

by John Stoffel

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

>>>>> "Rik" == Rik van Riel <[email protected]> writes:

Rik> On 04/21/2010 12:59 PM, Hedi Berriche wrote:
>> On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
>> |> of 32k will not be enough. A system with 1664 CPU's, there are 25163 processes
>> |> started before the login prompt. It's estimated that with 2048 CPU's we will pass
>> |
>> | Is that perhaps the bug not the 32K limit?
>>
>> Doubt it: I just checked on an *idle* 1664 CPUs system and I can see 26844
>> tasks, all but few being kernel threads.

Rik> That is 15 kernel threads per CPU.

Rik> Reducing the number of kernel threads sounds like a
Rik> useful thing to do.

Isn't that already a project? I thought someone (Jeff? Jorn? Tejun? Bueller
bueller....?) was already proposing a patch set to reduce the number
of kernel threads by having dynamic workqueues instead, so that we
didn't spawn a bunch of threads that never did anything?

John

2010-04-21 19:33:54

by Hedi Berriche

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Wed, Apr 21, 2010 at 20:15 John Stoffel wrote:
| >>>>> "Rik" == Rik van Riel <[email protected]> writes:
|
| Rik> That is 15 kernel threads per CPU.
|
| Rik> Reducing the number of kernel threads sounds like a
| Rik> useful thing to do.
|
| Isn't that already a project?

Yes, thanks to Alan's probing I looked it up

http://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git

but we're definitely talking long term solution vs. something that can ease
pain now.

Cheers,
Hedi.
--
Be careful of reading health books, you might die of a misprint.
-- Mark Twain

2010-04-21 19:51:26

by Greg KH

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Wed, Apr 21, 2010 at 08:12:13PM +0100, Hedi Berriche wrote:
> Let's also not forget all those ephemeral user space tasks (udev and the likes)
> that will be spawned at boot time on even large systems with even more
> thousands of disks, arguably one might consider hack initrd and similar to work
> around the problem and set pid_max as soon as /proc becomes available but it's
> a bit of a PITA.

udev should properly handle large numbers of cpus and the tasks that it
spawns so as to not overload things. If not, and you feel it is
creating too many tasks, please let the udev developers know and they
will be glad to work with you on this issue.

thanks,

greg k-h

2010-04-21 20:10:46

by John Stoffel

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

>>>>> "Hedi" == Hedi Berriche <[email protected]> writes:

Hedi> On Wed, Apr 21, 2010 at 20:15 John Stoffel wrote:
Hedi> | >>>>> "Rik" == Rik van Riel <[email protected]> writes:
Hedi> |
Hedi> | Rik> That is 15 kernel threads per CPU.
Hedi> |
Hedi> | Rik> Reducing the number of kernel threads sounds like a
Hedi> | Rik> useful thing to do.
Hedi> |
Hedi> | Isn't that already a project?

Hedi> Yes, thanks to Alan's probing I looked it up

Hedi> http://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git

Hedi> but we're definitely talking long term solution vs. something
Hedi> that can ease pain now.

It seems to me that running Linux on such a large machine is such a
specialized niche that putting your change into the regular kernel
isn't a near term need either. And from the sounds of it, Tejun's
work has better long term potential.

But hey, I'm generally clueless, so take what I say with a grain of
salt. :]

John

2010-04-21 20:12:24

by Hedi Berriche

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Wed, Apr 21, 2010 at 20:52 Greg KH wrote:
| On Wed, Apr 21, 2010 at 08:12:13PM +0100, Hedi Berriche wrote:
| > Let's also not forget all those ephemeral user space tasks (udev and the likes)
| > that will be spawned at boot time on even large systems with even more
| > thousands of disks, arguably one might consider hack initrd and similar to work
| > around the problem and set pid_max as soon as /proc becomes available but it's
| > a bit of a PITA.
|
| udev should properly handle large numbers of cpus and the tasks that it
| spawns so as to not overload things. If not, and you feel it is
| creating too many tasks, please let the udev developers know and they
| will be glad to work with you on this issue.

Just to be clear here --and be done with the udev parenthesis-- we kind of need
udev to take advantage of the fact that there's a large number of CPUs on the
machine, especially in the case of a config with thousands of disks, as that
shortens the time required to have a box in a working state with all disks
available and all.

IOW, I am not after throttling or serialising udev, just mentioned it as an
example of a user space beast that can contribute --in the current state of things--
to the need for a large pid_max on certain configurations.

That said I do realise that bit too should be looked at and any problems, as you
quite rightly pointed out, should be discussed with the udev chaps.

Cheers,
Hedi.
--
Be careful of reading health books, you might die of a misprint.
-- Mark Twain

2010-04-21 22:05:17

by Jack Steiner

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Wed, Apr 21, 2010 at 08:12:13PM +0100, Hedi Berriche wrote:
> On Wed, Apr 21, 2010 at 18:54 Alan Cox wrote:
> | Hedi Berriche <[email protected]> wrote:
> |
> | > I just checked on an *idle* 1664 CPUs system and I can see 26844 tasks, all
> | > but few being kernel threads.
> |
> | So why have we got 26844 tasks. Isn't that a rather more relevant
> | question.
>
> OK, here's a rough breakdown of the tasks
>
> 104 kswapd
> 1664 aio
> 1664 ata
> 1664 crypto
> 1664 events
> 1664 ib_cm
> 1664 kintegrityd
> 1664 kondemand
> 1664 ksoftirqd
> 1664 kstop
> 1664 migration
> 1664 rpciod
> 1664 scsi_tgtd
> 1664 xfsconvertd
> 1664 xfsdatad
> 1664 xfslogd
>
> that's 25064, omitting the rest as its contribution to the overall total is
> negligible.

Also, our target for the number of cpus is 4096. We are not even halfway there.
(I certainly expect other issues to arise scaling to 4096p but running out of pids
_should_ not be one of them...)



>
> [[
>
> Let's also not forget all those ephemeral user space tasks (udev and the likes)
> that will be spawned at boot time on even large systems with even more
> thousands of disks, arguably one might consider hack initrd and similar to work
> around the problem and set pid_max as soon as /proc becomes available but it's
> a bit of a PITA.
>
> ]]
>
> | And as I asked before - how does Tejun's work on sanitizing work queues
> | affect this ?
>
> I'm not familiar with the work in question so I (we) will have to look it up,
> and at it and see whether it's relevant to what we're seeing here. It does sound
> like it might help, to certain extent at least.
>
> That said, while I am genuinely interested in spending time on this and digging
> further to see whether something has/can be done about keeping under control the
> number of tasks required to comfortably boot a system of this size, I think that
> in the meantime the boot parameter approach is useful in the sense that it addresses
> the immediate problem of being able such systems *without* any risk to break the
> code or alter the default behaviour.
>
> Cheers,
> Hedi.
> --
> Be careful of reading health books, you might die of a misprint.
> -- Mark Twain

2010-04-21 22:37:31

by Greg KH

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Wed, Apr 21, 2010 at 04:10:08PM -0400, John Stoffel wrote:
> >>>>> "Hedi" == Hedi Berriche <[email protected]> writes:
>
> Hedi> On Wed, Apr 21, 2010 at 20:15 John Stoffel wrote:
> Hedi> | >>>>> "Rik" == Rik van Riel <[email protected]> writes:
> Hedi> |
> Hedi> | Rik> That is 15 kernel threads per CPU.
> Hedi> |
> Hedi> | Rik> Reducing the number of kernel threads sounds like a
> Hedi> | Rik> useful thing to do.
> Hedi> |
> Hedi> | Isn't that already a project?
>
> Hedi> Yes, thanks to Alan's probing I looked it up
>
> Hedi> http://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git
>
> Hedi> but we're definitely talking long term solution vs. something
> Hedi> that can ease pain now.
>
> It seems to me that running Linux on such a large machine is such a
> specialized niche, the putting in your change to the regular kernel
> isn't a near term need either. And from the sounds of it, Tejun's
> work has better long term potential.

Tejun's work has much better long term potential, but this is still an
issue for large #cpu systems, which we want Linux to support well. This
isn't a "specialized niche" for Linux, at all, Linux pretty much
dominates this hardware area, and it would be nice to ensure that this
continues.

thanks,

greg k-h

2010-04-21 22:49:53

by Rik van Riel

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On 04/21/2010 06:24 PM, Greg KH wrote:

> Tejun's work has much better long term potential, but this is still an
> issue for large #cpu systems, which we want Linux to support well. This
> isn't a "specialized niche" for Linux, at all, Linux pretty much
> dominates this hardware area, and it would be nice to ensure that this
> continues.

Yes, the pid_max patch seems like a decent stop gap for
distro kernels right now. However, Tejun's work is
probably a more appropriate path forward.

2010-04-21 23:45:10

by Greg KH

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Wed, Apr 21, 2010 at 06:49:22PM -0400, Rik van Riel wrote:
> On 04/21/2010 06:24 PM, Greg KH wrote:
>
> >Tejun's work has much better long term potential, but this is still an
> >issue for large #cpu systems, which we want Linux to support well. This
> >isn't a "specialized niche" for Linux, at all, Linux pretty much
> >dominates this hardware area, and it would be nice to ensure that this
> >continues.
>
> Yes, the pid_max patch seems like a decent stop gap for
> distro kernels right now. However, Tejun's work is
> probably a more appropriate path forward.

Distros don't want to take a patch that adds a new boot param that is
not accepted upstream, otherwise they will be stuck forward porting it
from now until, well, forever :)

As this solves a problem that people are having today, on the kernel.org
kernel, on a known machine, and we really don't know when the "reduce
the number of processes per cpu" work will be done, or if it really will
solve this issue, then why can't we take it now? If the work does solve
the problem in the future, then we can take the command line option out,
and everyone is happy.

Sound reasonable?

thanks,

greg k-h

2010-04-22 09:24:31

by Alan

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

> Distros don't want to take a patch that adds a new boot param that is
> not accepted upstream, otherwise they will be stuck forward porting it
> from now until, well, forever :)

So for an obscure IA64 specific problem you want the upstream kernel to
port it forward forever instead ?
>
> As this solves a problem that people are having today, on the kernel.org
> kernel, on a known machine, and we really don't know when the "reduce
> the number of processes per cpu" work will be done, or if it really will
> solve this issue, then why can't we take it now? If the work does solve
> the problem in the future, then we can take the command line option out,
> and everyone is happy.
>
> Sound reasonable?

No - to start with it would be far saner for everything involved if the
4096 processor minority fixed it for the moment in their arch code by
doing something like

if (max_pids < PIDS_PER_CPU * num_cpus) {
max_pids = ...
printk(something informative)
}

in their __init marked code.

Because when Tejun's stuff is in, the patch can go away, and also if it's
not sufficient then the patch above should keep it sane when they go to
32000 cpus or whatever is next.

Alan

2010-04-22 12:58:09

by Jack Steiner

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Thu, Apr 22, 2010 at 10:28:52AM +0100, Alan Cox wrote:
> > Distros don't want to take a patch that adds a new boot param that is
> > not accepted upstream, otherwise they will be stuck forward porting it
> > from now until, well, forever :)
>
> So for an obscure IA64 specific problem you want the upstream kernel to
> port it forward forever instead ?

FWIW, the problem is occurring on systems that use x86 processors - not
IA64.


> >
> > As this solves a problem that people are having today, on the kernel.org
> > kernel, on a known machine, and we really don't know when the "reduce
> > the number of processes per cpu" work will be done, or if it really will
> > solve this issue, then why can't we take it now? If the work does solve
> > the problem in the future, then we can take the command line option out,
> > and everyone is happy.
> >
> > Sound reasonable?
>
> No - to start with it would be far saner for everything involved if the
> 4096 processor minority fixed it for the moment in their arch code by
> doing something like
>
> if (max_pids < PIDS_PER_CPU * num_cpus) {
> max_pids = ...
> printk(something informative)
> }
>
> in their __init marked code.
>
> Because when Tejun's stuff is in the patch can go away, and also if it's
> not sufficient then the patch above should keep it sane when they go to
> 32000 cpus or whatever is next.
>
> Alan

2010-04-22 13:57:45

by Robin Holt

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

> No - to start with it would be far saner for everything involved if the
> 4096 processor minority fixed it for the moment in their arch code by
> doing something like
>
> if (max_pids < PIDS_PER_CPU * num_cpus) {
> max_pids = ...
> printk(something informative)
> }
>
> in their __init marked code.

I don't understand how it would be possible for the arch maintainers
to predict what a particular machine's configuration would need for
PIDS_PER_CPU. Many of the extra pids needed on a per-cpu basis are
brought in by device drivers or subsystems.

Are you proposing a typical configuration be used for the basis or an
extreme configuration?

If your basis is the typical configuration, how would an administrator
of the extreme configuration get themselves out of the situation of
pid_max being too small without the same command line option?

If we use the extreme case, then we end up with a lot of extraneous pids,
however I don't see that as being too terrible of a situation.

Robin

2010-04-22 14:51:12

by Linus Torvalds

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2



On Thu, 22 Apr 2010, Alan Cox wrote:
>
> > Distros don't want to take a patch that adds a new boot param that is
> > not accepted upstream, otherwise they will be stuck forward porting it
> > from now until, well, forever :)
>
> So for an obscure IA64 specific problem you want the upstream kernel to
> port it forward forever instead ?

Ehh. Nobody does ia64 any more. It's dead, Jim.

This is x86. SGI finally long ago gave up on the Intel/HP clusterf*ck.

Which I'm not entirely sure makes the case for the kernel parameter much
stronger, though. I wonder if it's not more appropriate to just have a
total hack saying

if (max_pids < N * max_cpus) {
printk("We have %d CPUs, increasing max_pids to %d\n");
max_pids = N*max_cpus;
}

where "N" is just some random fudge-factor. It's reasonable to expect a
certain minimum number of processes per CPU, after all.

Linus

2010-04-22 17:08:11

by Robin Holt

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

> Which I'm not entirely sure makes the case for the kernel parameter much
> stronger, though. I wonder if it's not more appropriate to just have a
> total hack saying
>
> if (max_pids < N * max_cpus) {
> printk("We have %d CPUs, increasing max_pids to %d\n");
> max_pids = N*max_cpus;
> }
>
> where "N" is just some random fudge-factor. It's reasonable to expect a
> certain minimum number of processes per CPU, after all.

How about:

pid_max_min = max(pid_max_min, 19 * num_possible_cpus());
pid_max_baseline = 2048 * num_possible_cpus();

if (pid_max < pid_max_baseline) {
printk("We have %d CPUs, increasing pid_max to %d\n"...
pid_max = pid_max_baseline;
}


This would scale pid_max_min by a sane amount, leave the default value
of pid_max_min and pid_max untouched below 16 cpus and then scale both
up linearly beyond that.

Robin
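
To make the scaling concrete, here is a minimal userspace sketch of the
proposal. It is an illustration, not a kernel patch: the DEMO_* names and the
struct are hypothetical, ncpus stands in for num_possible_cpus(), and the
default, reserved and limit values are assumptions based on a 64-bit
2.6.32-era tree. The clamp to the hard limit is an addition that the snippet
above does not spell out.

```c
/* Assumed values for illustration; the real ones live in the kernel headers. */
#define DEMO_PID_MAX_DEFAULT 0x8000            /* 32768 */
#define DEMO_PID_MAX_LIMIT   (4 * 1024 * 1024) /* assumed 64-bit hard limit */
#define DEMO_RESERVED_PIDS   300
#define DEMO_MIN_PER_CPU     19                /* scaling factor for pid_max_min */
#define DEMO_BASE_PER_CPU    2048              /* scaling factor for pid_max */

struct demo_pid_limits {
	int pid_max;
	int pid_max_min;
};

static int demo_max(int a, int b) { return a > b ? a : b; }
static int demo_min(int a, int b) { return a < b ? a : b; }

/* Scale both limits with the CPU count, never going below today's defaults. */
static struct demo_pid_limits demo_scale_pid_limits(int ncpus)
{
	struct demo_pid_limits l = { DEMO_PID_MAX_DEFAULT, DEMO_RESERVED_PIDS + 1 };
	int baseline = demo_min(DEMO_BASE_PER_CPU * ncpus, DEMO_PID_MAX_LIMIT);

	l.pid_max_min = demo_max(l.pid_max_min, DEMO_MIN_PER_CPU * ncpus);
	if (l.pid_max < baseline)
		l.pid_max = baseline;	/* only ever raised, never lowered */
	return l;
}
```

With these assumed values pid_max stays at 32768 up to 16 CPUs (2048 * 16 is
exactly the default) and grows linearly after that. At 4096 CPUs the baseline
of 2048 * 4096 would exceed the assumed 4M hard limit, which is why the sketch
clamps it.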

2010-04-22 18:12:24

by John Stoffel

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

>>>>> "Robin" == Robin Holt <[email protected]> writes:

>> Which I'm not entirely sure makes the case for the kernel parameter much
>> stronger, though. I wonder if it's not more appropriate to just have a
>> total hack saying
>>
>> if (max_pids < N * max_cpus) {
>> printk("We have %d CPUs, increasing max_pids to %d\n");
>> max_pids = N*max_cpus;
>> }
>>
>> where "N" is just some random fudge-factor. It's reasonable to expect a
>> certain minimum number of processes per CPU, after all.

Robin> How about:

Robin> pid_max_min = max(pid_max_min, 19 * num_possible_cpus());
Robin> pid_max_baseline = 2048 * num_possible_cpus();

Robin> if (pid_max < pid_max_baseline) {
Robin> printk("We have %d CPUs, increasing pid_max to %d\n"...
Robin> pid_max = pid_max_baseline;
Robin> }


Robin> This would scale pid_max_min by a sane amount, leave the default value
Robin> of pid_max_min and pid_max untouched below 16 cpus and then scale both
Robin> up linearly beyond that.

Looks good, but how about some comments and some defines for the magic
numbers of 2048 and 19?

John

2010-04-22 20:36:26

by Andrew Morton

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

On Thu, 22 Apr 2010 12:08:02 -0500
Robin Holt <[email protected]> wrote:

> > Which I'm not entirely sure makes the case for the kernel parameter much
> > stronger, though. I wonder if it's not more appropriate to just have a
> > total hack saying
> >
> > if (max_pids < N * max_cpus) {
> > printk("We have %d CPUs, increasing max_pids to %d\n");
> > max_pids = N*max_cpus;
> > }
> >
> > where "N" is just some random fudge-factor. It's reasonable to expect a
> > certain minimum number of processes per CPU, after all.
>
> How about:
>
> pid_max_min = max(pid_max_min, 19 * num_possible_cpus());
> pid_max_baseline = 2048 * num_possible_cpus();
>
> if (pid_max < pid_max_baseline) {
> printk("We have %d CPUs, increasing pid_max to %d\n"...
> pid_max = pid_max_baseline;
> }
>
>
> This would scale pid_max_min by a sane amount, leave the default value
> of pid_max_min and pid_max untouched below 16 cpus and then scale both
> up linearly beyond that.

Something like that would work. We should ensure that pid_max cannot
end up being less than the current PID_MAX_DEFAULT.

2010-04-25 07:16:14

by Pavel Machek

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

Hi!

> > > Distros don't want to take a patch that adds a new boot param that is
> > > not accepted upstream, otherwise they will be stuck forward porting it
> > > from now until, well, forever :)
> >
> > So for an obscure IA64 specific problem you want the upstream kernel to
> > port it forward forever instead ?
>
> Ehh. Nobody does ia64 any more. It's dead, Jim.
>
> This is x86. SGI finally long ago gave up on the Intel/HP clusterf*ck.
>
> Which I'm not entirely sure makes the case for the kernel parameter much
> stronger, though. I wonder if it's not more appropriate to just have a
> total hack saying
>
> if (max_pids < N * max_cpus) {
> printk("We have %d CPUs, increasing max_pids to %d\n");
> max_pids = N*max_cpus;
> }
>
> where "N" is just some random fudge-factor. It's reasonable to expect a
> certain minimum number of processes per CPU, after all.

Issue with max_pids is that it can break userspace, right?

At that point it seems saner to require a parameter --- just adding
cpus to the system should not do it...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2010-04-25 17:16:57

by Linus Torvalds

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2



On Sun, 25 Apr 2010, Pavel Machek wrote:
>
> Issue with max_pids is that it can break userspace, right?

Iirc, some _really_ old code used 'short' for pid_t, and we wanted to be
really safe when we raised the limits.

I seriously doubt we need to worry about old binaries like that on any 16+
CPU machines, though.

The other issue is just the size of the pidmap[] array. Instead of walking
all the processes to see "is this pid in use" (like I think the original
Linux kernel did), we have a bitmap of used pids. When you raise pid_max,
that bitmap obviously still needs to be big enough. Right now we allocate
that statically (rather than growing it dynamically), so we end up having
a _hard_ limit of PID_MAX_LIMIT too.

On 32-bit, I think that maximum limit still ends up being basically 32767.
So again, on a _legacy_ system, you end up being limited in the number of
pid_t entries.

Linus
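
The bitmap scheme described in the message above can be sketched in user
space as follows. This is a simplified illustration, not the kernel's
pidmap code: the names are hypothetical, the search always starts at pid 1
rather than after the last allocated pid, and SKETCH_PID_LIMIT stands in
for PID_MAX_LIMIT. It shows why a statically sized bitmap puts a hard
ceiling on pid_max.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_PID_LIMIT 32768  /* hard limit fixed at compile time */
#define BITS_PER_WORD    (sizeof(unsigned long) * CHAR_BIT)

/* One bit per pid, allocated statically (zero-initialized: all pids free). */
static unsigned long sketch_pidmap[(SKETCH_PID_LIMIT + BITS_PER_WORD - 1) /
				   BITS_PER_WORD];

static int sketch_pid_in_use(int pid)
{
	return (sketch_pidmap[pid / BITS_PER_WORD] >> (pid % BITS_PER_WORD)) & 1UL;
}

/* Return the lowest free pid in [1, pid_max), or -1 if none is free.
 * pid_max is clamped to the size of the static bitmap. */
static int sketch_alloc_pid(int pid_max)
{
	int pid;

	if (pid_max > SKETCH_PID_LIMIT)
		pid_max = SKETCH_PID_LIMIT;

	for (pid = 1; pid < pid_max; pid++) {
		if (!sketch_pid_in_use(pid)) {
			sketch_pidmap[pid / BITS_PER_WORD] |=
				1UL << (pid % BITS_PER_WORD);
			return pid;
		}
	}
	return -1;
}

static void sketch_free_pid(int pid)
{
	sketch_pidmap[pid / BITS_PER_WORD] &= ~(1UL << (pid % BITS_PER_WORD));
}
```

Checking a pid is a single bit test instead of a walk over every task, but
raising pid_max past SKETCH_PID_LIMIT has no effect unless the bitmap
itself is made larger.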

2010-04-25 17:29:42

by Linus Torvalds

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2


On Sun, 25 Apr 2010, Linus Torvalds wrote:
>
> Iirc, some _really_ old code used 'short' for pid_t, and we wanted to be
> really safe when we raised the limits.

.. I dug into the history, and this is from August 2002..

We used to limit it to sixteen bits, but that was too tight even then for
some people, so first we did this:

Author: Linus Torvalds <[email protected]>
Date: Thu Aug 8 03:57:42 2002 -0700

Make pid allocation use 30 of the 32 bits, instead of 15.

diff --git a/include/linux/threads.h b/include/linux/threads.h
index 880b990..6804ee7 100644
--- a/include/linux/threads.h
+++ b/include/linux/threads.h
@@ -19,6 +19,7 @@
/*
* This controls the maximum pid allocated to a process
*/
-#define PID_MAX 0x8000
+#define PID_MASK 0x3fffffff
+#define PID_MAX (PID_MASK+1)

#endif
diff --git a/kernel/fork.c b/kernel/fork.c
index d40d246..017740d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -142,7 +142,7 @@ static int get_pid(unsigned long flags)
return 0;

spin_lock(&lastpid_lock);
- if((++last_pid) & 0xffff8000) {
+ if((++last_pid) & ~PID_MASK) {
last_pid = 300; /* Skip daemons etc. */
goto inside;
}
@@ -157,7 +157,7 @@ inside:
p->tgid == last_pid ||
p->session == last_pid) {
if(++last_pid >= next_safe) {
- if(last_pid & 0xffff8000)
+ if(last_pid & ~PID_MASK)
last_pid = 300;
next_safe = PID_MAX;
}

which just upped the limits. That, in turn, _did_ end up breaking some
silly old binaries, so then a month later Ingo did a "pid-max" patch
that made the maximum dynamic, with a default of the old 15-bit limit,
and a sysctl to raise it.

And then a couple of weeks later, Ingo did another patch to fix the
scalability problems we had with lots of pids (avoiding the whole
"for_each_task()" crud to figure out which pids were ok, and using a
'struct pid' instead).

So the whole worry about > 15-bit pids goes back to 2002. I think we're
pretty safe now.

Linus
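
As a small illustration of the 15-bit concern in the message above: a
program that stores pids in a signed 16-bit type can only represent values
up to 32767, which is exactly what the old PID_MAX of 0x8000 guaranteed.
The helper below is hypothetical and only checks the range; it does not
reproduce any particular old binary.

```c
#include <assert.h>
#include <stdint.h>

#define OLD_PID_MAX 0x8000      /* pre-2002 value from the diff above */
#define NEW_PID_MASK 0x3fffffff /* 30-bit mask from the diff above */

/* Hypothetical check: can this pid be stored in a signed 16-bit pid_t? */
static int sketch_pid_fits_16bit(long pid)
{
	return pid >= 0 && pid <= INT16_MAX;
}
```

The largest pid under the old limit, OLD_PID_MAX - 1, passes this check.
Anything at or above OLD_PID_MAX, up to NEW_PID_MASK, does not.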

2010-04-25 19:08:06

by Pavel Machek

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2

Hi!

> > Iirc, some _really_ old code used 'short' for pid_t, and we wanted to be
> > really safe when we raised the limits.
>
> .. I dug into the history, and this is from August 2002..
>
> We used to limit it to sixteen bits, but that was too tight even then for
> some people, so first we did this:
>
> Author: Linus Torvalds <[email protected]>
> Date: Thu Aug 8 03:57:42 2002 -0700
>
> Make pid allocation use 30 of the 32 bits, instead of 15.
...
> which just upped the limits. That, in turn, _did_ end up breaking some
> silly old binaries, so then a month later Ingo did a "pid-max" patch
> that made the maximum dynamic, with a default of the old 15-bit limit,
> and a sysctl to raise it.
>
> And then a couple of weeks later, Ingo did another patch to fix the
> scalability problems we had with lots of pids (avoiding the whole
> "for_each_task()" crud to figure out which pids were ok, and using a
> 'struct pid' instead).
>
> So the whole worry about > 15-bit pids goes back to 2002. I think we're
> pretty safe now.

From a principle-of-least-surprise point of view: breaking old userspace when you
pass a special config option is less surprising than breaking old
userspace when you add more CPUs.

Whether the breakage will be common enough that this matters is another
question.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2010-04-26 19:48:15

by Mike Travis

Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3

Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3
From: Hedi Berriche <[email protected]>

On a system with a substantial number of processors, the early default pid_max
of 32k will not be enough. On a system with 1664 CPUs, 25163 processes are
started before the login prompt. It's estimated that with 2048 CPUs we will pass
the 32k limit. With 4096, we'll reach that limit very early during the boot cycle,
and processes would stall waiting for an available pid.

This patch increases the early maximum number of pids available, and increases
the minimum number of pids that can be set during runtime.

Signed-off-by: Hedi Berriche <[email protected]>
Signed-off-by: Mike Travis <[email protected]>
Signed-off-by: Robin Holt <[email protected]>

---
include/linux/threads.h | 9 +++++++++
kernel/pid.c | 7 +++++++
2 files changed, 16 insertions(+)

--- linux-2.6.32.orig/include/linux/threads.h
+++ linux-2.6.32/include/linux/threads.h
@@ -33,4 +33,13 @@
#define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))

+/*
+ * Define a minimum number of pids per cpu. Heuristically based
+ * on original pid max of 32k for 32 cpus. Also, increase the
+ * minimum settable value for pid_max on the running system based
+ * on similar defaults. See kernel/pid.c:pidmap_init() for details.
+ */
+#define PIDS_PER_CPU_DEFAULT 1024
+#define PIDS_PER_CPU_MIN 8
+
#endif
--- linux-2.6.32.orig/kernel/pid.c
+++ linux-2.6.32/kernel/pid.c
@@ -511,6 +511,13 @@ void __init pidhash_init(void)

void __init pidmap_init(void)
{
+ /* bump default and minimum pid_max based on number of cpus */
+ pid_max = min(pid_max_max, max(pid_max,
+ PIDS_PER_CPU_DEFAULT * num_possible_cpus()));
+ pid_max_min = max(pid_max_min,
+ PIDS_PER_CPU_MIN * num_possible_cpus());
+ pr_info("pid_max: default: %u minimum: %u\n", pid_max, pid_max_min);
+
init_pid_ns.pidmap[0].page = kzalloc(PAGE_SIZE, GFP_KERNEL);
/* Reserve PID 0. We never call free_pidmap(0) */
set_bit(0, init_pid_ns.pidmap[0].page);

2010-04-26 20:47:22

by Greg KH

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3

On Mon, Apr 26, 2010 at 12:48:09PM -0700, Mike Travis wrote:
> Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3

Your subject is now incorrect, based on the patch. You should also
adjust the body of the changelog to reflect the code change.

thanks,

greg k-h

2010-04-27 00:42:10

by Mike Travis

Subject: [Patch 1/1] init: Increase pid_max based on num_possible_cpus v4

Subject: [Patch 1/1] init: Increase pid_max based on num_possible_cpus v4
From: Hedi Berriche <[email protected]>

On a system with a substantial number of processors, the early default pid_max
of 32k will not be enough. On a system with 1664 CPUs, 25163 processes are
started before the login prompt. It's estimated that with 2048 CPUs we will pass
the 32k limit. With 4096, we'll reach that limit very early during the boot cycle,
and processes would stall waiting for an available pid.

This patch increases the early maximum number of pids available, and increases
the minimum number of pids that can be set during runtime.

Signed-off-by: Hedi Berriche <[email protected]>
Signed-off-by: Mike Travis <[email protected]>
Signed-off-by: Robin Holt <[email protected]>

---
Version 4: Fix subject line

Version 3: Automatically increase pid_max based on number of cpus instead of
adding a cmdline option for the operator to set it.
---
include/linux/threads.h | 9 +++++++++
kernel/pid.c | 7 +++++++
2 files changed, 16 insertions(+)

--- linux-2.6.32.orig/include/linux/threads.h
+++ linux-2.6.32/include/linux/threads.h
@@ -33,4 +33,13 @@
#define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))

+/*
+ * Define a minimum number of pids per cpu. Heuristically based
+ * on original pid max of 32k for 32 cpus. Also, increase the
+ * minimum settable value for pid_max on the running system based
+ * on similar defaults. See kernel/pid.c:pidmap_init() for details.
+ */
+#define PIDS_PER_CPU_DEFAULT 1024
+#define PIDS_PER_CPU_MIN 8
+
#endif
--- linux-2.6.32.orig/kernel/pid.c
+++ linux-2.6.32/kernel/pid.c
@@ -511,6 +511,13 @@ void __init pidhash_init(void)

void __init pidmap_init(void)
{
+ /* bump default and minimum pid_max based on number of cpus */
+ pid_max = min(pid_max_max, max(pid_max,
+ PIDS_PER_CPU_DEFAULT * num_possible_cpus()));
+ pid_max_min = max(pid_max_min,
+ PIDS_PER_CPU_MIN * num_possible_cpus());
+ pr_info("pid_max: default: %u minimum: %u\n", pid_max, pid_max_min);
+
init_pid_ns.pidmap[0].page = kzalloc(PAGE_SIZE, GFP_KERNEL);
/* Reserve PID 0. We never call free_pidmap(0) */
set_bit(0, init_pid_ns.pidmap[0].page);
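
To see what the heuristic in the patch above yields, here is a user-space
sketch of the same min/max arithmetic. The function names are hypothetical.
It assumes the usual starting values in kernel/pid.c (pid_max =
PID_MAX_DEFAULT = 0x8000, pid_max_min = RESERVED_PIDS + 1 = 301, pid_max_max
= PID_MAX_LIMIT) on a 64-bit build without CONFIG_BASE_SMALL.

```c
#include <assert.h>

#define PID_MAX_DEFAULT      0x8000             /* assumes !CONFIG_BASE_SMALL */
#define PID_MAX_LIMIT        (4 * 1024 * 1024)  /* assumes sizeof(long) > 4 */
#define RESERVED_PIDS        300
#define PIDS_PER_CPU_DEFAULT 1024
#define PIDS_PER_CPU_MIN     8

/* Boot-time pid_max: at least the default, 1024 per CPU, capped at the limit. */
static int sketch_boot_pid_max(unsigned int ncpus)
{
	long long v = (long long)PIDS_PER_CPU_DEFAULT * ncpus;

	if (v < PID_MAX_DEFAULT)
		v = PID_MAX_DEFAULT;
	if (v > PID_MAX_LIMIT)
		v = PID_MAX_LIMIT;
	return (int)v;
}

/* Minimum value an admin can later set pid_max to: 8 per CPU, at least 301. */
static int sketch_boot_pid_max_min(unsigned int ncpus)
{
	long long v = (long long)PIDS_PER_CPU_MIN * ncpus;

	if (v < RESERVED_PIDS + 1)
		v = RESERVED_PIDS + 1;
	return (int)v;
}
```

Under these assumptions, 32 CPUs leave pid_max at 32768, the 1664-CPU
system from the changelog gets 1703936, and 4096 CPUs reach the 4194304
cap. The minimum stays at 301 until 38 CPUs and is 32768 at 4096 CPUs.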

2010-04-27 00:43:17

by Mike Travis

Subject: Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3



Greg KH wrote:
> On Mon, Apr 26, 2010 at 12:48:09PM -0700, Mike Travis wrote:
>> Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3
>
> Your subject is now incorrect, based on the patch. You should also
> adjust the body of the changelog to reflect the code change.
>
> thanks,
>
> greg k-h

Thanks for that catch. I had changed the name of the patch, but not the subject.

-Mike