LinuxLists.cc - pid_t range question

2006-02-07 16:24:01

Subject: pid_t range question

On Linux, type pid_t is defined as an int if you look
through all the intermediate definitions such as S32_T,
etc. However, it wraps at 32767, the next value being 300.

Does anybody know why it doesn't go to 0x7fffffff and
then wrap to the first unused pid value? I know the
code "reserves" the first 300 pids. That's not the
question. I wonder why. Also I see the code setting
the upper limit as well. I want to know why it is
set within the range of a short and is not allowed
to use the full range of an int. Nothing I see in
the kernel, related to the pid, ever uses a short
and no 'C' runtime interface limits this either!

Also, attempts to change /proc/sys/kernel/pid_max fail
if I attempt to increase it, but I can decrease it
to where I don't have enough pids available to fork()
the next command! Is this the correct behavior?

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.

2006-02-07 22:17:06

by Eric W. Biederman

Subject: Re: pid_t range question

"linux-os \(Dick Johnson\)" <[email protected]> writes:

> On Linux, type pid_t is defined as an int if you look
> through all the intermediate definitions such as S32_T,
> etc. However, it wraps at 32767, the next value being 300.
>
> Does anybody know why it doesn't go to 0x7fffffff and
> then wrap to the first unused pid value? I know the
> code "reserves" the first 300 pids. That's not the
> question. I wonder why. Also I see the code setting
> the upper limit as well. I want to know why it is
> set within the range of a short and is not allowed
> to use the full range of an int. Nothing I see in
> the kernel, related to the pid, ever uses a short
> and no 'C' runtime interface limits this either!

I have a vague memory of some old kernel interfaces
where pid was a short. That said, 32768 is also the number
of bits in a page, so it is a very good number for the bitmap
allocator we currently have.
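
To make the page-size remark concrete, here is a minimal sketch of the
arithmetic and of a one-page pid bitmap. It assumes a 4096-byte page (the
usual i386 size); the names SKETCH_PAGE_SIZE, pid_bitmap, pid_mark_used and
pid_is_used are invented for this illustration and are not the kernel's own.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: a 4096-byte page, as on i386. */
#define SKETCH_PAGE_SIZE 4096u
/* One bit per pid: a single page tracks 4096 * 8 = 32768 pids. */
#define SKETCH_PIDS_PER_PAGE (SKETCH_PAGE_SIZE * CHAR_BIT)

/* Hypothetical one-page bitmap; bit n set means pid n is in use. */
static unsigned char pid_bitmap[SKETCH_PAGE_SIZE];

static void pid_mark_used(unsigned int pid)
{
    assert(pid < SKETCH_PIDS_PER_PAGE);
    pid_bitmap[pid / CHAR_BIT] |= (unsigned char)(1u << (pid % CHAR_BIT));
}

static int pid_is_used(unsigned int pid)
{
    assert(pid < SKETCH_PIDS_PER_PAGE);
    return (pid_bitmap[pid / CHAR_BIT] >> (pid % CHAR_BIT)) & 1;
}
```

Under these assumptions SKETCH_PIDS_PER_PAGE works out to 32768, so pids 0
through 32767 fit in exactly one page of bits.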

I know for certain that proc assumes it can fit the pid in
the upper bits of an ino_t, taking the low 16 bits for itself,
so that may be the entire reason for the limit.

> Also, attempts to change /proc/sys/kernel/pid_max fail
> if I attempt to increase it, but I can decrease it
> to where I don't have enough pids available to fork()
> the next command! Is this the correct behavior?

You can increase pid_max if you have a 64bit kernel.
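
For reference, the current limit can be read from user space. The sketch
below is only an illustration: parse_pid_max and read_pid_max are
hypothetical helper names, and the only thing taken from this thread is the
/proc/sys/kernel/pid_max path.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text form of pid_max; returns -1 on malformed input. */
static long parse_pid_max(const char *text)
{
    char *end;
    long value;

    assert(text != NULL);
    value = strtol(text, &end, 10);
    if (end == text || value <= 0)
        return -1;
    return value;
}

/* Read the current limit; returns -1 if the file cannot be read. */
static long read_pid_max(void)
{
    char buf[32];
    long value = -1;
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) != NULL)
        value = parse_pid_max(buf);
    fclose(f);
    return value;
}
```

Raising the limit means writing a larger number to the same file as root,
which, per the reply above, only succeeds on a 64-bit kernel.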

Eric

2006-02-08 00:19:58

by Ulrich Drepper

Subject: Re: pid_t range question

On 2/7/06, Eric W. Biederman <[email protected]> wrote:
> I know for certain that proc assumes it can fit pid in
> the upper bits of an ino_t taking the low 16bits for itself
> so that may be the entire reason for the limit.

Is this still the case? For the 100,000-thread tests Ingo and I were
running, Ingo certainly came up with some patches to make /proc behave
better. This was before we had subdirs for thread groups.

Anyway, I think we should put a reasonable cap on the number of bits
for the PIDs. One reason is that the current (and fastest) design for
more complex mutexes needs to encode more information than the PID in
an 'int'. See the latest robust mutex patches for an example. If the
limit could be, say, 28 bits, that would still allow more processes
and threads than anybody has wanted so far. Who knows, by the time we
hit this limit, maybe we will have separate namespaces. If not, we can
still fix the existing limits, but this would come at a cost, which is
why I think it's not worth doing now.
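
As an illustration of why a cap on pid bits helps a lock word, here is a
minimal sketch that packs an owner id and some state flags into one 32-bit
value. The 28-bit split follows the figure suggested above; the flag names
and helper functions are hypothetical and are not taken from the robust
mutex patches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low 28 bits hold the owner's thread id,
 * the top 4 bits are left free for lock state. */
#define LOCK_TID_BITS    28
#define LOCK_TID_MASK    ((UINT32_C(1) << LOCK_TID_BITS) - 1)
#define LOCK_HAS_WAITERS (UINT32_C(1) << 31)   /* made-up flag */
#define LOCK_OWNER_DIED  (UINT32_C(1) << 30)   /* made-up flag */

/* Combine an owner id and flags into one lock word. */
static uint32_t lock_word_pack(uint32_t tid, uint32_t flags)
{
    assert((tid & ~LOCK_TID_MASK) == 0);   /* tid must fit in 28 bits */
    assert((flags & LOCK_TID_MASK) == 0);  /* flags stay out of the tid field */
    return tid | flags;
}

/* Extract the owner id again, ignoring the flag bits. */
static uint32_t lock_word_tid(uint32_t word)
{
    return word & LOCK_TID_MASK;
}
```

If pids could use all 31 positive bits of an int, no bits would be left over
for flags like these without widening the lock word.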

2006-02-08 02:41:41

by Eric W. Biederman

Subject: Re: pid_t range question

Ulrich Drepper <[email protected]> writes:

> On 2/7/06, Eric W. Biederman <[email protected]> wrote:
>> I know for certain that proc assumes it can fit pid in
>> the upper bits of an ino_t taking the low 16bits for itself
>> so that may be the entire reason for the limit.
>
> Is this still the case? For the 100,000 threads tests Ingo and I were
> running Ingo certainly came up with some patches to make /proc behave
> better. This was before we had subdirs for thread groups.

It isn't too hard to change, but it is still the case. The truth is
that proc doesn't really use inodes internally; they are just a
reporting thing. So /proc will work, but user space might get terribly
confused.

> Anyway, I think we should put a reasonable top on the number of bits
> for the PIDs. One reason is that the current (and fastest) design for
> more complex mutexes needs to encode more information than the PID in
> an 'int'. See the latest robust mutex patches for an example. If the
> limit could be, say, 28 bits that would still enable using more
> processes and threads than anybody wants so far. Who knows, when we
> hit this limit, maybe we have separate namespaces. If not, we can
> still fix the existing limits but this would come at a cost which is
> why I think it's not worth doing now.

From threads.h:
> /*
>  * This controls the default maximum pid allocated to a process
>  */
> #define PID_MAX_DEFAULT (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)
>
> /*
>  * A maximum of 4 million PIDs should be enough for a while:
>  */
> #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
>         (sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
>

So I think as long as this stays a kernel implementation detail, things
should work OK. I hate to have user space make assumptions about how many
bits are in a pid, though.
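
To see what those two macros evaluate to on different builds, the same
arithmetic can be written as plain functions. This is a sketch only: the
function names are made up, and the configuration values (CONFIG_BASE_SMALL,
PAGE_SIZE, sizeof(long)) are passed in as parameters instead of coming from
the kernel build.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors PID_MAX_DEFAULT from the quoted header. */
static long pid_max_default(int base_small)
{
    return base_small ? 0x1000 : 0x8000;
}

/* Mirrors PID_MAX_LIMIT from the quoted header. */
static long pid_max_limit(int base_small, long page_size, size_t sizeof_long)
{
    assert(page_size > 0);
    if (base_small)
        return page_size * 8;
    return sizeof_long > 4 ? 4L * 1024 * 1024 : pid_max_default(base_small);
}
```

For a regular 32-bit build, pid_max_limit(0, 4096, 4) gives 32768, the same
as the default, which would explain why pid_max can be lowered but not
raised there. With sizeof(long) == 8 the limit becomes 4194304.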

Eric

2006-02-09 16:58:38

by Jan Engelhardt

Subject: Re: pid_t range question

>> On Linux, type pid_t is defined as an int if you look
>> through all the intermediate definitions such as S32_T,
>> etc. However, it wraps at 32767, the next value being 300.

There is also an aesthetic reason. If pids were allowed to exceed, say,
ten million, you would need quite a wide field in `ps` for the process
number, which on "normal desktop user" systems requires just 5 or 6
decimal places. To show what I mean, just look at this sample ps output:

17:59 shanghai:../fs/proc # ps
       PID TTY          TIME CMD
         1 -        00:00:00 init [3]
4215914607 tty2     00:00:00 bash
4215914653 tty2     00:00:00 ps

mingw/msys and cygwin already have this "cosmetic problem" since windows
"pids" are usually above one million.

>> I know the
>> code "reserves" the first 300 pids.

I cannot confirm that. When I start in "-b" mode and use up all pids by
repeatedly executing /bin/noop, I eventually get pids as low as 10
again, determined by how many kernel threads were active before /bin/bash
started.

>I know for certain that proc assumes it can fit pid in
>the upper bits of an ino_t taking the low 16bits for itself
>so that may be the entire reason for the limit.
>
The inode number created for /proc/XXX/fd entries currently is, IIRC,
ino = (pid << 16) | fd
which limits both the pid and the fdtable to 16 bits. See
fs/proc/inode-alloc.txt. Ideally, procfs should start using 64-bit inode
numbers.
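
A short sketch of that formula shows where the collisions come from once
either field needs more than 16 bits. The function names are invented for
this example, and the 64-bit variant is only one hypothetical way to widen
the encoding, not something procfs is known to do.

```c
#include <assert.h>
#include <stdint.h>

/* The formula above, evaluated in a 32-bit inode number. */
static uint32_t proc_fd_ino32(uint32_t pid, uint32_t fd)
{
    return (pid << 16) | fd;
}

/* Hypothetical 64-bit variant: 32 bits each for pid and fd. */
static uint64_t proc_fd_ino64(uint32_t pid, uint32_t fd)
{
    return ((uint64_t)pid << 32) | fd;
}
```

In the 32-bit form, pid 65537 with fd 3 produces the same number as pid 1
with fd 3, because the upper pid bits are shifted out. The 64-bit form keeps
the two apart.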

Jan Engelhardt
--

2006-02-09 18:12:44

by Eric W. Biederman

Subject: Re: pid_t range question

Jan Engelhardt <[email protected]> writes:

>>> On Linux, type pid_t is defined as an int if you look
>>> through all the intermediate definitions such as S32_T,
>>> etc. However, it wraps at 32767, the next value being 300.
>
> There is also an aesthetical reason. If pids were allowed to exceed, say,
> ten million, you would need a quite wide field in `ps` for the process
> number which is on "normal desktop user" systems just require 5 or 6
> decimal places. Well, what I mean, just look at this sample ps output:
>
> 17:59 shanghai:../fs/proc # ps
> PID TTY TIME CMD
> 1 - 00:00:00 init [3]
> 4215914607 tty2 00:00:00 bash
> 4215914653 tty2 00:00:00 ps
>
> mingw/msys and cygwin already have this "cosmetic problem" since windows
> "pids" are usually above one million.

Yes, although I'm not certain how bad the cosmetic problem is.
Certainly significant enough that we don't want to change a good
thing when we've got it. But if there were real problems a big pid
would solve, I don't expect large pid numbers would stop us.

>>> I know the
>>> code "reserves" the first 300 pids.
>
> I cannot confirm that. When I start in "-b" mode and 'use' up all pids by
> repeatedly executing /bin/noop, I someday get pids as low as 10
> again, defined by how many kernel threads there are active before /bin/bash
> started.

Odd. When the search wraps, it restarts at 300.
Still, there are no locks around last_pid.
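
The wrap behaviour described here can be sketched as follows. This is a
simplified, single-threaded illustration, not the kernel's allocator: the
array, the constants' names and sketch_alloc_pid are all made up, and only
the values 32768 and 300 come from this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PID_MAX       32768  /* default limit discussed in this thread */
#define SKETCH_RESERVED_PIDS 300    /* where the search restarts after a wrap */

static bool in_use[SKETCH_PID_MAX];
static int last_pid;

/* Return the next free pid after last_pid, or -1 if none is free. */
static int sketch_alloc_pid(void)
{
    int pid = last_pid + 1;
    int tries;

    for (tries = 0; tries < SKETCH_PID_MAX; tries++, pid++) {
        if (pid >= SKETCH_PID_MAX)
            pid = SKETCH_RESERVED_PIDS;   /* wrap past the reserved range */
        if (!in_use[pid]) {
            in_use[pid] = true;
            last_pid = pid;
            return pid;
        }
    }
    return -1;
}
```

In this sketch pids below 300 are only handed out before the first wrap,
which matches the "reserved" description earlier in the thread.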

>>I know for certain that proc assumes it can fit pid in
>>the upper bits of an ino_t taking the low 16bits for itself
>>so that may be the entire reason for the limit.
>>
> inode number in /proc/XXX/fd creation currently is, IIRC
> ino = (pid << 16) | fd
> which limits both pid to 16 bits and the fdtable to 16 bits. See
> fs/proc/inode-alloc.txt. At best, procfs should start using 64bit inode
> numbers.

Well, it does use 64-bit inode numbers, but only on 64-bit systems.
Internally /proc doesn't care about the inode; it is only there to keep
find and friends from getting confused.

Figuring out how to use find_inode_number would likely be interesting,
and a random inode allocation scheme would be interesting.

Eric

2006-02-09 20:13:29

by Jesper Juhl

Subject: Re: pid_t range question

On 2/9/06, Eric W. Biederman <[email protected]> wrote:
> Jan Engelhardt <[email protected]> writes:
>
> >>> On Linux, type pid_t is defined as an int if you look
> >>> through all the intermediate definitions such as S32_T,
> >>> etc. However, it wraps at 32767, the next value being 300.
> >
> > There is also an aesthetical reason. If pids were allowed to exceed, say,
> > ten million, you would need a quite wide field in `ps` for the process
> > number which is on "normal desktop user" systems just require 5 or 6
> > decimal places. Well, what I mean, just look at this sample ps output:
> >
> > 17:59 shanghai:../fs/proc # ps
> > PID TTY TIME CMD
> > 1 - 00:00:00 init [3]
> > 4215914607 tty2 00:00:00 bash
> > 4215914653 tty2 00:00:00 ps
> >
> > mingw/msys and cygwin already have this "cosmetic problem" since windows
> > "pids" are usually above one million.
>
> Yes. Although I'm not certain how bad the cosmetic problem
> is. Certainly significant enough that we don't want to change a good
> thing when we got it. But if there were real problems a big pid
> would solve I don't expect large pid numbers to stop us.
>

I can think of at least 3 ways to at least hide that cosmetic problem
a bit. Won't solve the problem but will make it less likely that most
people will ever encounter it.

(assuming below that we want something like 64bit pids but want to
keep pids at 5 digits as much as possible)

1. When allocating a pid for a new process, always assign the lowest
available free pid.

2. Allocate pids as we currently do, but once we hit 99999 wrap the
pids and start allocating from free pids starting from 2 and up. Only
if no pids below 99999 are free do we continue upwards and allocate
pid 100000.

3. Whenever a process terminates put its pid on a pid_reuse list. When
a new pid needs to be allocated always pick a pid from the pid_reuse
list if any are available, otherwise allocate pids as we currently do.

Any of those 3 schemes should keep pids below 6 digits as much as
possible. We can still hit the cosmetic problem on boxes where more
than 99999 processes are actually running at the same time, but most
users will never encounter that.
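
As a rough illustration of scheme 3, a reuse list can be as simple as a
stack of freed pids that is consulted before a fresh number is handed out.
Everything here is hypothetical (the names, the fixed capacity, and the
choice to reuse the most recently freed pid first); it is a sketch of the
idea, not a proposal for the kernel.

```c
#include <assert.h>

#define REUSE_CAPACITY 1024   /* hypothetical bound on the reuse list */

static int reuse_list[REUSE_CAPACITY];
static int reuse_count;
static int next_fresh_pid = 2;   /* pid 1 is taken by init */

/* Called when a process exits: remember its pid for reuse. */
static void pid_release(int pid)
{
    assert(pid > 1);
    if (reuse_count < REUSE_CAPACITY)
        reuse_list[reuse_count++] = pid;
    /* If the list is full, the pid is simply dropped in this sketch. */
}

/* Prefer a recycled pid; otherwise hand out the next unused number. */
static int pid_allocate(void)
{
    if (reuse_count > 0)
        return reuse_list[--reuse_count];
    return next_fresh_pid++;
}
```

With this sketch, allocating pids 2 and 3, releasing 2, and allocating twice
more yields 2 and then 4, so numbers only grow when nothing is free to
recycle.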

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-10 01:19:45

by Bodo Eggert

Subject: Re: pid_t range question

Jesper Juhl <[email protected]> wrote:

> I can think of at least 3 ways to at least hide that cosmetic problem
[of /bin/ps]
> a bit. Won't solve the problem but will make it less likely that most
> people will ever encounter it.
>
> (assuming below that we want something like 64bit pids but want to
> keep pids at 5 digits as much as possible)
[...]
> 2. Allocate pids as we currently do, but once we hit 99999 wrap the
> pids and start allocating from free pids starting from 2 and up. Only
> if no pids below 99999 are free do we continue upwards and allocate
> pid 100000.

2b) Don't try that hard, and hopefully speed things up:

<pseudocode>
for (max = 3814; /* max == 999817216 * 8 */; max = max * 8) {
    // the above values result from int(1000000000/8^6){,*8^6}
    newpid = random(max) + 1;
    if (allocate_pid(newpid))
        goto got_the_pid;
    // repeat the above in order to make it less likely
    // to get a high PID? I hope it's not necessary.
    if (max == 999817216) // otherwise a uint32 will overflow
        break;
}
// possible here to increase the chance for a low pid but also for
// long runs while searching for the first free pid:
// newpid = random(99999) + 1;
pid_search_stop = newpid;
while (++newpid != pid_search_stop) {
    if (allocate_pid(newpid))
        goto got_the_pid;
}
got_the_pid:
</pseudocode>

TOSOLVE:

Find a cheap random function.

What to do on 4294967295 allocated processes?

Eternal starvation if nearly 4294967295 are present and the right ones
get stopped/started?

How to get CPU power to run 4294967295 processes?
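
For anyone who wants to experiment with 2b, the pseudocode above can be
turned into compilable C along these lines. Several things are assumptions
of this sketch and not part of the original: the try_alloc_fn callback
stands in for allocate_pid, rand() stands in for the still-to-be-found cheap
random function (and may not cover the widest ranges on every platform), and
the linear scan wraps explicitly at 999817216 instead of relying on integer
overflow. demo_try_alloc is a toy callback for trying it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define FIRST_MAX 3814u        /* int(1000000000 / 8^6) */
#define LAST_MAX  999817216u   /* 3814 * 8^6 */

/* Hypothetical callback: returns true if pid was free and is now taken. */
typedef bool (*try_alloc_fn)(uint32_t pid);

/* Returns the allocated pid, or 0 if every candidate was busy. */
static uint32_t alloc_pid_widening(try_alloc_fn try_alloc)
{
    uint32_t max, pid = 0, stop;

    assert(try_alloc != NULL);
    /* Phase 1: one random probe per range, widening the range 8x each time. */
    for (max = FIRST_MAX; ; max *= 8) {
        pid = (uint32_t)rand() % max + 1;
        if (try_alloc(pid))
            return pid;
        if (max == LAST_MAX)   /* the next multiply would overflow 32 bits */
            break;
    }
    /* Phase 2: linear scan from the last probe, wrapping at LAST_MAX. */
    stop = pid;
    do {
        pid = (pid == LAST_MAX) ? 1 : pid + 1;
        if (try_alloc(pid))
            return pid;
    } while (pid != stop);
    return 0;
}

/* Toy callback: pretend every pid up to demo_busy_upto is already taken. */
static uint32_t demo_busy_upto;

static bool demo_try_alloc(uint32_t pid)
{
    return pid > demo_busy_upto;
}
```

With demo_busy_upto set to 0 the first probe always succeeds and the result
stays at or below 3814; with it set to 3814 the allocator has to move on to
a wider range or fall back to the linear scan.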

--
I thank GMX for sabotaging the use of my addresses by means of
lies spread via SPF.

2006-02-10 13:22:00

by Jan Engelhardt

Subject: Re: pid_t range question

>
>Any of those 3 schemes should keep pids below 6 digits as much as
>possible. We can still hit the cosmetic problem on boxes where more
>than 99999 processes are actually running at the same time, but most
>users will never encounter that.
>
I'd say let's remain doing whatever we're doing now. That is, a maximum of
32768 concurrent pids, and whoever needs more (e.g. Sourceforge shell,
etc.) can always raise it to their needs.

Jan Engelhardt
--

2006-02-15 20:41:11

by David Lang

Subject: Re: pid_t range question

On Fri, 10 Feb 2006, Jan Engelhardt wrote:

>> Any of those 3 schemes should keep pids below 6 digits as much as
>> possible. We can still hit the cosmetic problem on boxes where more
>> than 99999 processes are actually running at the same time, but most
>> users will never encounter that.
>>
> I'd say let's remain doing whatever we're doing now. That is, a maximum of
> 32768 concurrent pids, and whoever needs more (e.g. Sourceforge shell,
> etc.) can always raise it to their needs.

when you say 'continue doing what we are doing now' do you mean to include
the hard-coded limit of 32K pids? or do you mean to not worry about the
cosmetic issue and change the code to not hard-code the limit, but instead
honor a max_pid >32K?

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2006-02-15 21:01:54

by Eric W. Biederman

Subject: Re: pid_t range question

David Lang <[email protected]> writes:

> On Fri, 10 Feb 2006, Jan Engelhardt wrote:
>
>>> Any of those 3 schemes should keep pids below 6 digits as much as
>>> possible. We can still hit the cosmetic problem on boxes where more
>>> than 99999 processes are actually running at the same time, but most
>>> users will never encounter that.
>>>
>> I'd say let's remain doing whatever we're doing now. That is, a maximum of
>> 32768 concurrent pids, and whoever needs more (e.g. Sourceforge shell,
>> etc.) can always raise it to their needs.
>
> when you say 'continue doing what we are doing now' do you mean to include the
> hard-coded limit of 32K pids? or do you mean to not worry about the cosmetic
> issue and change the code to not hard-code the limit, but instead honor a
> max_pid >32K?

We actually do honor a max_pid > 32K, but only if we are 64-bit.

We need to fix /proc and resolve the issue that 32K pids take about 320M
of RAM, which is 1/2 to 1/3 of all low memory on a 32-bit box, if
we want a higher max_pid than 32K. Of course 32K is also a very nice
number for the pid bitmap allocator, as it is only 1 page.

With about 80K task structures plus stacks the machine goes OOM, because
you have exhausted all of low memory.
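
The numbers in this message can be cross-checked with a little arithmetic.
The sketch below uses only the figures quoted above (32K tasks, about 320
MB, about 80K tasks at OOM); the macro and function names are made up, and
the per-task cost it derives is an implied average, not a measured value.

```c
#include <assert.h>

/* Figures from the message above: 32K tasks cost about 320 MB. */
#define TASKS_32K         (32L * 1024)
#define COST_32K_TASKS_MB 320L

/* Implied per-task footprint (task structure plus stack), in KB. */
static long per_task_kb(void)
{
    return COST_32K_TASKS_MB * 1024 / TASKS_32K;
}

/* Low memory, in MB, consumed by a given number of tasks at that rate. */
static long tasks_cost_mb(long tasks)
{
    assert(tasks >= 0);
    return tasks * per_task_kb() / 1024;
}
```

That comes to roughly 10 KB per task, so 80K tasks would need about 800 MB.
If 320 MB is a half to a third of low memory, low memory is somewhere
between about 640 MB and 960 MB, which is consistent with running out at
around 80K tasks.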

Eric

2006-02-17 16:38:51

by Jan Engelhardt

Subject: Re: pid_t range question

>> > Any of those 3 schemes should keep pids below 6 digits as much as
>> > possible. We can still hit the cosmetic problem on boxes where more
>> > than 99999 processes are actually running at the same time, but most
>> > users will never encounter that.
>> >
>> I'd say let's remain doing whatever we're doing now. That is, a maximum of
>> 32768 concurrent pids, and whoever needs more (e.g. Sourceforge shell,
>> etc.) can always raise it to their needs.
>
> when you say 'continue doing what we are doing now' do you mean to include the
> hard-coded limit of 32K pids? or do you mean to not worry about the cosmetic
> issue and change the code to not hard-code the limit, but instead honor a
> max_pid >32K?
>
Stay with the 32K limit. I doubt the majority of users ever exceeds
creating 32767 simultaneous processes.

Jan Engelhardt
--

2006-02-17 21:21:25

by David Lang

Subject: Re: pid_t range question

On Fri, 17 Feb 2006, Jan Engelhardt wrote:

>>>> Any of those 3 schemes should keep pids below 6 digits as much as
>>>> possible. We can still hit the cosmetic problem on boxes where more
>>>> than 99999 processes are actually running at the same time, but most
>>>> users will never encounter that.
>>>>
>>> I'd say let's remain doing whatever we're doing now. That is, a maximum of
>>> 32768 concurrent pids, and whoever needs more (e.g. Sourceforge shell,
>>> etc.) can always raise it to their needs.
>>
>> when you say 'continue doing what we are doing now' do you mean to include the
>> hard-coded limit of 32K pids? or do you mean to not worry about the cosmetic
>> issue and change the code to not hard-code the limit, but instead honor a
>> max_pid >32K?
>>
> Stay with the 32K limit. I doubt the majority of users ever exceeds
> creating 32767 simultaneous processes.

I agree that the majority of users don't hit this limit, but I've got a
couple of boxes that push it (they run out of ram before that, but more
ram is on order).

however it sounds like switching to a 64 bit kernel will avoid this limit,
so I'll put my efforts into configuring a box to do that.

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2006-02-17 21:41:05

by Eric W. Biederman

Subject: Re: pid_t range question

David Lang <[email protected]> writes:

> I agree that the majority of users don't hit this limit, but I've got a couple
> of boxes that push it (they run out of ram before that, but more ram is on
> order).
>
> however it sounds like switching to a 64 bit kernel will avoid this limit, so
> I'll put my efforts into configuring a box to do that.

That is what I would recommend. Unless you do something weird and painful
like configure a kernel doing the 4G/4G split a 32bit box is going to
have memory problems with more than 32K tasks.

Just remember you need to push up /proc/sys/kernel/pid_max to raise the default
on a 64bit box.

Eric

2006-02-21 11:22:46

by Herbert Poetzl

Subject: Re: pid_t range question

On Fri, Feb 17, 2006 at 02:39:55PM -0700, Eric W. Biederman wrote:
> David Lang <[email protected]> writes:
>
> > I agree that the majority of users don't hit this limit, but I've
> > got a couple of boxes that push it (they run out of ram before that,
> > but more ram is on order).
> >
> > however it sounds like switching to a 64 bit kernel will avoid this
> > limit, so I'll put my efforts into configuring a box to do that.
>
> That is what I would recommend. Unless you do something weird and
> painful like configure a kernel doing the 4G/4G split a 32bit box is
> going to have memory problems with more than 32K tasks.

or configure the newly introduced 2.13/1.87 (or 1/3 split)

I don't think low memory is really the issue here; the
scheduling of 32k+ tasks is much more of a problem ...

best,
Herbert

> Just remember you need to push up /proc/sys/kernel/pid_max to raise the
> default on a 64bit box.
>
> Eric