2002-09-06 14:00:32

by Paul Larson

[permalink] [raw]
Subject: pid_max hang again...

In the nightly bk pull testing I do, I saw that this got commited
yesterday:

[email protected], 2002-09-05 08:45:49-07:00, [email protected]
- [PATCH] pid-max-2.5.33-A0
-
- This is the pid-max patch, the one i sent for 2.5.31 was botched. I
- have removed the 'once' debugging stupidity - now PIDs start at 0
- again.
- Also, for an unknown reason the previous patch missed the hunk that
- had the declaration of 'DEFAULT_PID_MAX' which made it not compile

It looks like this change dropped us back to the same error all this was
originally supposed to fix. When you hit PID_MAX, get_pid() starts
looping forever looking for a free pid and hangs. I could probably make
my original fix work on this very easily if you'd like.

I wonder though, would it be possible to do this in a more simple way by
just throttling max_threads back to something more sane if it gets
defaulted too high? Since it gets checked before we even get to the
get_pid call in copy_process(). That would keep the number of processes
down to a sane level without the risk.

Thanks,
Paul Larson
http://www.linuxtestproject.org


2002-09-06 15:37:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: pid_max hang again...


On 6 Sep 2002, Paul Larson wrote:

> It looks like this change dropped us back to the same error all this was
> originally supposed to fix. When you hit PID_MAX, get_pid() starts
> looping forever looking for a free pid and hangs. I could probably make
> my original fix work on this very easily if you'd like.

yes please send a patch for this. Reintroduction of the looping bug was
unintended.

> I wonder though, would it be possible to do this in a more simple way by
> just throttling max_threads back to something more sane if it gets
> defaulted too high? Since it gets checked before we even get to the
> get_pid call in copy_process(). That would keep the number of processes
> down to a sane level without the risk.

this is a good approach as well, but now pid_max can be adjusted runtime
so truncating max_threads as a side-effect looks a bit problematic. We
should rather fail the fork() cleanly.

Ingo

2002-09-06 15:54:56

by Paul Larson

[permalink] [raw]
Subject: Re: pid_max hang again...

On Fri, 2002-09-06 at 10:39, Ingo Molnar wrote:
>
> On 6 Sep 2002, Paul Larson wrote:
>
> > It looks like this change dropped us back to the same error all this was
> > originally supposed to fix. When you hit PID_MAX, get_pid() starts
> > looping forever looking for a free pid and hangs. I could probably make
> > my original fix work on this very easily if you'd like.
>
> yes please send a patch for this. Reintroduction of the looping bug was
> unintended.
>
> > I wonder though, would it be possible to do this in a more simple way by
> > just throttling max_threads back to something more sane if it gets
> > defaulted too high? Since it gets checked before we even get to the
> > get_pid call in copy_process(). That would keep the number of processes
> > down to a sane level without the risk.
>
> this is a good approach as well, but now pid_max can be adjusted runtime
> so truncating max_threads as a side-effect looks a bit problematic. We
> should rather fail the fork() cleanly.
I agree, unless this was just going to be temporary. I'll pull the
get_pid() fix up to the current version and send it in a bit.

Thanks,
Paul Larson

2002-09-06 17:51:03

by Paul Larson

[permalink] [raw]
Subject: [PATCH] pid_max hang again...

On Fri, 2002-09-06 at 10:39, Ingo Molnar wrote:
> On 6 Sep 2002, Paul Larson wrote:
> > It looks like this change dropped us back to the same error all this was
> > originally supposed to fix. When you hit PID_MAX, get_pid() starts
> > looping forever looking for a free pid and hangs. I could probably make
> > my original fix work on this very easily if you'd like.
>
> yes please send a patch for this. Reintroduction of the looping bug was
> unintended.

Here's the original one without all the bitmap stuff, JUST the fix for
the hang problem plain and simple. It should apply cleanly against the
current bk. The only other small bit in here moves the test for flags &
CLONE_IDLETASK outside of get_pid since we can skip the unnecessary call
to get_pid if it's true. Except for the last part, this is the same
thing I put into 2.4 some time ago.

Please comment or apply.

Thanks,
Paul Larson

diff -Nru a/kernel/fork.c b/kernel/fork.c
--- a/kernel/fork.c Fri Sep 6 20:19:21 2002
+++ b/kernel/fork.c Fri Sep 6 20:19:21 2002
@@ -28,6 +28,7 @@
#include <linux/security.h>
#include <linux/futex.h>
#include <linux/ptrace.h>
+#include <linux/compiler.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -162,12 +163,10 @@
static int get_pid(unsigned long flags)
{
struct task_struct *p;
- int pid;
-
- if (flags & CLONE_IDLETASK)
- return 0;
+ int pid, beginpid;

spin_lock(&lastpid_lock);
+ beginpid = last_pid;
if (++last_pid > pid_max) {
last_pid = 300; /* Skip daemons etc. */
goto inside;
@@ -188,6 +187,8 @@
last_pid = 300;
next_safe = pid_max;
}
+ if (unlikely(last_pid == beginpid))
+ goto nomorepids;
goto repeat;
}
if (p->pid > last_pid && next_safe > p->pid)
@@ -205,6 +206,11 @@
spin_unlock(&lastpid_lock);

return pid;
+
+nomorepids:
+ read_unlock(&tasklist_lock);
+ spin_unlock(&lastpid_lock);
+ return 0;
}

static inline int dup_mmap(struct mm_struct * mm)
@@ -707,7 +713,13 @@
p->state = TASK_UNINTERRUPTIBLE;

copy_flags(clone_flags, p);
- p->pid = get_pid(clone_flags);
+ if (clone_flags & CLONE_IDLETASK)
+ p->pid = 0;
+ else {
+ p->pid = get_pid(clone_flags);
+ if (p->pid == 0)
+ goto bad_fork_cleanup;
+ }
p->proc_dentry = NULL;

INIT_LIST_HEAD(&p->run_list);

2002-09-06 21:02:00

by Manfred Spraul

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...


Searching for a free pid value and inserting the thread into the task
list should be atomic, otherwise the same pid value could be given to 2
threads.

do_fork runs without the BLK, you might have to search for the pid
within the write_lock_irq(&tasklist_lock) block.

--
Manfred

2002-09-07 07:58:18

by Hanumanthu. H

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

Hi,

The following patch tries to fix the max_pid hang problem. The
idea is to make use of find_task_by_pid(), rather than looping
over entire tasklist. We can make use of the fact that, the
maximum number of processes at any time does not exceed max value
of a 30-bit integer (last_pid).

As Manfred pointed out, pid allocation and inserting it into task
list should be atomic. But going by the range of pids available
in 2.5.33 (Linux made it 30-bits), it is unlikely that same pid
will be given to two processes, since last_pid is protected by
single lock. If a process gets a pid of x, all following processes
will get pids > x. So, another process getting pid x again means,
last_pid looped over it's maximum value and comes back to x while
the first process which got pid x is not yet inserted into the
list. This is definitely unlikely.


Given patch works as follows

get_pid starts by incrementing last_pid as usual. But when it
crosses PID_MAX, it sets a variable max_pid_cross to TRUE (1).
If last_pid is within PID_MAX and max_pid_cross is not set, we
are sure that we got a free pid, which is/was not yet allocated
to any process.

If last_pid is within PID_MAX and max_pid_cross is set, this
pid might have been given to another process. So, goto the
corresponding hashlist and check for its existence. If no task
with given pid found, then get_pid is free to return pid as the
available pid. If there is a process last_pid as its pid,
then increment last_pid, continue the looping in the respective
hashlist. get_pid() holds tasklist_lock before calling
find_task_by_pid() and releases it after find_task_by_pid()
returns. This ensures that any exiting processes will get
a chance to acquire the tasklist_lock and remove its entry
from tasklist. Since last_pid is protected by a single lock
we are sure that get_pid() will surely returns a free pid
without hang.

The patch applies to 2.5.33 kernel

Any comments/flames ? If nothing, please consider to apply.


~Hanumanthu
Wipro Ltd.


diff -Nru linux-2.5.33/kernel/fork.c linux-2.5.33.1/kernel/fork.c

--- linux-2.5.33/kernel/fork.c Sun Sep 1 03:34:49 2002
+++ linux-2.5.33.1/kernel/fork.c Sat Sep 7 15:04:49 2002
@@ -76,7 +76,6 @@
}
}

-/* Protects next_safe and last_pid. */
void add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait)
{
unsigned long flags;
@@ -151,50 +150,34 @@
return tsk;
}

+/* Protects next_safe and last_pid. */
spinlock_t lastpid_lock = SPIN_LOCK_UNLOCKED;

static int get_pid(unsigned long flags)
{
- static int next_safe = PID_MAX;
- struct task_struct *p;
+ static int max_pid_cross;
int pid;

if (flags & CLONE_IDLETASK)
return 0;

spin_lock(&lastpid_lock);
- if((++last_pid) & ~PID_MASK) {
+
+ if ((++last_pid) & ~PID_MASK) {
+ max_pid_cross = 1;
last_pid = 300; /* Skip daemons etc. */
- goto inside;
- }
- if(last_pid >= next_safe) {
-inside:
- next_safe = PID_MAX;
- read_lock(&tasklist_lock);
- repeat:
- for_each_task(p) {
- if(p->pid == last_pid ||
- p->pgrp == last_pid ||
- p->tgid == last_pid ||
- p->session == last_pid) {
- if(++last_pid >= next_safe) {
- if(last_pid & ~PID_MASK)
- last_pid = 300;
- next_safe = PID_MAX;
- }
- goto repeat;
- }
- if(p->pid > last_pid && next_safe > p->pid)
- next_safe = p->pid;
- if(p->pgrp > last_pid && next_safe > p->pgrp)
- next_safe = p->pgrp;
- if(p->tgid > last_pid && next_safe > p->tgid)
- next_safe = p->tgid;
- if(p->session > last_pid && next_safe > p->session)
- next_safe = p->session;
- }
+ } else if(!max_pid_cross)
+ goto get_out;
+repeat:
+ read_lock(&tasklist_lock);
+ if (find_task_by_pid(last_pid)) {
read_unlock(&tasklist_lock);
+ if ((++last_pid) & ~PID_MASK)
+ last_pid = 300;
+ goto repeat;
}
+ read_unlock(&tasklist_lock);
+get_out:
pid = last_pid;
spin_unlock(&lastpid_lock);


2002-09-07 09:01:36

by Manfred Spraul

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

> As Manfred pointed out, pid allocation and inserting it into task
> list should be atomic. But going by the range of pids available
> in 2.5.33 (Linux made it 30-bits), it is unlikely that same pid
> will be given to two processes, since last_pid is protected by
> single lock.
Right now there is quite a lot of code between get_pid and the insertion
into the task list: copy mm, files, etc.

And just before the endless loop in get_pid(), there is only one pid
left --> probability of getting the same pid again is high. If you fix
the hang, you should fix the atomicity, too.

> If last_pid is within PID_MAX and max_pid_cross is set, this
> pid might have been given to another process. So, goto the
> corresponding hashlist and check for its existence. If no task
> with given pid found, then get_pid is free to return pid as the
> available pid.

Doesn't work: find_task_by_pid() only checks task->pid. But the result
of get_pid mustn't be in use as a session or tgid value either

--
Manfred

2002-09-09 14:01:07

by Hanumanthu. H

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...


On Sat, 7 Sep 2002, Manfred Spraul wrote:

>
> Doesn't work: find_task_by_pid() only checks task->pid. But
> the result of get_pid mustn't be in use as a session or tgid
> value either
>

Thanks to Manfred, to clarfy me further that my patch doesn't
work. But I am still in feeling that current pid allocation can
be improved. If we assume that the atomicity problem with pid
allocation and linking task struct into process list can be
ignored (sorry Manfred), does it make any sense to maintain a
list of `freed' pids ? For me (some of others too !!), it makes
lot of sense to re-use pids of exited processes.

If I am not wrong, an exiting process pid can be re-used (after
the task dies) if its pid

1) not used as session ID (i.e p->session != p->pid)
(or p->leader == 0)
2) not used as process group ID (p->pgrp != p->pid)
3) not used as thread group-ID (p->tgid != p->pid)

Lets us maintain a list of atmost (PAGE_SIZE /sizeof(pid))
entries for freed pids, protected by a spin_lock (freepid_lock)
and let last_pid start from (300 + num_free_pids).

get_pid() looks in freepid_list first to get one free pid.
If it finds one, sets its entry to zero and return this pid.
If it couldn't find any free pid, then it continues as usual
by incrementing last_pid and searching through entire task
list. The hang problem might be due to holding the lock
forever if pids are not free. Current approach does not
re-look at the pids of exiting processes which might be
waiting on tasklist_lock to free up its task_struct.


get_pid() {

if(pid = get_free_pid()) return pid;
// usual code of incrementing last_pid
// and comparing next_safe + searching in
// tasklist and exiting (with Paul's fix).
}

get_free_pid() searches in freepid_list returns a free
pid if it finds one, otherwise 0.

This approach reduces the search time and avoids holding
of tasklist_lock for longer times.

A process exiting releases its pid to the freepid_list
if its pid matches with about creteria.

Let me know if it is worth implementing this.



~Hanumanthu

2002-09-09 15:04:09

by Martin J. Bligh

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

> Thanks to Manfred, to clarfy me further that my patch doesn't
> work. But I am still in feeling that current pid allocation can
> be improved. If we assume that the atomicity problem with pid
> allocation and linking task struct into process list can be
> ignored (sorry Manfred), does it make any sense to maintain a
> list of `freed' pids ? For me (some of others too !!), it makes
> lot of sense to re-use pids of exited processes.
>
> If I am not wrong, an exiting process pid can be re-used (after
> the task dies) if its pid
>
> 1) not used as session ID (i.e p->session != p->pid)
> (or p->leader == 0)
> 2) not used as process group ID (p->pgrp != p->pid)
> 3) not used as thread group-ID (p->tgid != p->pid)

How do you track this? Can you keep a reference count for each
PID and score 1 for each of the above usages (and one for normal
use)? Or do you think it's better to just fall back to the current
system in the non-trivial case?

> Lets us maintain a list of atmost (PAGE_SIZE /sizeof(pid))
> entries for freed pids, protected by a spin_lock (freepid_lock)
> and let last_pid start from (300 + num_free_pids).

OK, well whilst you're at it, do you want make the free pid list
per CPU with no locking, and make it even more efficient? ;-)
We now have plenty of free PIDs to play with, and you could refill
the list with 100 entries or so if you had to fall back to scanning.
Actually, you could do that even if your current proposal doesn't
work out ....

> Let me know if it is worth implementing this.

Personally, I think it'd be great ... the current scanning doesn't
seem like an efficient algorithm at all.

M.

2002-09-09 22:35:36

by jw schultz

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

On Mon, Sep 09, 2002 at 08:07:05AM -0700, Martin J. Bligh wrote:
> > Thanks to Manfred, to clarfy me further that my patch doesn't
> > work. But I am still in feeling that current pid allocation can
> > be improved. If we assume that the atomicity problem with pid
> > allocation and linking task struct into process list can be
> > ignored (sorry Manfred), does it make any sense to maintain a
> > list of `freed' pids ? For me (some of others too !!), it makes
> > lot of sense to re-use pids of exited processes.
> >
> > If I am not wrong, an exiting process pid can be re-used (after
> > the task dies) if its pid
> >
> > 1) not used as session ID (i.e p->session != p->pid)
> > (or p->leader == 0)
> > 2) not used as process group ID (p->pgrp != p->pid)
> > 3) not used as thread group-ID (p->tgid != p->pid)
>
> How do you track this? Can you keep a reference count for each
> PID and score 1 for each of the above usages (and one for normal
> use)? Or do you think it's better to just fall back to the current
> system in the non-trivial case?
>
> > Lets us maintain a list of atmost (PAGE_SIZE /sizeof(pid))
> > entries for freed pids, protected by a spin_lock (freepid_lock)
> > and let last_pid start from (300 + num_free_pids).
>
> OK, well whilst you're at it, do you want make the free pid list
> per CPU with no locking, and make it even more efficient? ;-)
> We now have plenty of free PIDs to play with, and you could refill
> the list with 100 entries or so if you had to fall back to scanning.
> Actually, you could do that even if your current proposal doesn't
> work out ....
>
> > Let me know if it is worth implementing this.
>
> Personally, I think it'd be great ... the current scanning doesn't
> seem like an efficient algorithm at all.

I don't like the idea of reusing recent pids at all.

The current repeated scanning is awful. I've been mulling this over
in my mind for awhile now and i've had two thoughts.

If we insist on scanning the task list we should at least
reap multiple pids. Have an array of free pids managed
inside get_pid() that it pulls from until it runs out at which
point it would do the scan to refill the array + 1 for
immediate use.

Here is my better suggestion.

Dont destroy the task_struct of exited tasks that
are session, thread group or process group leaders
until last group member exits. Give them a status
so that they don't show up in /proc. Heck go ahead
and mark them as zombies if you want though i'd
rather another status.

Preserved the pid number sort order of the
prev_task/next_task list. It looks to me like this
is already the case.

get_pid() continues to preserve next_safe.

Also preserve a task_struct *last_safe which would be
set to the last pid created. release_task() whould update
it to p->prev_task if last_safe == p.

get_pid() would use *last_safe->next_task to walk
until it found a gap >= next_safe. (looping if it
hits PID_MAX)

So by adding a couple of bits of logic in wait() and keeping
around a relatively small number of task_structs for exited
processes we can eliminate the scan alltogether.

Defering the release_task() for group leaders eliminates the
need for rescanning and opens the possiblity of using
find_task_by_pid() instead of scanning as someone recently
suggested.

What have i missed? This seems much simpler than the other
aproaches i've seen proposed. It should also be much more
deterministic in behavior.

--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2002-09-10 09:49:42

by Andries Brouwer

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

On Mon, Sep 09, 2002 at 03:39:56PM -0700, jw schultz wrote:

> The current repeated scanning is awful.

I don't know what the problem is. The present scheme is very
efficient on the average (since the pid space is very large,
much larger than the number of processes, this scan is hardly
ever done). All proposed alternatives are clumsy, incorrect,
and much less efficient.

Andries

2002-09-10 19:24:40

by jw schultz

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

On Tue, Sep 10, 2002 at 11:54:23AM +0200, Andries Brouwer wrote:
> On Mon, Sep 09, 2002 at 03:39:56PM -0700, jw schultz wrote:
>
> > The current repeated scanning is awful.
>
> I don't know what the problem is. The present scheme is very
> efficient on the average (since the pid space is very large,
> much larger than the number of processes, this scan is hardly
> ever done). All proposed alternatives are clumsy, incorrect,
> and much less efficient.

The scan itself i don't mind. It is the rescan that bothers
me. That is bother as in almost, but not quite, an itch.
"Awful" is perhaps an overstatement. I agree the space
should be sparse enough to make rescans infrequent and i
have trouble imagining what these systems must be like that
the rescanning actually loops interminably. Increasing
PID_MAX should make the pid space sparse enough to make the
problem implausible.

I was just responding to the frequent discussions of
"get_pid hang" and when i looked at get_pid said "yuck".

One of the bigger problems i have with the scan is the very
idea that there are pids that cannot be used because they
are still referenced but don't exist. That sounds lax to
me. I can understand the advantage of it. The only place
it seems to impact is get_pid. Keeping the task structs
around would mean coping with them in several other places.
It does occur to me that there could be some advantage to
having these terminated session, thread and process group
leaders show up in ps.

Well, i've put in my $0.02 plus interest.


--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2002-09-11 08:38:15

by Hanumanthu. H

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...


>> I don't know what the problem is. The present scheme is very
>> efficient on the average (since the pid space is very large,
>> much larger than the number of processes, this scan is hardly
>> ever done)

> The scan itself i don't mind. It is the rescan that bothers me

And most of others too. One thing that strikes some minds
immediatly after looking at current pid allocation, is the need
for improvement. Well, even though the proposals are be clumsy,
in-efficient (really ?) we should not ignore the fact that this
is an area to improve. Ok, here is my final (more better) proposal
which fixes the atomicity problem addressed by ManFred.


Lets us have a structure to represent pid, session, pgrp and tgid.

struct idobject {
struct idobject *id_next;
struct idobject *id_prev;
int value;
atomic_t users;
task_t *taskp;
};

Rather than linking task_structs in pid_hash table, we maintain
these ID objects in pidhash table. So, remove pidhash link ptrs
from task struct and put them in idobject structure. The members
represent:

id_next & id_prev : links to hash next & prev entries in pidhash
table

value : the value represented by this object

users : number of users for this object. This counts
the number of references for this object.

taskp : task that uses this object as PID


And each task_struct will have the following pointers:


struct idobject *pidp; // ptr to pid object
struct idobject *sessionp; // ptr to session object
struct idobject *pgrpp; // ptr to pgrp object
struct idobject *tgidp; // ptr to thread grp object

(To avoid all changes at once, we can retain their non-ptr
versions i.e session, pid, pgrp and tgid members of task_struct).

A task acquire's pid by calling set_pid() function, giving its
task_struct so that pid assignment would be atomic (i,e to address
the issue raised by Manfred). set_pid() allocates an idobject
structure, assigns a free pid and sets the objp->taskp to the
given task. Based on the pid, it hashes the idobject structure
into pidhash table. All this by holding lastpid_lock only, so that
pid assignments won't be duplicated.

Upon acquiring a pid, pidp's users member will be incremented by 1.
Likewise sessionp, pgrpp and tgidp users also will be incremented
whenever this task establishes relationship with them.

When a process is exiting, it decrements `users' members of
these objects and if they become zero, it un-hashes the object
based on its value.

This design allows two things:

1. Atomicity in object allocation

2. Efficiency in pid allocation, because reduction in
search time and avoiding tasklist_lock.

If people does not bother in maintaining few more pointers,
we can achive an efficient way of tracking all members of
a session or process group.

Hmm...does this idea make any better sense ? If people have
any comments, please let me know.. If this is going to be
another clumsy, in-efficient idea, I will assure that I will
stop thinking about it :-)


Regards
Hanumanthu

2002-09-11 17:14:50

by Andries Brouwer

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

On Wed, Sep 11, 2002 at 02:29:45PM +0530, Hanumanthu. H wrote:

> >> I don't know what the problem is. The present scheme is very
> >> efficient on the average (since the pid space is very large,
> >> much larger than the number of processes, this scan is hardly
> >> ever done)
>
> > The scan itself i don't mind. It is the rescan that bothers me
>
> And most of others too. One thing that strikes some minds
> immediatly after looking at current pid allocation, is the need
> for improvement. Well, even though the proposals are be clumsy,
> in-efficient (really ?) we should not ignore the fact that this
> is an area to improve. Ok, here is my final (more better) proposal
> which fixes the atomicity problem addressed by ManFred.
>
>
> Lets us have a structure to represent pid, session, pgrp and tgid.
>
> struct idobject {
> struct idobject *id_next;
> struct idobject *id_prev;
> int value;
> atomic_t users;
> task_t *taskp;
> };

Again. We have 2^30 = 10^9 pids. In reality there are fewer than 10^4
processes. So once in 10^5 pid allocations do we make a scan over
these 10^4 processes, that is: for each pid allocation we look at
0.1 other processes. This 0.1 is a small number. As soon as you start
introducing structures that have to be updated for each fork or exit,
things become at least ten times as expensive as they are now.

Some polishing is possible in that code. I think I once gave a shorter
and more efficient version. The fragment "if(last_pid & ~PID_MASK);
last_pid = 300;" occurs twice, and the correct version has it only once.
The correct version does not have the "goto inside".

But, the code may only become smaller and more beautiful.
Large ugly code can be justified only by the need for efficiency,
and there is no such need here, and indeed, none of the proposals
made things more efficient. Once the number of processes gets
above 10^5 we can invent simpleminded schemes to make this
for_each_task faster.

Andries

2002-09-11 20:18:50

by jw schultz

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

On Wed, Sep 11, 2002 at 07:19:34PM +0200, Andries Brouwer wrote:
> > >> I don't know what the problem is. The present scheme is very
> > >> efficient on the average (since the pid space is very large,
> > >> much larger than the number of processes, this scan is hardly
> > >> ever done)
> >
> > > The scan itself i don't mind. It is the rescan that bothers me
>
> Again. We have 2^30 = 10^9 pids. In reality there are fewer than 10^4
> processes. So once in 10^5 pid allocations do we make a scan over
> these 10^4 processes, that is: for each pid allocation we look at
> 0.1 other processes. This 0.1 is a small number. As soon as you start
> introducing structures that have to be updated for each fork or exit,
> things become at least ten times as expensive as they are now.

Some clarification. The problems were triggered when
PID_MAX was 2^15. Linus has bumped it with a recomendation
of somthing on the order of 2^20 - 2^24 not 2^30
to allow for SSI clusters.

Once last_pid cylces we do a complete scan of the task list
testing four task_struct values for every free pid we get.
If the attempted pid is in use the scan will abort
(somewhere about half way through, perhaps less, on average)
and a new scan will be started. The increase of PID_MAX will
(when it takes effect) dramatically reduce the frequency of
collisons causing rescans but not eliminate them.

I am less certain than you that a little more structure
managment on fork and exit might not reduce the amount of
scanning we have to do. From what i see in sched there is
lot of task list scanning going on there as well. Structure
management is a fixed cost. The cost of scanning the entire
task list is linear.

> Some polishing is possible in that code. I think I once gave a shorter
> and more efficient version. The fragment "if(last_pid & ~PID_MASK);
> last_pid = 300;" occurs twice, and the correct version has it only once.
> The correct version does not have the "goto inside".
>
> But, the code may only become smaller and more beautiful.
> Large ugly code can be justified only by the need for efficiency,
> and there is no such need here, and indeed, none of the proposals
> made things more efficient. Once the number of processes gets
> above 10^5 we can invent simpleminded schemes to make this
> for_each_task faster.

Let's relax and see what comes out. Maybe someone will
suprise you. I trust Linus to reject patches that make
things worse especially if they haven't been vetted by a
leutenant.

--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2002-09-12 01:06:42

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

On Wed, 11 Sep 2002, Andries Brouwer wrote:

> Again. We have 2^30 = 10^9 pids.

It's on my TODO list for procps ;)

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-12 01:34:51

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

Andries Brouwer wrote:
>
> ...
> Again. We have 2^30 = 10^9 pids. In reality there are fewer than 10^4
> processes. So once in 10^5 pid allocations do we make a scan over
> these 10^4 processes,

Inside tasklist_lock? That's pretty bad from a latency point of
view. A significant number of users would take the slower common
case to avoid this.

2002-09-12 20:18:28

by Andries Brouwer

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

On Wed, Sep 11, 2002 at 06:54:47PM -0700, Andrew Morton wrote:
> Andries Brouwer wrote:
> >
> > ...
> > Again. We have 2^30 = 10^9 pids. In reality there are fewer than 10^4
> > processes. So once in 10^5 pid allocations do we make a scan over
> > these 10^4 processes,
>
> Inside tasklist_lock? That's pretty bad from a latency point of
> view. A significant number of users would take the slower common
> case to avoid this.

That would be unwise of all these users.
As soon as people have so many processes that this becomes a problem,
then instead of making things slower they should make things faster
still, at the cost of a little bit of extra code.

Similarly, people that need real-time guarantees will probably add
that extra code.

Now that things are ten thousand times better than they were very
recently I find it a bit surprising that people worry. But yes, when
needed it is very easy to come with further improvements.
Once people stand up and say that they need Linux machines with 10^6
processes, or with 10^4 processes and real time guarantees, then we
must have a discussion about data structures, and a discussion about
standards.

The standards part is this: what values are allowed for p->pgrp,
p->tgid, p->session? Can these be arbitrary numbers? Or can the
positive values among them be restricted to pid's?
If this restriction can be imposed we can be slightly more efficient.

But these future discussions must not be about get_pid() but about
task list handling, scheduling, sending signals, all places that
today have for_each_task(p) ...

Andries

2002-09-12 21:12:46

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

On Thu, 12 Sep 2002, Andries Brouwer wrote:

> Once people stand up and say that they need Linux machines with 10^6
> processes, or with 10^4 processes and real time guarantees, then we
> must have a discussion about data structures, and a discussion about
> standards.

IIRC that happened last month.

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-12 21:19:27

by Victor Yodaiken

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

On Thu, Sep 12, 2002 at 06:17:26PM -0300, Rik van Riel wrote:
> On Thu, 12 Sep 2002, Andries Brouwer wrote:
>
> > Once people stand up and say that they need Linux machines with 10^6
> > processes, or with 10^4 processes and real time guarantees, then we
> > must have a discussion about data structures, and a discussion about
> > standards.
>
> IIRC that happened last month.

And what about a discussion about reality?

--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com

2002-09-13 05:33:57

by Hanumanthu. H

[permalink] [raw]
Subject: Re: [PATCH] pid_max hang again...

> > > Again. We have 2^30 = 10^9 pids. In reality there are fewer than 10^4
> > > processes. So once in 10^5 pid allocations do we make a scan over
> > > these 10^4 processes,
> >
> > Inside tasklist_lock? That's pretty bad from a latency point of
> > view. A significant number of users would take the slower common
> > case to avoid this.
>
> That would be unwise of all these users.
> As soon as people have so many processes that this becomes a problem,
> then instead of making things slower they should make things faster
> still, at the cost of a little bit of extra code.

This was my initial impression. When Manfred clarified me that, a pid
can be in use (as session for example) even if the process corresponding
to that pid has died. The proposed approaches don't make things slower.
It is not just targeted towards get_pid() avoiding re-scan. One of the main
goals was to reduce contention for tasklist_lock.

> Similarly, people that need real-time guarantees will probably add
> that extra code.
> Now that things are ten thousand times better than they were very
> recently I find it a bit surprising that people worry. But yes, when
> needed it is very easy to come with further improvements.
> Once people stand up and say that they need Linux machines with 10^6
> processes, or with 10^4 processes and real time guarantees, then we
> must have a discussion about data structures, and a discussion about
> standards.

This doesn't sound good for me (I am sure for most of others too).

>
> The standards part is this: what values are allowed for p->pgrp,
> p->tgid, p->session? Can these be arbitrary numbers?

They can't be arbitrary numbers. POSIX has defined what p->pgrp
and p->session should be.

> But these future discussions must not be about get_pid() but about
> task list handling, scheduling, sending signals, all places that
> today have for_each_task(p) ...

Whatever approaches proposed don't stop at making get_pid() better.
They improve/change stuff a lot ofcourse !!

~Hanu