2002-03-05 14:50:46

by Hubertus Franke

[permalink] [raw]
Subject: Fwd: [Lse-tech] get_pid() performance fix


Can somebody post why this patch shouldn't be picked up ?
The attached program shows the problem in user space
and the patch is almost trivial ..

-- Hubertus

---------- Forwarded Message ----------

Subject: [Lse-tech] get_pid() performance fix
Date: Tue, 26 Feb 2002 18:58:51 -0500
From: "Rajan Ravindran" <[email protected]>
To: [email protected]
Cc: [email protected], [email protected]

Paul Larson posted a patch which fixes the get_pid() hang,
if we run out of available pids. Nevertheless, we have
observed that it takes a long time to find the next available
pid once the last pid reaches the PID_MAX. This is due to
a constant rescan of the task list while progressing only one
pid at time, yielding an O(n**2) problem.
Here is a patch (together with Paul's fix) which eliminates the
time taken to search for an available pid, by not constantly
restarting the search through the entire task list.

Attached is also a user level test program getpid.c),
by which one can simulate creation and deletion of processes.
This application will measure the time taken for the get_pid()
routine to find the next available process.

example:
getpid -p 32770 -n 3

which will try to create 32770 process, eventually get_pid can't
find any free pid after 32767, so it will delete 3 pids randomly
from the existing list and start calculating the time taken to
find the available pid (which we deleted).

In getpid.c the new fixes are inside the #define NEW_METHOD.
Try compiling the getpid.c with and without -DNEW_METHOD compile
option to see the performance improvement.

here is an example output for the old and the new algorithm and
their respective execution time to find a new pid.

(See attached file: output)

This can/should be applied to 2.5 and 2.4 kernels.


(See attached file: getpid.c)

Thanks,
Rajan


diff -Naur linux-2.5.5/kernel/fork.c linux-2.5.5-new/kernel/fork.c
--- linux-2.5.5/kernel/fork.c Tue Feb 19 21:10:55 2002
+++ linux-2.5.5-new/kernel/fork.c Tue Feb 26 15:46:36 2002
@@ -24,6 +24,7 @@
#include <linux/file.h>
#include <linux/binfmts.h>
#include <linux/fs.h>
+#include <linux/compiler.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -129,12 +130,13 @@
{
static int next_safe = PID_MAX;
struct task_struct *p;
- int pid;
+ int pid, beginpid;

if (flags & CLONE_PID)
return current->pid;

spin_lock(&lastpid_lock);
+ beginpid = last_pid;
if((++last_pid) & 0xffff8000) {
last_pid = 300; /* Skip daemons etc. */
goto inside;
@@ -153,13 +155,18 @@
if(last_pid & 0xffff8000)
last_pid = 300;
next_safe = PID_MAX;
+ goto repeat;
}
- goto repeat;
+ if(unlikely(last_pid == beginpid))
+ goto nomorepids;
+ continue;
}
if(p->pid > last_pid && next_safe > p->pid)
next_safe = p->pid;
if(p->pgrp > last_pid && next_safe > p->pgrp)
next_safe = p->pgrp;
+ if(p->tgid > last_pid && next_safe > p->tgid)
+ next_safe = p->tgid;
if(p->session > last_pid && next_safe > p->session)
next_safe = p->session;
}
@@ -169,6 +176,11 @@
spin_unlock(&lastpid_lock);

return pid;
+
+nomorepids:
+ read_unlock(&tasklist_lock);
+ spin_unlock(&lastpid_lock);
+ return 0;
}

static inline int dup_mmap(struct mm_struct * mm)
@@ -667,6 +679,8 @@

copy_flags(clone_flags, p);
p->pid = get_pid(clone_flags);
+ if (p->pid == 0 && current->pid != 0)
+ goto bad_fork_cleanup;

INIT_LIST_HEAD(&p->run_list);

-------------------------------------------------------



--
-- Hubertus Franke ([email protected])


Attachments:
output (0.99 kB)
getpid.c (4.50 kB)
Download all attachments

2002-03-05 16:27:20

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

Hubertus Franke <[email protected]> writes:

> @@ -153,13 +155,18 @@
> if(last_pid & 0xffff8000)
> last_pid = 300;
> next_safe = PID_MAX;
> + goto repeat;
> }
> - goto repeat;
> + if(unlikely(last_pid == beginpid))
> + goto nomorepids;
> + continue;

It isn't guaranteed that pid is unique.

In the case:
task->pid = 300, task->xxx = 301
pid 301 is free

This get_pid() returns 301.

Regards.
--
OGAWA Hirofumi <[email protected]>

2002-03-05 16:43:20

by Hubertus Franke

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Tuesday 05 March 2002 11:26 am, OGAWA Hirofumi wrote:
> Hubertus Franke <[email protected]> writes:
> > @@ -153,13 +155,18 @@
> > if(last_pid & 0xffff8000)
> > last_pid = 300;
> > next_safe = PID_MAX;
> > + goto repeat;
> > }
> > - goto repeat;
> > + if(unlikely(last_pid == beginpid))
> > + goto nomorepids;
> > + continue;
>
> It isn't guaranteed that pid is unique.
>
> In the case:
> task->pid = 300, task->xxx = 301
> pid 301 is free
>
> This get_pid() returns 301.
>
> Regards.

No the point of this patch was to limit the search time for finding
the next available pid. We are not mocking around with the
logic that declares a pid available or not. That stays the same.
However, one doesn't need to start every single time from the
beginning to find the next available pid.

--
-- Hubertus Franke ([email protected])

2002-03-05 16:44:10

by Rajan Ravindran

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix



Yes, pid's are guaranteed to be unique.
Here the problem we focused is the time taken in finding the next
available free pid.
I really don't mean by your task->xxx.

-Rajan



Hubertus Franke <[email protected]> writes:

> @@ -153,13 +155,18 @@
> if(last_pid & 0xffff8000)
> last_pid = 300;
> next_safe = PID_MAX;
> + goto repeat;
> }
> - goto repeat;
> + if(unlikely(last_pid == beginpid))
> + goto nomorepids;
> + continue;

It isn't guaranteed that pid is unique.

In the case:
task->pid = 300, task->xxx = 301
pid 301 is free

This get_pid() returns 301.

Regards.
--
OGAWA Hirofumi <[email protected]>

_______________________________________________
Lse-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/lse-tech




2002-03-05 17:58:22

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

"Rajan Ravindran" <[email protected]> writes:

> Yes, pid's are guaranteed to be unique.
> Here the problem we focused is the time taken in finding the next
> available free pid.
> I really don't mean by your task->xxx.

I'm confused.

I said:
task { pid = 300, pgrp = 301, };
301 is free;

get_pid() returns 301.

"task 301" can't call setsid(). pid 301 is available?
--
OGAWA Hirofumi <[email protected]>

2002-03-05 19:52:47

by Hubertus Franke

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Tuesday 05 March 2002 12:57 pm, OGAWA Hirofumi wrote:
> "Rajan Ravindran" <[email protected]> writes:
> > Yes, pid's are guaranteed to be unique.
> > Here the problem we focused is the time taken in finding the next
> > available free pid.
> > I really don't mean by your task->xxx.
>
> I'm confused.
>

yes you are .....

> I said:
> task { pid = 300, pgrp = 301, };
> 301 is free;
>
> get_pid() returns 301.
>
> "task 301" can't call setsid(). pid 301 is available?

The original code is/was:

if(p->pid == last_pid ||
p->pgrp == last_pid ||
p->tgid == last_pid ||
p->session == last_pid) {
if(++last_pid >= next_safe) {
if(last_pid & 0xffff8000)
last_pid = 300;
next_safe = PID_MAX;
}
goto repeat;
}

if any process holds the pgrp=301 as in your case, 301 won't be eligible due
to (p->pgrp == last_pid) check.

--
-- Hubertus Franke ([email protected])

2002-03-05 20:11:30

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

Hubertus Franke <[email protected]> writes:

> > I said:
> > task { pid = 300, pgrp = 301, };
> > 301 is free;
> >
> > get_pid() returns 301.
> >
> > "task 301" can't call setsid(). pid 301 is available?
>
> The original code is/was:
>
> if(p->pid == last_pid ||
> p->pgrp == last_pid ||
> p->tgid == last_pid ||
> p->session == last_pid) {
> if(++last_pid >= next_safe) {
> if(last_pid & 0xffff8000)
> last_pid = 300;
> next_safe = PID_MAX;
> }
> goto repeat;
> }
>
> if any process holds the pgrp=301 as in your case, 301 won't be eligible due
> to (p->pgrp == last_pid) check.

I know.

> @@ -153,13 +155,18 @@
> if(last_pid & 0xffff8000)
> last_pid = 300;
> next_safe = PID_MAX;
> + goto repeat;
> }
> - goto repeat;
> + if(unlikely(last_pid == beginpid))
> + goto nomorepids;
> + continue;

You changed it. No?
--
OGAWA Hirofumi <[email protected]>

2002-03-05 21:58:53

by Hubertus Franke

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Tuesday 05 March 2002 03:10 pm, OGAWA Hirofumi wrote:
> Hubertus Franke <[email protected]> writes:
> > > I said:
> > > task { pid = 300, pgrp = 301, };
> > > 301 is free;
> > >
> > > get_pid() returns 301.
> > >
> > > "task 301" can't call setsid(). pid 301 is available?
> >
> > The original code is/was:
> >
> > if(p->pid == last_pid ||
> > p->pgrp == last_pid ||
> > p->tgid == last_pid ||
> > p->session == last_pid) {
> > if(++last_pid >= next_safe) {
> > if(last_pid & 0xffff8000)
> > last_pid = 300;
> > next_safe = PID_MAX;
> > }
> > goto repeat;
> > }
> >
> > if any process holds the pgrp=301 as in your case, 301 won't be eligible
> > due to (p->pgrp == last_pid) check.
>
> I know.
>
> > @@ -153,13 +155,18 @@
> > if(last_pid & 0xffff8000)
> > last_pid = 300;
> > next_safe = PID_MAX;
> > + goto repeat;
> > }
> > - goto repeat;
> > + if(unlikely(last_pid == beginpid))
> > + goto nomorepids;
> > + continue;
>
> You changed it. No?

Yes, we changed but only the logic that once a pid is busy we start searching
for every task again. This is exactly the O(n**2) problem.
Run the program and you'll see.

--
-- Hubertus Franke ([email protected])

2002-03-05 23:40:38

by Hubertus Franke

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Tuesday 05 March 2002 05:48 pm, OGAWA Hirofumi wrote:
> Hubertus Franke <[email protected]> writes:
> > > > @@ -153,13 +155,18 @@
> > > > if(last_pid & 0xffff8000)
> > > > last_pid = 300;
> > > > next_safe = PID_MAX;
> > > > + goto repeat;
> > > > }
> > > > - goto repeat;
> > > > + if(unlikely(last_pid == beginpid))
> > > > + goto nomorepids;
> > > > + continue;
> > >
> > > You changed it. No?
> >
> > Yes, we changed but only the logic that once a pid is busy we start
> > searching for every task again. This is exactly the O(n**2) problem.
> > Run the program and you'll see.
>
> Run the attached file.
>
> Result,
> new pid: 301, 300: pid 300, pgrp 301
>
> new pid == task(300)->pgrp. This get_pid() has bug.
> I'm missing something?

Yipp you are right.
I stay corrected, we are missing something. Will work on it and see
whether we can correct it, either way we should be able to
get ride of the o(n**2) effect.

--
-- Hubertus Franke ([email protected])

2002-03-07 03:53:34

by Rusty Russell

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Mon, 4 Mar 2002 20:57:49 -0500
Hubertus Franke <[email protected]> wrote:

>
> Can somebody post why this patch shouldn't be picked up ?
> The attached program shows the problem in user space
> and the patch is almost trivial ..

At a cursory glance, this seems to be three patches:
1) Fix the get_pid() hang.
2) Speed up get_pid().
3) And this piece I'm not sure about:
> + if(p->tgid > last_pid && next_safe > p->tgid)
> + next_safe = p->tgid;

Please split, and send the fix get_pid() hang to trivial patch monkey,
and push optimization to Linus.

Cheers!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-03-07 14:34:28

by Hubertus Franke

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Wednesday 06 March 2002 10:56 pm, Rusty Russell wrote:
> On Mon, 4 Mar 2002 20:57:49 -0500
>
> Hubertus Franke <[email protected]> wrote:
> > Can somebody post why this patch shouldn't be picked up ?
> > The attached program shows the problem in user space
> > and the patch is almost trivial ..
>
> At a cursory glance, this seems to be three patches:
> 1) Fix the get_pid() hang.
> 2) Speed up get_pid().
>
> 3) And this piece I'm not sure about:
> > + if(p->tgid > last_pid && next_safe > p->tgid)
> > + next_safe = p->tgid;
>
> Please split, and send the fix get_pid() hang to trivial patch monkey,
> and push optimization to Linus.
>
> Cheers!
> Rusty.

Thanks, patch was bad and not properly functioning as pointed out to us.
We are rewriting right now (actually <[email protected]>
is doing the coding. I am just there for idea bouncing
easy if the office is 2 doors away.

1) was done by Greg Larson and was already submitted
2) once properly done, we will circulate before bothering Linus again
3) this must have come in because of a wrong patch generation.

--
-- Hubertus Franke ([email protected])

2002-03-07 14:55:28

by Guest section DW

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Thu, Mar 07, 2002 at 09:35:09AM -0500, Hubertus Franke wrote:
...

Long ago I submitted a patch that changed the max pid from 15 bits to
24 or 30 bits or so. (And of course removed the inefficiency noticed
by some people in the current thread.)
Probably this is a good moment to try and see what Linus thinks
about this today.

[Of course Albert Cahalan will object that this is bad for the columns
of ps output. Maybe Alan will mutter something about sysvipc.
Roughly speaking there are only advantages, especially since
I think we'll have to do this sooner or later, and in such cases
sooner is better.]

2002-03-07 15:23:35

by Dave McCracken

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix



--On Thursday, March 07, 2002 09:35:09 AM -0500 Hubertus Franke
<[email protected]> wrote:

>> At a cursory glance, this seems to be three patches:
>> 1) Fix the get_pid() hang.
>> 2) Speed up get_pid().
>>
>> 3) And this piece I'm not sure about:
>> > + if(p->tgid > last_pid && next_safe > p->tgid)
>> > + next_safe = p->tgid;

> 1) was done by Greg Larson and was already submitted
> 2) once properly done, we will circulate before bothering Linus again
> 3) this must have come in because of a wrong patch generation.

Actually that was Paul Larson, but close enough :)

3) is actually a separate bugfix. I had a patch accepted some time back
that added the check for tgid, but missed adding it to the section below.
Paul caught it when he did his patch and added it for me.

Dave McCracken

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059

2002-03-07 19:07:49

by Hubertus Franke

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Thursday 07 March 2002 09:54 am, Guest section DW wrote:
> On Thu, Mar 07, 2002 at 09:35:09AM -0500, Hubertus Franke wrote:
> ...
>
> Long ago I submitted a patch that changed the max pid from 15 bits to
> 24 or 30 bits or so. (And of course removed the inefficiency noticed
> by some people in the current thread.)
> Probably this is a good moment to try and see what Linus thinks
> about this today.
>
> [Of course Albert Cahalan will object that this is bad for the columns
> of ps output. Maybe Alan will mutter something about sysvipc.
> Roughly speaking there are only advantages, especially since
> I think we'll have to do this sooner or later, and in such cases
> sooner is better.]

I don't think that will solve the N^2 problem you still have the algorithm
do the following:

if (++last_pid > next_safe) {
repeat:
last_pid++;
foralltasks p:
deal with wraparound;
if (p uses last_pid) goto repeat
determine next_safe
}
[ last_pid ... next_safe ) is the range that can be saftely used

By extending it to 24 or larger bits all you do is handle the wraparound
at some later point and less frequent It also becomes expensive if a large
number of threads is present.
Well, the problem can be fixed. Even in the current 16 bit approach.
What we are experimenting around
is a <mark-and-sweep> algorithm that would be invoked if the nr_threads is
above a threshold.
The algorithm would do something like this. Rajan will code it up and see
its efficacy.

if (last_pid >= next_safe) {
inside:
if (nr_threads > threshold) { // constant
last_pid = get_pid_map(last_pid,&next_safe);
} else {
.. <as now>
}
}

static unsigned long pid_map[1<<12];

/* determine a range of pids that is available for sure
* [ last_pid .. next_safe )
* pid_map has stale information. some pids might be marked
* as used even if they had been freed in the meantime
*/

int get_pid_map(int last_pid, int *next_safe)
{
int again = 1;
repeat:
for_each_task(p)
mark_pids_in_bitmap;
last_pid = ffz(pid_map); /* we will start from last_pid with wraparound */
if ((last_pid == -1) && (again)) {
again = 0;
memset(pid_map,0);
goto repeat
}
}
next_safe = first_non_zero(pid_map,last_pid); /* starting from last_pid */
return last_pid;
}



Note, if the pid map is to large, it can be done in smaller sections or
windows. Also, note keeping stale information is OK actually desirable, as
it avoids the sweep in almost all cases.

--
-- Hubertus Franke ([email protected])

2002-03-07 19:44:55

by Richard B. Johnson

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Thu, 7 Mar 2002, Hubertus Franke wrote:

> On Thursday 07 March 2002 09:54 am, Guest section DW wrote:
> > On Thu, Mar 07, 2002 at 09:35:09AM -0500, Hubertus Franke wrote:
> > ...
> >
> > Long ago I submitted a patch that changed the max pid from 15 bits to
> > 24 or 30 bits or so. (And of course removed the inefficiency noticed
> > by some people in the current thread.)
> > Probably this is a good moment to try and see what Linus thinks
> > about this today.
> >
> > [Of course Albert Cahalan will object that this is bad for the columns
> > of ps output. Maybe Alan will mutter something about sysvipc.
> > Roughly speaking there are only advantages, especially since
> > I think we'll have to do this sooner or later, and in such cases
> > sooner is better.]
>
> I don't think that will solve the N^2 problem you still have the algorithm
> do the following:
>
> if (++last_pid > next_safe) {
> repeat:
> last_pid++;
> foralltasks p:
> deal with wraparound;
> if (p uses last_pid) goto repeat
> determine next_safe
> }
> [ last_pid ... next_safe ) is the range that can be saftely used
>
> By extending it to 24 or larger bits all you do is handle the wraparound
> at some later point and less frequent It also becomes expensive if a large
> number of threads is present.
> Well, the problem can be fixed. Even in the current 16 bit approach.
> What we are experimenting around
> is a <mark-and-sweep> algorithm that would be invoked if the nr_threads is
> above a threshold.
> The algorithm would do something like this. Rajan will code it up and see
> its efficacy.
>
> if (last_pid >= next_safe) {
> inside:
> if (nr_threads > threshold) { // constant
> last_pid = get_pid_map(last_pid,&next_safe);
> } else {
> .. <as now>
> }
> }
>
> static unsigned long pid_map[1<<12];
>
> /* determine a range of pids that is available for sure
> * [ last_pid .. next_safe )
> * pid_map has stale information. some pids might be marked
> * as used even if they had been freed in the meantime
> */
>
> int get_pid_map(int last_pid, int *next_safe)
> {
> int again = 1;
> repeat:
> for_each_task(p)
> mark_pids_in_bitmap;
> last_pid = ffz(pid_map); /* we will start from last_pid with wraparound */
> if ((last_pid == -1) && (again)) {
> again = 0;
> memset(pid_map,0);
> goto repeat
> }
> }
> next_safe = first_non_zero(pid_map,last_pid); /* starting from last_pid */
> return last_pid;
> }
>
>
>
> Note, if the pid map is to large, it can be done in smaller sections or
> windows. Also, note keeping stale information is OK actually desirable, as
> it avoids the sweep in almost all cases.
>
> --
> -- Hubertus Franke ([email protected])
> -

If security issues were not a concern, you save the last PID, freed at
_exit() and use that immediately. If it's already used, it's zeroed.
exit() just stuffs the exit() PID into the variable, overwriting any
previous, including zero. It needs to be handled under a lock, but
it gets a quick return on investment for the usual fork() exec() stuff
that interactive users use.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (799.53 BogoMips).

Bill Gates? Who?

2002-03-07 19:46:35

by Guest section DW

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Thu, Mar 07, 2002 at 02:07:38PM -0500, Hubertus Franke wrote:
> On Thursday 07 March 2002 09:54 am, Guest section DW wrote:
> > On Thu, Mar 07, 2002 at 09:35:09AM -0500, Hubertus Franke wrote:
> > ...
> >
> > Long ago I submitted a patch that changed the max pid from 15 bits to
> > 24 or 30 bits or so. (And of course removed the inefficiency noticed
> > by some people in the current thread.)

> I don't think that will solve the N^2 problem

Do you understand "inefficiency"? And "remove"?

2002-03-07 22:37:48

by Manfred Spraul

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

--- 2.5/ipc/sem.c Sun Mar 3 12:04:00 2002
+++ build-2.5/ipc/sem.c Thu Mar 7 23:33:02 2002
@@ -251,39 +251,39 @@
for (sop = sops; sop < sops + nsops; sop++) {
curr = sma->sem_base + sop->sem_num;
sem_op = sop->sem_op;
-
- if (!sem_op && curr->semval)
+ result = curr->semval;
+
+ if (!sem_op && result)
goto would_block;

- curr->sempid = (curr->sempid << 16) | pid;
- curr->semval += sem_op;
- if (sop->sem_flg & SEM_UNDO)
- {
+ result += sem_op;
+ if (result < 0)
+ goto would_block;
+ if (result > SEMVMX)
+ goto out_of_range;
+ if (sop->sem_flg & SEM_UNDO) {
int undo = un->semadj[sop->sem_num] - sem_op;
/*
* Exceeding the undo range is an error.
*/
if (undo < (-SEMAEM - 1) || undo > SEMAEM)
- {
- /* Don't undo the undo */
- sop->sem_flg &= ~SEM_UNDO;
goto out_of_range;
- }
- un->semadj[sop->sem_num] = undo;
}
- if (curr->semval < 0)
- goto would_block;
- if (curr->semval > SEMVMX)
- goto out_of_range;
+ curr->semval = result;
}

- if (do_undo)
- {
- sop--;
+ if (do_undo) {
result = 0;
goto undo;
}
-
+ sop--;
+ while (sop >= sops) {
+ sma->sem_base[sop->sem_num].sempid = pid;
+ if (sop->sem_flg & SEM_UNDO)
+ un->semadj[sop->sem_num] -= sop->sem_op;
+ sop--;
+ }
+
sma->sem_otime = CURRENT_TIME;
return 0;

@@ -298,13 +298,9 @@
result = 1;

undo:
+ sop--;
while (sop >= sops) {
- curr = sma->sem_base + sop->sem_num;
- curr->semval -= sop->sem_op;
- curr->sempid >>= 16;
-
- if (sop->sem_flg & SEM_UNDO)
- un->semadj[sop->sem_num] += sop->sem_op;
+ sma->sem_base[sop->sem_num].semval -= sop->sem_op;
sop--;
}

@@ -624,7 +620,7 @@
err = curr->semval;
goto out_unlock;
case GETPID:
- err = curr->sempid & 0xffff;
+ err = curr->sempid;
goto out_unlock;
case GETNCNT:
err = count_semncnt(sma,semnum);


Attachments:
patch-sem (1.81 kB)

2002-03-07 23:13:32

by Hubertus Franke

[permalink] [raw]
Subject: Re: Fwd: [Lse-tech] get_pid() performance fix

On Thursday 07 March 2002 02:46 pm, Guest section DW wrote:
> On Thu, Mar 07, 2002 at 02:07:38PM -0500, Hubertus Franke wrote:
> > On Thursday 07 March 2002 09:54 am, Guest section DW wrote:
> > > On Thu, Mar 07, 2002 at 09:35:09AM -0500, Hubertus Franke wrote:
> > > ...
> > >
> > > Long ago I submitted a patch that changed the max pid from 15 bits to
> > > 24 or 30 bits or so. (And of course removed the inefficiency noticed
> > > by some people in the current thread.)
> >
> > I don't think that will solve the N^2 problem
>
> Do you understand "inefficiency"? And "remove"?


I do, what's your point ?
I haven't seen your patch picked up....

I do understand that when you occasionally get into this scenario
of the loop that particularly for large thread counts this stalls the system
for seconds (say 30, with the tasklock held).


--
-- Hubertus Franke ([email protected])

2002-03-14 23:18:22

by Hubertus Franke

[permalink] [raw]
Subject: [PATCH] get_pid() performance fix

I implemented an alternative version of getpid, that for large thread counts
( > 210000), provides "significantly" better performance as shown in attached
performance analysis. This is particulary viable for PID_MAX=32768.

-- Hubertus Franke <[email protected]>

---------------------------------------------------------------------------------

Currently the getpid algorithm works as follows:
At any given time an interval of [ last_pid .. next_safe ) is known to hold

unused pids. Initially the interval is set to [0 .. 32767]

Hence to allocate a new pid the following is sufficient:

if (++last_pid < next_safe) return last_pid;

However, if we move out of the safe interval, the next safe interval needs
to be established first.
This is currently done by a repetive search

repeat:
foralltasks(p) {
if (p uses lastpid) { last_pid++; goto repeat; }
/* narrow [ last_pid .. next_safe ) */
if (p->pids in [ last_pid .. next_safe ) ) next_safe = p->pid
}

Particulary for large number of tasks, this can lead to frequent exercise of
the repeat resulting in a O(N^2) algorithm. We call this : <algo-0>.

Instead I have provided an alternative mechanism that at the time
of determining the next interval marks a bitmask by walking the tasklist
once [ O(N) ] and then finding the proper bit offsets to mark the next free
interval starting from last_pid. The bitmap requires 4096 bytes.
This is <algo-1>.

An optimization to this to keep the last bitmap instead of clearing it
with every search. Only if we fail to obtain a free range, then we have
to go back and clear the bitmap and redo the search one more time.
This is <algo-2>.

I dragged the various algorithms into a userlevel test program to figure
out where the cut off points are with PID_MAX=32768. In this testprogram
I maintain A tasks, and for 10 rounds (delete D random tasks and
reallocate D tasks again) resulting in T=10*D total measured allocations.

Si states how many interval searches where needed for algo-i.
Gi states the average overhead per get_pid for algo-i in usecs.

Based on that one should use the current algorithm until ~ 22K tasks and
beyond that use algo-2. Only the last 15 tasks are a bit faster under algo-1.
We can safely ignore that case.

Based on that providing an adaptive implementation seems the right choice.
The patch for 2.5.7-pre1 is attached.

testprogram is attached:
executed program example: getpid -c 2 -s 10 -e 100 -D 10 -t <0,1>
0 is old 1 is new algo 2.

A D T | S0 G0 | S1 G1 | S2 G2
----------------------------------------------------------------------------
10 10 80 | 1 0.34 | 1 0.59 | 1 0.81
20 10 100 | 1 0.30 | 1 0.49 | 1 0.64
30 10 100 | 1 0.29 | 1 0.55 | 1 0.65
40 10 100 | 1 0.35 | 1 0.51 | 1 0.65
50 10 100 | 1 0.35 | 1 0.54 | 1 0.67
60 10 100 | 2 0.38 | 21 1.95 | 2 0.79
70 10 100 | 1 0.39 | 1 0.59 | 1 0.76
80 10 100 | 1 0.41 | 1 0.62 | 1 0.76
100 50 500 | 2 0.22 | 63 1.26 | 2 0.30
150 50 500 | 3 0.24 | 12 0.56 | 4 0.36
200 50 500 | 3 0.27 | 56 2.26 | 5 0.46
250 50 500 | 2 0.26 | 119 5.63 | 6 0.54
300 50 500 | 3 0.32 | 148 8.73 | 9 0.76
350 50 500 | 5 0.45 | 168 11.51 | 6 0.76
400 50 500 | 4 0.44 | 90 7.28 | 10 1.10
450 50 500 | 6 0.61 | 143 13.08 | 7 0.97
500 50 500 | 6 0.65 | 100 10.47 | 7 1.06
550 50 500 | 5 0.63 | 71 8.10 | 9 1.34
600 50 500 | 7 0.86 | 115 14.32 | 14 2.04
650 50 500 | 8 1.00 | 112 15.08 | 13 2.07
700 50 500 | 8 1.06 | 127 18.12 | 10 1.79
750 50 500 | 8 1.26 | 62 9.73 | 15 2.73
800 50 500 | 11 1.68 | 92 15.14 | 12 2.42
850 50 500 | 14 2.03 | 78 13.73 | 13 2.67
900 50 500 | 21 3.17 | 102 18.74 | 27 5.18
1000 1000 9980 | 1 0.18 | 4 0.19 | 1 0.18
2000 1000 10000 | 76 1.22 | 3604 53.03 | 322 4.81
3000 1000 10000 | 161 3.84 | 4502 112.24 | 606 15.49
4000 1000 10000 | 359 11.17 | 4912 183.37 | 901 33.76
5000 1000 10000 | 539 23.33 | 4949 257.35 | 1165 59.91
6000 1000 10000 | 724 43.42 | 4918 349.59 | 1498 104.36
7000 1000 10000 | 1026 85.38 | 4886 447.58 | 1835 165.08
8000 1000 10000 | 1228 126.45 | 4870 565.29 | 2084 234.73
9000 1000 10000 | 1516 193.62 | 4826 699.85 | 2489 354.27
10000 1000 10000 | 1818 289.32 | 4910 843.32 | 2763 472.47
11000 1000 10000 | 2093 389.33 | 5005 1023.08 | 3095 629.70
12000 1000 10000 | 2305 506.23 | 5095 1194.71 | 3277 773.06
13000 1000 10000 | 2680 683.66 | 5289 1424.81 | 3711 1003.67
14000 1000 10000 | 2959 853.10 | 5358 1602.05 | 3878 1172.70
15000 1000 10000 | 3167 1037.79 | 5539 1835.64 | 4301 1436.40
16000 1000 10000 | 3466 1272.80 | 5669 2087.03 | 4485 1635.48
17000 1000 10000 | 3743 1539.06 | 5932 2338.50 | 4844 1924.27
18000 1000 10000 | 4069 1869.63 | 6097 2613.60 | 5218 2232.52
19000 1000 10000 | 4293 2183.98 | 6242 2866.34 | 5501 2519.60
20000 1000 10000 | 4616 2607.10 | 6508 3175.90 | 5770 2823.98
21000 1000 10000 | 4974 3119.34 | 6700 3460.95 | 6161 3183.73
22000 1000 10000 | 5177 3609.28 | 6944 3788.19 | 6389 3492.97
23000 1000 10000 | 5483 4214.03 | 7183 4136.25 | 6665 3823.38
24000 1000 10000 | 5838 4971.60 | 7404 4460.62 | 6982 4199.61
25000 1000 10000 | 6183 5880.92 | 7736 4891.80 | 7209 4546.18
26000 1000 10000 | 6413 6829.07 | 7890 5210.85 | 7533 4939.12
27000 1000 10000 | 6748 8132.96 | 8148 5598.19 | 7959 5442.25
28000 1000 10000 | 7139 10065.52 | 8445 6047.42 | 8140 5767.13
29000 1000 10000 | 7638 12967.20 | 8736 6475.23 | 8501 6250.86
30000 1000 10000 | 8178 16991.05 | 8994 6907.40 | 8911 6791.97
32000 50 500 | 482 26446.69 | 488 7405.63 | 487 7494.39
32050 50 500 | 488 34769.89 | 488 7463.11 | 486 7541.61
32100 50 500 | 489 44564.86 | 493 7593.99 | 486 7589.02
32150 50 500 | 486 58150.58 | 487 7549.96 | 492 7731.18
32200 50 500 | 490 64875.38 | 495 7721.82 | 497 7854.59
32250 50 500 | 491 81790.21 | 491 7697.57 | 490 7795.12
32300 50 500 | 489 88975.62 | 493 7763.04 | 495 7909.77
32350 50 500 | 489 115797.38 | 492 7782.34 | 495 7967.86
32400 50 500 | 490 120958.50 | 497 7898.45 | 496 8018.98
32450 50 500 | 492 147541.84 | 493 7874.27 | 492 7982.34
32500 50 500 | 493 175498.39 | 495 7940.18 | 495 8060.97
32550 50 500 | 492 207229.29 | 496 7973.88 | 498 8134.02
32600 50 500 | 495 267057.05 | 498 8028.86 | 498 8171.97
32650 50 500 | 492 375722.28 | 500 8088.30 | 498 8213.85
32700 50 500 | 497 528321.07 | 500 8110.51 | 499 8267.67
32751 1 10 | 10 259785.80 | 10 7851.50 | 10 8549.30
32752 1 10 | 10 1121285.60 | 10 7846.30 | 10 8556.10
32753 1 10 | 10 383729.50 | 10 7848.60 | 10 8562.20
32754 1 10 | 10 1061467.50 | 10 7849.80 | 10 8564.40
32755 1 10 | 10 612726.50 | 10 7853.00 | 10 8553.90
32756 1 10 | 10 1725559.90 | 10 7851.90 | 10 8553.00
32757 1 10 | 10 1679818.50 | 10 7847.80 | 10 8552.10
32758 1 10 | 10 2991838.60 | 10 7865.70 | 10 8557.20
32759 1 10 | 10 883388.90 | 10 7859.40 | 10 8562.00
32760 1 10 | 10 4830405.90 | 10 7850.50 | 10 9336.60
32761 1 10 | 10 7105809.20 | 10 7863.90 | 10 9337.20
32762 1 10 | 10 7919703.40 | 10 7867.10 | 10 9340.70
32763 1 10 | 10 1537522.50 | 10 7869.40 | 10 9340.70
32764 1 10 | 10 6173019.20 | 10 7866.60 | 10 9340.00
32765 1 10 | 10 8104105.00 | 10 7876.20 | 10 10112.80
32766 1 10 | 10 16145415.40 | 10 7880.80 | 10 10893.50
32767 1 10 | 10 16135267.10 | 10 7878.60 | 10 11674.40


Other variants are possible, for instance if 4096 bytes is too much
(hell I don't know how that could be), one can break it up into smaller
search chunks (e.g. 256 bytes).

Another alternative is to allocate the page on the first occasion of
getting into get_pid_my_map....

In the following I give a comparative result between algo-2 and
algo-2 with a max interval size of 256. The times are very comparative.
Also the search count values are identical, but in 2 cases suggesting
that a interval size particular for large thread counts of 256 is certainly
sufficient, but it brings some small overhead. Question to answer is
wether setting aside an extra page is such a crime.....

A D T | S2 G2 | S2-256 G2-256
-------------------------------------------------------
10 10 80 | 1 0.81 | 1 0.84
20 10 100 | 1 0.64 | 1 0.67
30 10 100 | 1 0.65 | 1 0.68
40 10 100 | 1 0.65 | 1 0.69
50 10 100 | 1 0.67 | 1 0.71
60 10 100 | 2 0.79 | 2 0.82
70 10 100 | 1 0.76 | 1 0.76
80 10 100 | 1 0.76 | 1 0.79
100 50 500 | 2 0.30 | 2 0.31
150 50 500 | 4 0.36 | 5 0.39 <=
200 50 500 | 5 0.46 | 5 0.46
250 50 500 | 6 0.54 | 6 0.55
300 50 500 | 9 0.76 | 9 0.76
350 50 500 | 6 0.76 | 6 0.75
400 50 500 | 10 1.10 | 10 1.10
450 50 500 | 7 0.97 | 7 0.97
500 50 500 | 7 1.06 | 7 1.06
550 50 500 | 9 1.34 | 9 1.35
600 50 500 | 14 2.04 | 14 2.06
650 50 500 | 13 2.07 | 13 2.09
700 50 500 | 10 1.79 | 10 1.82
750 50 500 | 15 2.73 | 15 2.69
800 50 500 | 12 2.42 | 12 2.38
850 50 500 | 13 2.67 | 13 2.66
900 50 500 | 27 5.18 | 27 5.25
1000 1000 9980 | 1 0.18 | 3 0.19 <=
2000 1000 10000 | 322 4.81 | 322 4.84
3000 1000 10000 | 606 15.49 | 606 15.55
4000 1000 10000 | 901 33.76 | 901 34.42
5000 1000 10000 | 1165 59.91 | 1165 62.35
6000 1000 10000 | 1498 104.36 | 1498 105.55
7000 1000 10000 | 1835 165.08 | 1835 174.82
8000 1000 10000 | 2084 234.73 | 2084 244.18
9000 1000 10000 | 2489 354.27 | 2489 372.11
10000 1000 10000 | 2763 472.47 | 2763 486.73
11000 1000 10000 | 3095 629.70 | 3095 648.31
12000 1000 10000 | 3277 773.06 | 3277 784.75
13000 1000 10000 | 3711 1003.67 | 3711 1006.94
14000 1000 10000 | 3878 1172.70 | 3878 1168.71
15000 1000 10000 | 4301 1436.40 | 4301 1429.89
16000 1000 10000 | 4485 1635.48 | 4485 1620.90
17000 1000 10000 | 4844 1924.27 | 4844 1904.92
18000 1000 10000 | 5218 2232.52 | 5218 2218.80
19000 1000 10000 | 5501 2519.60 | 5501 2508.83
20000 1000 10000 | 5770 2823.98 | 5770 2895.66
21000 1000 10000 | 6161 3183.73 | 6161 3307.54
22000 1000 10000 | 6389 3492.97 | 6389 3620.53
23000 1000 10000 | 6665 3823.38 | 6665 3995.63
24000 1000 10000 | 6982 4199.61 | 6982 4347.95
25000 1000 10000 | 7209 4546.18 | 7209 4701.95
26000 1000 10000 | 7533 4939.12 | 7533 5088.13
27000 1000 10000 | 7959 5442.25 | 7959 5599.85
28000 1000 10000 | 8140 5767.13 | 8140 5817.86
29000 1000 10000 | 8501 6250.86 | 8501 6250.30
30000 1000 10000 | 8911 6791.97 | 8911 6788.51
32000 50 500 | 487 7494.39 | 487 7493.47
32050 50 500 | 486 7541.61 | 486 7541.05
32100 50 500 | 486 7589.02 | 486 7586.12
32150 50 500 | 492 7731.18 | 492 7728.76
32200 50 500 | 497 7854.59 | 497 7854.94
32250 50 500 | 490 7795.12 | 490 7783.10
32300 50 500 | 495 7909.77 | 495 7902.70
32350 50 500 | 495 7967.86 | 495 7946.20
32400 50 500 | 496 8018.98 | 496 7999.34
32450 50 500 | 492 7982.34 | 492 7962.93
32500 50 500 | 495 8060.97 | 495 8048.18
32550 50 500 | 498 8134.02 | 498 8122.08
32600 50 500 | 498 8171.97 | 498 8169.34
32650 50 500 | 498 8213.85 | 498 8209.95
32700 50 500 | 499 8267.67 | 499 8266.13
32751 1 10 | 10 8549.30 | 10 8629.00
32752 1 10 | 10 8556.10 | 10 8636.30
32753 1 10 | 10 8562.20 | 10 8632.00
32754 1 10 | 10 8564.40 | 10 8633.40
32755 1 10 | 10 8553.90 | 10 8635.40
32756 1 10 | 10 8553.00 | 10 8637.60
32757 1 10 | 10 8552.10 | 10 8640.90
32758 1 10 | 10 8557.20 | 10 8644.90
32759 1 10 | 10 8562.00 | 10 8644.10
32760 1 10 | 10 9336.60 | 10 9436.10
32761 1 10 | 10 9337.20 | 10 9435.60
32762 1 10 | 10 9340.70 | 10 9439.10
32763 1 10 | 10 9340.70 | 10 9433.60
32764 1 10 | 10 9340.00 | 10 9440.60
32765 1 10 | 10 10112.80 | 10 10228.40
32766 1 10 | 10 10893.50 | 10 11023.50
32767 1 10 | 10 11674.40 | 10 11813.70


--
-- Hubertus Franke ([email protected])


diff -urbN linux-2.5.7-pre1/kernel/fork.c linux-2.5.7-pre1pid/kernel/fork.c
--- linux-2.5.7-pre1/kernel/fork.c Thu Mar 7 21:18:12 2002
+++ linux-2.5.7-pre1pid/kernel/fork.c Thu Mar 14 16:03:23 2002
@@ -125,24 +125,101 @@
/* Protects next_safe and last_pid. */
spinlock_t lastpid_lock = SPIN_LOCK_UNLOCKED;

+/* this should be provided in every architecture */
+#ifndef SHIFT_PER_LONG
+#if BITS_PER_LONG == 64
+# define SHIFT_PER_LONG 6
+#elif BITS_PER_LONG == 32
+# define SHIFT_PER_LONG 5
+#else
+#error "SHIFT_PER_LONG"
+#endif
+#endif
+
+#define RESERVED_PIDS (300)
+#define GETPID_THRESHOLD (21000) /* when to switch to a mapped algo */
+#define PID_MAP_SIZE (PID_MAX >> SHIFT_PER_LONG)
+static unsigned long pid_map[PID_MAP_SIZE];
+static int next_safe = PID_MAX;
+
+static inline void mark_pid(int pid)
+{
+ pid_map[(pid) >> SHIFT_PER_LONG] |= (1 << ((pid) & (BITS_PER_LONG-1)));
+}
+
+static int get_pid_by_map(int last_pid)
+{
+ struct task_struct *p;
+ int i;
+ int again = 1;
+ unsigned long mask;
+ int fpos;
+
+repeat:
+ for_each_task(p) {
+ mark_pid(p->pid);
+ mark_pid(p->pgrp);
+ mark_pid(p->tgid);
+ mark_pid(p->session);
+ }
+
+ /* find next free pid */
+ i = (last_pid >> SHIFT_PER_LONG);
+ mask = pid_map[i] | (((last_pid & (BITS_PER_LONG-1)) << 1) - 1);
+
+ while ((mask == ~0) && (++i < PID_MAP_SIZE))
+ mask = pid_map[i];
+
+ if (i == PID_MAP_SIZE) {
+ if (again) {
+ /* we didn't find any pid , sweep and try again */
+ again = 0;
+ memset(pid_map, 0, PID_MAP_SIZE * sizeof(unsigned long));
+ last_pid = RESERVED_PIDS;
+ goto repeat;
+ }
+ next_safe = RESERVED_PIDS;
+ return 0;
+ }
+
+ fpos = ffz(mask);
+ i &= (PID_MAX-1);
+ last_pid = (i << SHIFT_PER_LONG) + fpos;
+
+ /* find next save pid */
+ mask &= ~((1 << fpos) - 1);
+
+ while ((mask == 0) && (++i < PID_MAP_SIZE))
+ mask = pid_map[i];
+
+ if (i==PID_MAP_SIZE)
+ next_safe = PID_MAX;
+ else
+ next_safe = (i << SHIFT_PER_LONG) + ffs(mask) - 1;
+ return last_pid;
+}
+
static int get_pid(unsigned long flags)
{
- static int next_safe = PID_MAX;
struct task_struct *p;
- int pid;
+ int pid,beginpid;

if (flags & CLONE_PID)
return current->pid;

spin_lock(&lastpid_lock);
+ beginpid = last_pid;
if((++last_pid) & 0xffff8000) {
- last_pid = 300; /* Skip daemons etc. */
+ last_pid = RESERVED_PIDS; /* Skip daemons etc. */
goto inside;
}
if(last_pid >= next_safe) {
inside:
next_safe = PID_MAX;
read_lock(&tasklist_lock);
+ if (nr_threads > GETPID_THRESHOLD) {
+ last_pid = get_pid_by_map(last_pid);
+ } else {
repeat:
for_each_task(p) {
if(p->pid == last_pid ||
@@ -151,9 +228,11 @@
p->session == last_pid) {
if(++last_pid >= next_safe) {
if(last_pid & 0xffff8000)
- last_pid = 300;
+ last_pid = RESERVED_PIDS;
next_safe = PID_MAX;
}
+ if(unlikely(last_pid == beginpid))
+ goto nomorepids;
goto repeat;
}
if(p->pid > last_pid && next_safe > p->pid)
@@ -162,6 +241,9 @@
next_safe = p->pgrp;
if(p->session > last_pid && next_safe > p->session)
next_safe = p->session;
+ if(p->tgid > last_pid && next_safe > p->tgid)
+ next_safe = p->tgid;
+ }
}
read_unlock(&tasklist_lock);
}
@@ -169,6 +251,10 @@
spin_unlock(&lastpid_lock);

return pid;
+nomorepids:
+ read_unlock(&tasklist_lock);
+ spin_unlock(&lastpid_lock);
+ return 0;
}

static inline int dup_mmap(struct mm_struct * mm)


Attachments:
getpid1.c (12.39 kB)
Test Program

2002-03-15 14:58:22

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] get_pid() performance fix

Hubertus Franke <[email protected]> writes:

> + if (i == PID_MAP_SIZE) {
> + if (again) {
> + /* we didn't find any pid , sweep and try again */
> + again = 0;
> + memset(pid_map, 0, PID_MAP_SIZE * sizeof(unsigned long));
> + last_pid = RESERVED_PIDS;
> + goto repeat;
> + }
> + next_safe = RESERVED_PIDS;
> + return 0;

I think the bug is here.

Probaly, the following is test case: ./getpid1 -r300 -c3

case 3:
populate_all(0, 1);
pid = get_pid(0);
printf("new pid: %d\n", pid);
tsk = find_task_by_pid(400);
del_task(tsk);
pid = get_pid(0);
printf("new pid: %d\n", pid);

break;

result,
new pid: 0
new pid: 1

> + }
> +
> + fpos = ffz(mask);
> + i &= (PID_MAX-1);
> + last_pid = (i << SHIFT_PER_LONG) + fpos;
> +
> + /* find next save pid */
> + mask &= ~((1 << fpos) - 1);
> +
> + while ((mask == 0) && (++i < PID_MAP_SIZE))
> + mask = pid_map[i];
> +
> + if (i==PID_MAP_SIZE)
> + next_safe = PID_MAX;
> + else
> + next_safe = (i << SHIFT_PER_LONG) + ffs(mask) - 1;
> + return last_pid;
> +}
> +
> static int get_pid(unsigned long flags)
> {
> - static int next_safe = PID_MAX;
> struct task_struct *p;
> - int pid;
> + int pid,beginpid;
>
> if (flags & CLONE_PID)
> return current->pid;
>
> spin_lock(&lastpid_lock);
> + beginpid = last_pid;
> if((++last_pid) & 0xffff8000) {
> - last_pid = 300; /* Skip daemons etc. */
> + last_pid = RESERVED_PIDS; /* Skip daemons etc. */
> goto inside;
> }
> if(last_pid >= next_safe) {
> inside:
> next_safe = PID_MAX;
> read_lock(&tasklist_lock);
> + if (nr_threads > GETPID_THRESHOLD) {
> + last_pid = get_pid_by_map(last_pid);
> + } else {
> repeat:
> for_each_task(p) {
> if(p->pid == last_pid ||
> @@ -151,9 +228,11 @@
> p->session == last_pid) {
> if(++last_pid >= next_safe) {
> if(last_pid & 0xffff8000)
> - last_pid = 300;
> + last_pid = RESERVED_PIDS;
> next_safe = PID_MAX;
> }
> + if(unlikely(last_pid == beginpid))
> + goto nomorepids;
> goto repeat;
> }
> if(p->pid > last_pid && next_safe > p->pid)
> @@ -162,6 +241,9 @@
> next_safe = p->pgrp;
> if(p->session > last_pid && next_safe > p->session)
> next_safe = p->session;
> + if(p->tgid > last_pid && next_safe > p->tgid)
> + next_safe = p->tgid;
> + }
> }
> read_unlock(&tasklist_lock);
> }
> @@ -169,6 +251,10 @@
> spin_unlock(&lastpid_lock);
>
> return pid;
> +nomorepids:

Probably, I think the following line are required, here.

+ next_safe = RESERVED_PIDS; or 0

> + read_unlock(&tasklist_lock);
> + spin_unlock(&lastpid_lock);
> + return 0;
> }
>
> static inline int dup_mmap(struct mm_struct * mm)
>

--
OGAWA Hirofumi <[email protected]>

2002-03-15 15:17:33

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] get_pid() performance fix

Whoops! I'm sorry. previous email was the middle of writing.

Hubertus Franke <[email protected]> writes:

> + if (i == PID_MAP_SIZE) {
> + if (again) {
> + /* we didn't find any pid , sweep and try again */
> + again = 0;
> + memset(pid_map, 0, PID_MAP_SIZE * sizeof(unsigned long));
> + last_pid = RESERVED_PIDS;
> + goto repeat;
> + }
> + next_safe = RESERVED_PIDS;
> + return 0;

Probably, the bug is here.

the following is test case: ./getpid1 -r300 -c3

case 3:
populate_all(0, 1);
pid = get_pid(0);
printf("new pid: %d\n", pid);
tsk = find_task_by_pid(400);
del_task(tsk);
pid = get_pid(0);
printf("new pid: %d\n", pid);

break;

result,
new pid: 0
new pid: 1

> + }
> +
> + fpos = ffz(mask);
> + i &= (PID_MAX-1);
> + last_pid = (i << SHIFT_PER_LONG) + fpos;
> +
> + /* find next save pid */
> + mask &= ~((1 << fpos) - 1);
> +
> + while ((mask == 0) && (++i < PID_MAP_SIZE))
> + mask = pid_map[i];
> +
> + if (i==PID_MAP_SIZE)
> + next_safe = PID_MAX;
> + else
> + next_safe = (i << SHIFT_PER_LONG) + ffs(mask) - 1;
> + return last_pid;
> +}
> +
> static int get_pid(unsigned long flags)
> {
> - static int next_safe = PID_MAX;
> struct task_struct *p;
> - int pid;
> + int pid,beginpid;
>
> if (flags & CLONE_PID)
> return current->pid;
>
> spin_lock(&lastpid_lock);
> + beginpid = last_pid;
> if((++last_pid) & 0xffff8000) {
> - last_pid = 300; /* Skip daemons etc. */
> + last_pid = RESERVED_PIDS; /* Skip daemons etc. */
> goto inside;
> }
> if(last_pid >= next_safe) {
> inside:
> next_safe = PID_MAX;
> read_lock(&tasklist_lock);
> + if (nr_threads > GETPID_THRESHOLD) {
> + last_pid = get_pid_by_map(last_pid);
> + } else {
> repeat:
> for_each_task(p) {
> if(p->pid == last_pid ||
> @@ -151,9 +228,11 @@
> p->session == last_pid) {
> if(++last_pid >= next_safe) {
> if(last_pid & 0xffff8000)
> - last_pid = 300;
> + last_pid = RESERVED_PIDS;
> next_safe = PID_MAX;
> }
> + if(unlikely(last_pid == beginpid))
> + goto nomorepids;
> goto repeat;
> }
> if(p->pid > last_pid && next_safe > p->pid)
> @@ -162,6 +241,9 @@
> next_safe = p->pgrp;
> if(p->session > last_pid && next_safe > p->session)
> next_safe = p->session;
> + if(p->tgid > last_pid && next_safe > p->tgid)
> + next_safe = p->tgid;
> + }
> }
> read_unlock(&tasklist_lock);
> }
> @@ -169,6 +251,10 @@
> spin_unlock(&lastpid_lock);
>
> return pid;
> +nomorepids:

Probably, the following line are required, here.

+ next_safe = RESERVED_PIDS; /* or 0 */

> + read_unlock(&tasklist_lock);
> + spin_unlock(&lastpid_lock);
> + return 0;
> }


Basically nice, I think.

BTW, How about using the __set_bit(), find_next_zero_bit(), and
find_next_bit() in get_pid_by_map().

Thanks for nice work.
--
OGAWA Hirofumi <[email protected]>

2002-03-15 18:36:40

by Hubertus Franke

[permalink] [raw]
Subject: Re: [PATCH] get_pid() performance fix

On Friday 15 March 2002 10:16 am, OGAWA Hirofumi wrote:
> Whoops! I'm sorry. previous email was the middle of writing.
>
> Hubertus Franke <[email protected]> writes:
> > + if (i == PID_MAP_SIZE) {
> > + if (again) {
> > + /* we didn't find any pid , sweep and try again */
> > + again = 0;
> > + memset(pid_map, 0, PID_MAP_SIZE * sizeof(unsigned long));
> > + last_pid = RESERVED_PIDS;
> > + goto repeat;
> > + }
> > + next_safe = RESERVED_PIDS;
> > + return 0;
>
> Probably, the bug is here. No bug ....

>
> + next_safe = RESERVED_PIDS; /* or 0 */
>
> > + read_unlock(&tasklist_lock);
> > + spin_unlock(&lastpid_lock);
> > + return 0;
> > }
>
> Basically nice, I think.
>
> BTW, How about using the __set_bit(), find_next_zero_bit(), and
> find_next_bit() in get_pid_by_map().
>
> Thanks for nice work.

OGAWA, honestly I only tried testcase 2.
But looking at your suggestion its not clear to me whether
there is a bug.
Remember we need to determine a valid interval [ last_pid .. next_safe ).
In the pid_map function, if no pid is available, then
[ PID_MAX .. PID_MAX ) will be returned.
The other path should also end up with this as well.
Could you point where you see this not happening.

In the next release, I'll look at using your bitmap function suggestion.

--
-- Hubertus Franke ([email protected])

2002-03-16 05:13:07

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] get_pid() performance fix

Hubertus Franke <[email protected]> writes:

> On Friday 15 March 2002 10:16 am, OGAWA Hirofumi wrote:
> > Whoops! I'm sorry. previous email was the middle of writing.
> >
> > Hubertus Franke <[email protected]> writes:
> > > + if (i == PID_MAP_SIZE) {
> > > + if (again) {
> > > + /* we didn't find any pid , sweep and try again */
> > > + again = 0;
> > > + memset(pid_map, 0, PID_MAP_SIZE * sizeof(unsigned long));
> > > + last_pid = RESERVED_PIDS;
> > > + goto repeat;
> > > + }
> > > + next_safe = RESERVED_PIDS;
> > > + return 0;
> >
> > Probably, the bug is here. No bug ....
>
> >
> > + next_safe = RESERVED_PIDS; /* or 0 */
> >
> > > + read_unlock(&tasklist_lock);
> > > + spin_unlock(&lastpid_lock);
> > > + return 0;
> > > }
> >
> > Basically nice, I think.
> >
> > BTW, How about using the __set_bit(), find_next_zero_bit(), and
> > find_next_bit() in get_pid_by_map().
> >
> > Thanks for nice work.
>
> OGAWA, honestly I only tried testcase 2.
> But looking at your suggestion its not clear to me whether
> there is a bug.
> Remember we need to determine a valid interval [ last_pid .. next_safe ).
> In the pid_map function, if no pid is available, then
> [ PID_MAX .. PID_MAX ) will be returned.
> The other path should also end up with this as well.
> Could you point where you see this not happening.

Maybe my point was unclear. Sorry.

Please consider what happens after using up pid. Then,
get_pid_by_map() returns 0.

And last_pid = 0, next_safe = RESERVED_PIDS. After it, get_pid()
returns the values between 0 and RESERVED_PIDS.

And the line which I added is also the same reason.

Regards.
--
OGAWA Hirofumi <[email protected]>

2002-03-18 21:44:26

by Hubertus Franke

[permalink] [raw]
Subject: Re: [PATCH] get_pid() performance fix

On Saturday 16 March 2002 12:12 am, OGAWA Hirofumi wrote:
> Hubertus Franke <[email protected]> writes:
> > On Friday 15 March 2002 10:16 am, OGAWA Hirofumi wrote:
> > > Whoops! I'm sorry. previous email was the middle of writing.
> > >
> > > Hubertus Franke <[email protected]> writes:
> > > > + if (i == PID_MAP_SIZE) {
> > > > + if (again) {
> > > > + /* we didn't find any pid , sweep and try again */
> > > > + again = 0;
> > > > + memset(pid_map, 0, PID_MAP_SIZE * sizeof(unsigned long));
> > > > + last_pid = RESERVED_PIDS;
> > > > + goto repeat;
> > > > + }
> > > > + next_safe = RESERVED_PIDS;
> > > > + return 0;
> > >
> > > Probably, the bug is here. No bug ....
> > >
> > >
> > > + next_safe = RESERVED_PIDS; /* or 0 */
> > >
> > > > + read_unlock(&tasklist_lock);
> > > > + spin_unlock(&lastpid_lock);
> > > > + return 0;
> > > > }
> > >
> > > Basically nice, I think.
> > >
> > > BTW, How about using the __set_bit(), find_next_zero_bit(), and
> > > find_next_bit() in get_pid_by_map().
> > >
> > > Thanks for nice work.
> >
> > OGAWA, honestly I only tried testcase 2.
> > But looking at your suggestion its not clear to me whether
> > there is a bug.
> > Remember we need to determine a valid interval [ last_pid .. next_safe ).
> > In the pid_map function, if no pid is available, then
> > [ PID_MAX .. PID_MAX ) will be returned.
> > The other path should also end up with this as well.
> > Could you point where you see this not happening.
>
> Maybe my point was unclear. Sorry.
>
> Please consider what happens after using up pid. Then,
> get_pid_by_map() returns 0.
>
> And last_pid = 0, next_safe = RESERVED_PIDS. After it, get_pid()
> returns the values between 0 and RESERVED_PIDS.
>
> And the line which I added is also the same reason.
>
> Regards.

Hirofumi, yes you are right, "the thought was there, but the implementation
was weak".
Proofs again, all these benchmarks are not enough. I'll make a new patch
soon and resubmit.
--
-- Hubertus Franke ([email protected])

2002-03-22 22:13:45

by Hubertus Franke

[permalink] [raw]
Subject: [PATCH] get_pid() performance fix

I implemented an alternative version of getpid, that for large thread counts
( > 220000), provides "significantly" better performance as shown in attached
performance analysis. This is particulary viable for PID_MAX=32768.
Note under the current algorithm, allocating the last pid will take 32
seconds (!!!) on a 500MHZ P-III.
I addressed the correctnes issue that Hirofumi brought up with the last
submission (now test case "-c 4")
Attached patch is against 2.5.7.

-- Hubertus Franke <[email protected]>

------------------------------------------------------------------------------

Currently the getpid algorithm works as follows:
At any given time an interval of [ last_pid .. next_safe ) is known to hold
unused pids. Initially the interval is set to [0 .. 32767]

Hence to allocate a new pid the following is sufficient:

if (++last_pid < next_safe) return last_pid;

However, if we move out of the safe interval, the next safe interval needs
to be established first.
This is currently done by a repetive search

repeat:
foralltasks(p) {
if (p uses lastpid) { last_pid++; goto repeat; }
/* narrow [ last_pid .. next_safe ) */
if (p->pids in [ last_pid .. next_safe ) ) next_safe = p->pid
}

Particulary for large number of tasks, this can lead to frequent exercise of
the repeat resulting in a O(N^2) algorithm. We call this : <algo-0>.

Instead I have provided an alternative mechanism that at the time
of determining the next interval marks a bitmask by walking the tasklist
once [ O(N) ] and then finding the proper bit offsets to mark the next free
interval starting from last_pid. The bitmap requires 4096 bytes.
This is <algo-1>.

An optimization to this to keep the last bitmap instead of clearing it
with every search. Only if we fail to obtain a free range, then we have
to go back and clear the bitmap and redo the search one more time.
This is <algo-2>. Here I omit the results, as they have only shown
improvements for the very last few pids.

I dragged the various algorithms into a userlevel test program to figure
out where the cut off points are with PID_MAX=32768. In this testprogram
I maintain <TH> tasks, and for 10 rounds (delete <DEL> random tasks and
reallocate <DEL> tasks again) resulting in T=10*D total measured allocations.

Si states how many interval searches where needed for algo-i.
Gi states the average overhead per get_pid for algo-i in usecs.

Based on that one should use the current algorithm until ~ 22-25K tasks and
beyond that use algo-2. Only the last 15 tasks are a bit faster under algo-1.
We can safely ignore that case.

Based on that providing an adaptive implementation seems the right choice.
The patch for 2.5.7 is attached.


executed program example: getpid -c 2 -s 10 -e 100 -D 10 -t <0,1>
0 is old 1 is new algo 2.

TH: average thread count in system
DEL: number of randomly deleted threads and then reallocated
TAL: total number of getpid invocation
C-NEW: number of times search was invoked
T-NEW: per getpid() overhead
C-OLD: number of times search was invoked
T-OLD: per getpid() overhead


TH DEL TAL | C-NEW T-NEW | C-OLD T-OLD
10 10 80 | 1 0.79 | 1 0.41
20 10 100 | 1 0.56 | 1 0.36
30 10 100 | 1 0.56 | 1 0.38
40 10 100 | 1 0.58 | 1 0.42
50 10 100 | 1 0.59 | 1 0.43
60 10 100 | 2 0.84 | 2 0.52
70 10 100 | 1 0.72 | 1 0.43
80 10 100 | 1 0.72 | 1 0.48
100 50 500 | 2 0.38 | 2 0.27
150 50 500 | 4 0.56 | 3 0.32
200 50 500 | 5 0.79 | 4 0.40
250 50 500 | 6 1.03 | 2 0.35
300 50 500 | 9 1.68 | 2 0.38
350 50 500 | 6 1.47 | 4 0.58
400 50 500 | 10 2.54 | 8 1.05
450 50 500 | 7 2.10 | 7 1.02
500 50 500 | 7 2.32 | 4 0.75
550 50 500 | 9 3.13 | 5 0.95
600 50 500 | 14 5.14 | 6 1.18
650 50 500 | 13 5.23 | 9 1.79
700 50 500 | 10 4.35 | 9 1.87
750 50 500 | 15 6.91 | 8 2.02
800 50 500 | 12 5.98 | 8 1.95
850 50 500 | 13 6.85 | 9 2.29
900 50 500 | 27 14.55 | 18 4.46
1000 1000 9980 | 1 0.21 | 1 0.21
2000 1000 10000 | 322 14.36 | 76 2.03
3000 1000 10000 | 606 46.10 | 161 6.63
4000 1000 10000 | 901 97.58 | 359 18.87
5000 1000 10000 | 1165 164.19 | 539 38.75
6000 1000 10000 | 1498 266.58 | 724 66.96
7000 1000 10000 | 1835 396.91 | 1026 122.35
8000 1000 10000 | 2084 531.83 | 1228 179.70
9000 1000 10000 | 2489 746.99 | 1516 264.16
10000 1000 10000 | 2763 946.99 | 1818 375.22
11000 1000 10000 | 3095 1199.73 | 2093 502.47
12000 1000 10000 | 3277 1422.02 | 2305 648.17
13000 1000 10000 | 3711 1776.28 | 2680 854.89
14000 1000 10000 | 3878 2033.30 | 2959 1061.61
15000 1000 10000 | 4301 2452.35 | 3167 1266.78
16000 1000 10000 | 4485 2757.00 | 3466 1554.22
17000 1000 10000 | 4844 3210.19 | 3743 1857.74
18000 1000 10000 | 5218 3681.90 | 4069 2247.13
19000 1000 10000 | 5501 4118.69 | 4293 2605.90
20000 1000 10000 | 5770 4594.99 | 4616 3095.39
21000 1000 10000 | 6161 5172.44 | 4974 3686.87
22000 1000 10000 | 6389 5637.80 | 5177 4258.81
23000 1000 10000 | 6665 6191.73 | 5483 4949.61
24000 1000 10000 | 6982 6777.02 | 5838 5831.25
25000 1000 10000 | 7209 7318.15 | 6183 6905.34
----------------> Break even range <------------------------
26000 1000 10000 | 7533 7954.33 | 6413 7955.66
27000 1000 10000 | 7959 8749.97 | 6748 9444.79
28000 1000 10000 | 8140 9302.36 | 7139 11617.75
29000 1000 10000 | 8501 10085.77 | 7638 14960.53
30000 1000 10000 | 8911 10946.80 | 8178 19584.24
32000 50 500 | 487 12265.89 | 486 94498.36
32050 50 500 | 486 12314.17 | 486 95832.03
32100 50 500 | 486 12389.38 | 488 108895.28
32150 50 500 | 492 12599.58 | 484 111742.39
32200 50 500 | 497 12792.36 | 491 124440.45
32250 50 500 | 490 12659.33 | 490 134499.09
32300 50 500 | 495 12843.78 | 489 145873.72
32350 50 500 | 495 12915.18 | 493 158940.66
32400 50 500 | 496 12988.64 | 494 183092.45
32450 50 500 | 492 12924.86 | 490 196135.28
32500 50 500 | 495 13043.85 | 488 223226.98
32550 50 500 | 498 13164.09 | 495 265431.10
32600 50 500 | 498 13222.56 | 495 326363.36
32650 50 500 | 498 13279.90 | 497 441002.09
32700 50 500 | 499 13368.66 | 499 656269.52
32751 1 10 | 10 12836.40 | 10 4031660.40
32752 1 10 | 10 12831.70 | 10 4620214.70
32753 1 10 | 10 12832.60 | 10 4188605.70
32754 1 10 | 10 12837.60 | 10 2972975.40
32755 1 10 | 10 12835.90 | 10 4434635.70
32756 1 10 | 10 12833.80 | 10 3892058.30
32757 1 10 | 10 12839.10 | 10 5002365.30
32758 1 10 | 10 12840.20 | 10 6332786.20
32759 1 10 | 10 12840.20 | 10 5269462.90
32760 1 10 | 10 14127.80 | 10 8234780.40
32761 1 10 | 10 14134.90 | 10 7558043.20
32762 1 10 | 10 14129.70 | 10 9117119.40
32763 1 10 | 10 14134.70 | 10 13895498.10
32764 1 10 | 10 14140.10 | 10 13608972.90
32765 1 10 | 10 15427.30 | 10 12930469.20
32766 1 10 | 10 16708.30 | 10 23576610.90
32767 1 10 | 10 17997.90 | 10 32603396.10

--
-- Hubertus Franke ([email protected])

diff -urbN linux-2.5.7/kernel/fork.c linux-2.5.7-pid/kernel/fork.c
--- linux-2.5.7/kernel/fork.c Mon Mar 18 15:37:05 2002
+++ linux-2.5.7-pid/kernel/fork.c Fri Mar 22 16:38:29 2002
@@ -125,24 +125,111 @@
/* Protects next_safe and last_pid. */
spinlock_t lastpid_lock = SPIN_LOCK_UNLOCKED;

+/* this should be provided in every architecture */
+#ifndef SHIFT_PER_LONG
+#if BITS_PER_LONG == 64
+# define SHIFT_PER_LONG 6
+#elif BITS_PER_LONG == 32
+# define SHIFT_PER_LONG 5
+#else
+#error "SHIFT_PER_LONG"
+#endif
+#endif
+
+#define RESERVED_PIDS (300)
+#define GETPID_THRESHOLD (26000) /* when to switch to a mapped algo */
+#define PID_MAP_SIZE (PID_MAX >> SHIFT_PER_LONG)
+static unsigned long pid_map[PID_MAP_SIZE];
+static int next_safe = PID_MAX;
+
+static inline void mark_pid(int pid)
+{
+ __set_bit(pid,pid_map);
+}
+
+static int get_pid_by_map(int lastpid)
+{
+ static int mark_and_sweep = 0;
+
+ int round = 0;
+ struct task_struct *p;
+ int i;
+ unsigned long mask;
+ int fpos;
+
+
+ if (mark_and_sweep) {
+repeat:
+ mark_and_sweep = 0;
+ memset(pid_map, 0, PID_MAP_SIZE * sizeof(unsigned long));
+ lastpid = RESERVED_PIDS;
+ }
+ for_each_task(p) {
+ mark_pid(p->pid);
+ mark_pid(p->pgrp);
+ mark_pid(p->tgid);
+ mark_pid(p->session);
+ }
+
+ /* find next free pid */
+ i = (lastpid >> SHIFT_PER_LONG);
+ mask = pid_map[i] | (((lastpid & (BITS_PER_LONG-1)) << 1) - 1);
+
+ while ((mask == ~0) && (++i < PID_MAP_SIZE))
+ mask = pid_map[i];
+
+ if (i == PID_MAP_SIZE) {
+ if (round == 0) {
+ round = 1;
+ goto repeat;
+ }
+ next_safe = RESERVED_PIDS;
+ mark_and_sweep = 1; /* mark next time */
+ return 0;
+ }
+
+ fpos = ffz(mask);
+ i &= (PID_MAX-1);
+ lastpid = (i << SHIFT_PER_LONG) + fpos;
+
+ /* find next save pid */
+ mask &= ~((1 << fpos) - 1);
+
+ while ((mask == 0) && (++i < PID_MAP_SIZE))
+ mask = pid_map[i];
+
+ if (i==PID_MAP_SIZE)
+ next_safe = PID_MAX;
+ else
+ next_safe = (i << SHIFT_PER_LONG) + ffs(mask) - 1;
+ return lastpid;
+}
+
static int get_pid(unsigned long flags)
{
- static int next_safe = PID_MAX;
struct task_struct *p;
- int pid;
+ int pid,beginpid;

if (flags & CLONE_PID)
return current->pid;

spin_lock(&lastpid_lock);
+ beginpid = last_pid;
if((++last_pid) & 0xffff8000) {
- last_pid = 300; /* Skip daemons etc. */
+ last_pid = RESERVED_PIDS; /* Skip daemons etc. */
goto inside;
}
if(last_pid >= next_safe) {
inside:
next_safe = PID_MAX;
read_lock(&tasklist_lock);
+ if (nr_threads > GETPID_THRESHOLD) {
+ last_pid = get_pid_by_map(last_pid);
+ if (unlikely(last_pid == 0)) {
+ last_pid = PID_MAX;
+ goto nomorepids;
+ }
+ } else {
repeat:
for_each_task(p) {
if(p->pid == last_pid ||
@@ -151,9 +238,11 @@
p->session == last_pid) {
if(++last_pid >= next_safe) {
if(last_pid & 0xffff8000)
- last_pid = 300;
+ last_pid = RESERVED_PIDS;
next_safe = PID_MAX;
}
+ if(unlikely(last_pid == beginpid))
+ goto nomorepids;
goto repeat;
}
if(p->pid > last_pid && next_safe > p->pid)
@@ -162,6 +251,9 @@
next_safe = p->pgrp;
if(p->session > last_pid && next_safe > p->session)
next_safe = p->session;
+ if(p->tgid > last_pid && next_safe > p->tgid)
+ next_safe = p->tgid;
+ }
}
read_unlock(&tasklist_lock);
}
@@ -169,6 +261,10 @@
spin_unlock(&lastpid_lock);

return pid;
+nomorepids:
+ read_unlock(&tasklist_lock);
+ spin_unlock(&lastpid_lock);
+ return 0;
}

static inline int dup_mmap(struct mm_struct * mm)

2002-03-22 22:23:25

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] get_pid() performance fix

On Fri, 22 Mar 2002, Hubertus Franke wrote:

> I implemented an alternative version of getpid, that for large thread counts
> ( > 220000), provides "significantly" better performance as shown in attached
^^^^^^
You've a very nice system Hubertus because it's about 1.8Gb only for the
stack :-)



- Davide


2002-03-22 22:26:35

by Hubertus Franke

[permalink] [raw]
Subject: Re: [PATCH] get_pid() performance fix

On Friday 22 March 2002 05:28 pm, Davide Libenzi wrote:
> On Fri, 22 Mar 2002, Hubertus Franke wrote:
> > I implemented an alternative version of getpid, that for large thread
> > counts ( > 220000), provides "significantly" better performance as shown
> > in attached
>
> ^^^^^^
> You've a very nice system Hubertus because it's about 1.8Gb only for the
> stack :-)
>
>
>
> - Davide


Friday evening alert. Already thinking about the pub...
Meant 22000-25000.


--
-- Hubertus Franke ([email protected])