2010-06-09 21:00:41

by Salman Qazi

Subject: [PATCH] Fix a race in pid generation that causes pids to be reused immediately.

A program that repeatedly forks and waits is susceptible to having the
same pid repeated, especially when it competes with another instance of the
same program. This is really bad for the bash implementation. Furthermore,
many shell scripts assume that pid numbers will not be reused for some length
of time.

Race Description:

A                                       B

// pid == offset == n                   // pid == offset == n + 1
test_and_set_bit(offset, map->page)
                                        test_and_set_bit(offset, map->page);
                                        pid_ns->last_pid = pid;
pid_ns->last_pid = pid;
                                        // pid == n + 1 is freed (wait())

                                        // Next fork()...
                                        last = pid_ns->last_pid; // == n
                                        pid = last + 1;
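
The interleaving above can be replayed deterministically in userspace. The following is only a single-threaded sketch under stated assumptions: `toy_ns`, `toy_claim`, `replay_race` and `TOY_PID_MAX` are hypothetical names that do not exist in the kernel, and a plain bool array stands in for the pidmap. It shows the stale store by A making the next allocation hand out the pid that B just freed.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PID_MAX 64 /* hypothetical: a tiny pid space for the demo */

struct toy_ns {
    bool used[TOY_PID_MAX]; /* stands in for the pidmap bitmap */
    int last_pid;           /* stands in for pid_ns->last_pid */
};

/* Claim the first free pid after 'last', roughly like the alloc_pidmap() scan. */
static int toy_claim(struct toy_ns *ns, int last)
{
    for (int i = 1; i <= TOY_PID_MAX; i++) {
        int pid = (last + i) % TOY_PID_MAX;

        if (!ns->used[pid]) {
            ns->used[pid] = true; /* test_and_set_bit() found the bit clear */
            return pid;
        }
    }
    return -1; /* no free pid */
}

/*
 * Replay the interleaving in a single thread. Returns the pid handed to the
 * next fork(); *pid_b receives the pid that B allocated and then freed.
 * Assumes 1 <= n and n + 1 < TOY_PID_MAX.
 */
int replay_race(int n, int *pid_b)
{
    struct toy_ns ns = { .last_pid = n - 1 };
    int last_a = ns.last_pid;           /* A and B both read last_pid first */
    int last_b = ns.last_pid;
    int pid_a = toy_claim(&ns, last_a); /* A gets n */

    *pid_b = toy_claim(&ns, last_b);    /* B finds n taken, gets n + 1 */
    ns.last_pid = *pid_b;               /* B stores n + 1 */
    ns.last_pid = pid_a;                /* A stores n afterwards: stale value wins */
    ns.used[*pid_b] = false;            /* pid n + 1 is freed by wait() */

    return toy_claim(&ns, ns.last_pid); /* next fork() scans from n */
}

/* Self-check: the pid B just freed comes straight back. */
void replay_race_check(void)
{
    int pid_b;
    int next = replay_race(10, &pid_b);

    assert(pid_b == 11);
    assert(next == pid_b);
}
```

Calling `replay_race_check()` from a test driver exercises the scenario with n = 10.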

Code to reproduce it (Running multiple instances is more effective):

#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

// The distance mod 32768 between two pids, where the first pid is expected
// to be smaller than the second.
int PidDistance(pid_t first, pid_t second) {
  return (second + 32768 - first) % 32768;
}

int main(int argc, char* argv[]) {
  int failed = 0;
  pid_t last_pid = 0;
  int i;
  printf("%zu\n", sizeof(pid_t));
  for (i = 0; i < 10000000; ++i) {
    if (i % 32768 == 0)
      printf("Iter: %d\n", i/32768);
    int child_exit_code = i % 256;
    pid_t pid = fork();
    if (pid == -1) {
      fprintf(stderr, "fork failed, iteration %d, errno=%d\n", i, errno);
      exit(1);
    }
    if (pid == 0) {
      // Child
      exit(child_exit_code);
    } else {
      // Parent
      if (i > 0) {
        int distance = PidDistance(last_pid, pid);
        if (distance == 0 || distance > 30000) {
          fprintf(stderr,
                  "Unexpected pid sequence: previous fork: pid=%d, "
                  "current fork: pid=%d for iteration=%d.\n",
                  last_pid, pid, i);
          failed = 1;
        }
      }
      last_pid = pid;
      int status;
      int reaped = wait(&status);
      if (reaped != pid) {
        fprintf(stderr,
                "Wait return value: expected pid=%d, "
                "got %d, iteration %d\n",
                pid, reaped, i);
        failed = 1;
      } else if (WEXITSTATUS(status) != child_exit_code) {
        fprintf(stderr,
                "Unexpected exit status %x, iteration %d\n",
                WEXITSTATUS(status), i);
        failed = 1;
      }
    }
  }
  exit(failed);
}


Thanks to Ted Tso for the key ideas of this implementation.

Signed-off-by: Salman Qazi <[email protected]>
---
kernel/pid.c | 39 ++++++++++++++++++++++++++++++++++++++-
1 files changed, 38 insertions(+), 1 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index e9fd8c1..865a482 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -122,6 +122,22 @@ static void free_pidmap(struct upid *upid)
 	atomic_inc(&map->nr_free);
 }

+/*
+ * If we started walking pids at 'base', is 'a' seen before 'b'?
+ *
+ */
+static int pid_before(int base, int a, int b)
+{
+	int a_lt_b = (a < b);
+	int min_a_b = min(a, b);
+	int max_a_b = max(a, b);
+
+	if ((base <= min_a_b) || (base >= max_a_b))
+		return a_lt_b;
+
+	return !a_lt_b;
+}
+
 static int alloc_pidmap(struct pid_namespace *pid_ns)
 {
 	int i, offset, max_scan, pid, last = pid_ns->last_pid;
@@ -153,8 +169,29 @@ static int alloc_pidmap(struct pid_namespace *pid_ns)
 		if (likely(atomic_read(&map->nr_free))) {
 			do {
 				if (!test_and_set_bit(offset, map->page)) {
+					int prev;
+					int last_write = last;
 					atomic_dec(&map->nr_free);
-					pid_ns->last_pid = pid;
+
+					/*
+					 * We might be racing with someone else trying
+					 * to set pid_ns->last_pid. We want the
+					 * winner to have the "later" value,
+					 * because if the "earlier" value prevails, then
+					 * a pid may get reused immediately.
+					 *
+					 * Since pids rollover, it is not sufficient
+					 * to just pick the bigger value. We
+					 * have to consider where we started counting
+					 * from.
+					 */
+					do {
+						prev = last_write;
+						last_write = cmpxchg(&pid_ns->last_pid,
+								     prev, pid);
+					} while ((prev != last_write) &&
+						 (pid_before(last, last_write, pid)));
+
 					return pid;
 				}
 				offset = find_next_offset(map, offset);
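
The retry loop in the hunk above can be tried outside the kernel. What follows is a minimal userspace sketch, not the kernel code: it assumes C11 `<stdatomic.h>`, and `toy_cmpxchg`, `publish_last_pid` and `publish_last_pid_check` are hypothetical names introduced here. `toy_cmpxchg` imitates the kernel's `cmpxchg()` convention of returning the value it observed, and `pid_before()` is copied from the patch with `min()`/`max()` spelled out.

```c
#include <assert.h>
#include <stdatomic.h>

/* Same logic as pid_before() in the patch, with min()/max() spelled out. */
static int pid_before(int base, int a, int b)
{
    int a_lt_b = (a < b);
    int min_a_b = a < b ? a : b;
    int max_a_b = a < b ? b : a;

    if (base <= min_a_b || base >= max_a_b)
        return a_lt_b;
    return !a_lt_b;
}

/*
 * Userspace stand-in for the kernel's cmpxchg(): store 'new_val' only if
 * *ptr still holds 'old_val', and return the value that was observed.
 */
static int toy_cmpxchg(atomic_int *ptr, int old_val, int new_val)
{
    int observed = old_val;

    /* On failure, 'observed' is overwritten with the current value. */
    atomic_compare_exchange_strong(ptr, &observed, new_val);
    return observed;
}

/* Publish 'pid' unless a pid that comes later (counting from 'last') is already stored. */
void publish_last_pid(atomic_int *last_pid, int last, int pid)
{
    int prev;
    int last_write = last;

    do {
        prev = last_write;
        last_write = toy_cmpxchg(last_pid, prev, pid);
    } while (prev != last_write && pid_before(last, last_write, pid));
}

/* Self-check: both store orders from the race end with last_pid == 11. */
void publish_last_pid_check(void)
{
    atomic_int b_first = 11; /* B (pid 11) already stored its value */
    atomic_int a_first = 9;  /* nobody has stored yet */

    publish_last_pid(&b_first, 9, 10); /* A (pid 10) arrives late and backs off */
    assert(atomic_load(&b_first) == 11);

    publish_last_pid(&a_first, 9, 10); /* A stores 10 */
    publish_last_pid(&a_first, 9, 11); /* B retries and replaces it with 11 */
    assert(atomic_load(&a_first) == 11);
}
```

In both orders the "later" pid ends up in `last_pid`, which is the property the comment in the patch asks for.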


2010-06-09 21:22:09

by Linus Torvalds

Subject: Re: [PATCH] Fix a race in pid generation that causes pids to be reused immediately.



On Wed, 9 Jun 2010, Salman wrote:
> +/*
> + * If we started walking pids at 'base', is 'a' seen before 'b'?
> + *
> + */
> +static int pid_before(int base, int a, int b)
> +{
> +	int a_lt_b = (a < b);
> +	int min_a_b = min(a, b);
> +	int max_a_b = max(a, b);
> +
> +	if ((base <= min_a_b) || (base >= max_a_b))
> +		return a_lt_b;
> +
> +	return !a_lt_b;
> +}

Ok, so that's a very confusing expression. I'm sure it gets the right
value, but it's not exactly straightforward, is it?

Wouldn't it be nicer to write it out in a more straightforward way?
Something like

	/* a and b in order? base must not be between them */
	if (a <= b)
		return (base <= a || base >= b);
	/* b < a? We reach 'a' first iff base is between them */
	return base >= b && base <= a;

would seem to be equivalent and easier to explain, no?

And when you write it that way, it looks like the compiler should be able
to trivially CSE the five comparisons down to just three (notice how the
"base <= a" and "base >= b" comparisons are repeated). Which I'm sure some
super-optimizing compiler can do from your version too, but mine seems
more straightforward.

But maybe I did that thing wrong, and I just confused myself. I have _not_
checked the logic deeply, somebody else should definitely double-check me.

Linus
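
One way to do the double-check asked for above is a brute-force comparison over a small range. This is only a sketch: `pid_before_patch`, `pid_before_three_compares` and `count_mismatches` are hypothetical names, and `min()`/`max()` are spelled out so it builds in userspace. Under these assumptions the two forms agree whenever base, a and b are all distinct, and differ only on ties (a == b, or base equal to a or b).

```c
#include <assert.h>

/* pid_before() as posted in the patch. */
static int pid_before_patch(int base, int a, int b)
{
    int a_lt_b = (a < b);
    int min_a_b = a < b ? a : b;
    int max_a_b = a < b ? b : a;

    if (base <= min_a_b || base >= max_a_b)
        return a_lt_b;
    return !a_lt_b;
}

/* The rewrite suggested in the reply. */
static int pid_before_three_compares(int base, int a, int b)
{
    /* a and b in order? base must not be between them */
    if (a <= b)
        return base <= a || base >= b;
    /* b < a? We reach 'a' first iff base is between them */
    return base >= b && base <= a;
}

/*
 * Count the (base, a, b) triples in [0, limit) on which the two versions
 * disagree. With distinct_only set, triples containing a tie are skipped.
 */
int count_mismatches(int limit, int distinct_only)
{
    int mismatches = 0;

    for (int base = 0; base < limit; base++)
        for (int a = 0; a < limit; a++)
            for (int b = 0; b < limit; b++) {
                if (distinct_only && (a == b || base == a || base == b))
                    continue;
                if (pid_before_patch(base, a, b) !=
                    pid_before_three_compares(base, a, b))
                    mismatches++;
            }
    return mismatches;
}

/* Self-check: agreement on distinct values, disagreement only on ties. */
void count_mismatches_check(void)
{
    assert(count_mismatches(16, 1) == 0);
    assert(count_mismatches(16, 0) > 0);
}
```

Whether the tie cases matter depends on how the caller uses the function; the sketch only reports where the two forms differ.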

2010-06-09 21:34:09

by Peter Zijlstra

Subject: Re: [PATCH] Fix a race in pid generation that causes pids to be reused immediately.

On Wed, 2010-06-09 at 14:21 -0700, Linus Torvalds wrote:
>
> On Wed, 9 Jun 2010, Salman wrote:
> > +/*
> > + * If we started walking pids at 'base', is 'a' seen before 'b'?
> > + *
> > + */
> > +static int pid_before(int base, int a, int b)
> > +{
> > +	int a_lt_b = (a < b);
> > +	int min_a_b = min(a, b);
> > +	int max_a_b = max(a, b);
> > +
> > +	if ((base <= min_a_b) || (base >= max_a_b))
> > +		return a_lt_b;
> > +
> > +	return !a_lt_b;
> > +}
>
> Ok, so that's a very confusing expression. I'm sure it gets the right
> value, but it's not exactly straightforward, is it?
>
> Wouldn't it be nicer to write it out in a more straightforward way?
> Something like
>
> 	/* a and b in order? base must not be between them */
> 	if (a <= b)
> 		return (base <= a || base >= b);
> 	/* b < a? We reach 'a' first iff base is between them */
> 	return base >= b && base <= a;
>
> would seem to be equivalent and easier to explain, no?
>
> And when you write it that way, it looks like the compiler should be able
> to trivially CSE the five comparisons down to just three (notice how the
> "base <= a" and "base >= b" comparisons are repeated). Which I'm sure some
> super-optimizing compiler can do from your version too, but mine seems
> more straightforward.
>
> But maybe I did that thing wrong, and I just confused myself. I have _not_
> checked the logic deeply, somebody else should definitely double-check me.

Isn't: return a - base < b - base, the natural way to express this?

2010-06-09 22:21:34

by Linus Torvalds

Subject: Re: [PATCH] Fix a race in pid generation that causes pids to be reused immediately.



On Wed, 9 Jun 2010, Peter Zijlstra wrote:
>
> Isn't: return a - base < b - base, the natural way to express this?

Quite possibly. I'd worry about the overflow case a bit, but it's
certainly going to get the right value when base << MAX_INT.

Linus

2010-06-09 22:28:28

by Linus Torvalds

Subject: Re: [PATCH] Fix a race in pid generation that causes pids to be reused immediately.



On Wed, 9 Jun 2010, Linus Torvalds wrote:
>
> Quite possibly. I'd worry about the overflow case a bit, but it's
> certainly going to get the right value when base << MAX_INT.

Having given it a couple of seconds more thought, I don't think there is
an overflow case either. All of a/b/base are guaranteed to be non-negative
(or our pid code is in worse trouble anyway), so there is no overflow
possible. So yes. Just comparing a-base < b-base should always be safe.

Linus

2010-06-10 00:08:59

by Salman Qazi

Subject: Re: [PATCH] Fix a race in pid generation that causes pids to be reused immediately.

On Wed, Jun 9, 2010 at 3:27 PM, Linus Torvalds
<[email protected]> wrote:
>
>
> On Wed, 9 Jun 2010, Linus Torvalds wrote:
>>
>> Quite possibly. I'd worry about the overflow case a bit, but it's
>> certainly going to get the right value when base << MAX_INT.
>
> Having given it a couple of seconds more thought, I don't think there is
> an overflow case either. All of a/b/base are guaranteed to be non-negative
> (or our pid code is in worse trouble anyway), so there is no overflow
> possible. So yes. Just comparing a-base < b-base should always be safe

I don't think this gives the right answer in the a < base < b case.
Here a - base < 0 and b - base > 0. But we really want b to be before a,
since a has rolled over further than b. I think the right solution is
comparing (a - base + max_pid) % max_pid with
(b - base + max_pid) % max_pid. Am I correct or deluded?
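
The a < base < b case can be checked with concrete numbers. A small sketch follows; `before_signed`, `before_modular` and the sample values (base = 10, a = 5, b = 20, max_pid = 32768) are chosen here for illustration and are not taken from the patch.

```c
#include <assert.h>

/* The plain signed comparison: a - base < b - base. */
int before_signed(int base, int a, int b)
{
    return a - base < b - base;
}

/* The modular comparison: distances from base, wrapped into [0, max_pid). */
int before_modular(int base, int a, int b, int max_pid)
{
    return (a - base + max_pid) % max_pid < (b - base + max_pid) % max_pid;
}

/* Self-check for one a < base < b triple. */
void before_check(void)
{
    /* Walking up from 10, pid 20 is reached before pid 5 (which needs a rollover). */
    assert(before_signed(10, 5, 20) == 1);           /* -5 < 10: claims a comes first */
    assert(before_modular(10, 5, 20, 32768) == 0);   /* 32763 < 10 is false: b comes first */
}
```

The signed form reports that a comes first because -5 < 10, while the wrapped distances 32763 and 10 put b first.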

2010-06-10 00:20:53

by Linus Torvalds

Subject: Re: [PATCH] Fix a race in pid generation that causes pids to be reused immediately.



On Wed, 9 Jun 2010, Salman Qazi wrote:
>
> I don't think this gives the right answer in the a < base < b case.
> Here a - base < 0 and
> b - base > 0. But we really want b to be before a, since a has rolled
> over further than b.

Right you are.

> I think the right solution is comparing (a - base + max_pid) % max_pid
> with (b - base + max_pid) % max_pid. Am I correct or deluded?

That would work, but it would be horrible. Just use the three compares
version: doing an integer 'mod' operation is _way_ more expensive.

Linus
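
The claim that the three-compare form can stand in for the 'mod' form can be checked by brute force over a small pid space. The sketch below uses hypothetical names (`before_three_compares`, `before_modular`, `before_unsigned`, `variants_agree`) and assumes all values lie in [0, max_pid). `before_unsigned` is a further option that is not proposed in the messages above: it is an assumption of this sketch that casting the differences to unsigned gives the same ordering as the modular form without a division.

```c
#include <assert.h>

/* The three-compare version from earlier in the thread. */
static int before_three_compares(int base, int a, int b)
{
    if (a <= b)
        return base <= a || base >= b;
    return base >= b && base <= a;
}

/* The modular version: compare distances from base wrapped into [0, max_pid). */
static int before_modular(int base, int a, int b, int max_pid)
{
    return (a - base + max_pid) % max_pid < (b - base + max_pid) % max_pid;
}

/* Assumption of this sketch: unsigned wraparound instead of '%'. */
static int before_unsigned(int base, int a, int b)
{
    return (unsigned)(a - base) < (unsigned)(b - base);
}

/*
 * Returns 1 if, for every triple in [0, max_pid):
 *  - the unsigned form matches the modular form, and
 *  - the three-compare form matches the modular form when base, a and b
 *    are all distinct.
 */
int variants_agree(int max_pid)
{
    for (int base = 0; base < max_pid; base++)
        for (int a = 0; a < max_pid; a++)
            for (int b = 0; b < max_pid; b++) {
                int want = before_modular(base, a, b, max_pid);
                int distinct = a != b && base != a && base != b;

                if (before_unsigned(base, a, b) != want)
                    return 0;
                if (distinct && before_three_compares(base, a, b) != want)
                    return 0;
            }
    return 1;
}

/* Self-check over a 64-entry pid space. */
void variants_agree_check(void)
{
    assert(variants_agree(64) == 1);
}
```

The check is limited to distinct values for the three-compare form because that form and the modular form treat ties differently.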