2001-03-13 01:41:17

by David Shoon

[permalink] [raw]
Subject: system hang with "__alloc_page: 1-order allocation failed"

Hi,

After some testing, 2.4.2, 2.4.2-pre3, and 2.4.2-ac18/ac19 all
crash/hang when a fork loop (bomb) is executed (as a normal user) and
then killed (by the superuser). This isn't what you'd expect from
previous kernels (2.2.x and 2.0.x), which both return to normal after
the process is killed.

(This might be related to an earlier post about memory allocation?)

Anyway, a 'forkbomb' looks like this (sorry, just stating the
obvious):

int main()
{
    while (1)
        fork();
}

With 2.4.2 and 2.4.2-pre3, after killing the process (ctrl-c or killall
-9 prog) the kernel prints "__alloc_page: 1-order allocation failed"
continuously for a few minutes and then starts to (randomly?) kill
other active processes (such as xfs, bash) with "Out of Memory: Killed
process ### (etc.)". Keyboard input doesn't work, but you can still
switch vconsoles.

Under 2.4.2-ac18/19, the system doesn't show the error messages, but it
still hangs after you kill the process. All keyboard input freezes
eventually (can't switch vconsoles).

I'm not sure if it helps, but the system I'm testing this on is a PIII
500MHz with 196 MB of RAM, with swap disabled just so I know it's not
device reads/writes.

If anyone needs more info, give me a holler..

[ please cc: replies back to me since i'm not on the linux kernel list ]

p.s. apologies if this is already known or fixed
--
David Shoon
[email protected]



2001-03-13 10:51:55

by Mike Galbraith

[permalink] [raw]
Subject: Re: system hang with "__alloc_page: 1-order allocation failed"

On Tue, 13 Mar 2001, David Shoon wrote:

> Hi,

Greetings,

> After some testing, 2.4.2, 2.4.2-pre3, and 2.4.3-ac18 and ac19 both
> crash/hang when a fork loop (bomb) is executed (under a normal user) and
> killed (by a superuser). This isn't what you'd expect in previous
> kernels (2.2.x, and 2.0.x), as they both return to normal after killing
> the process.
>
> (This might be related to an earlier post about memory allocation?)
>
> Anyway, a 'forkbomb' just looks like this (sorry, just clarifying the
> obvious):
>
> int main() {
> while (1)
> fork();
> }

Allowing users to run in an unlimited environment is of course bad, and
setting a process limit will cure that. However...

> With 2.4.2, 2.4.2-pre3 after killing the process (ctrl-c or killall -9
> prog) the kernel dumps error messages of: "__alloc_page: 1-order
> allocation failed" continuously for a few minutes and then starts to
> (randomly?) kill other processes which are active (such as xfs, bash)
> with "Out of Memory: Killed process ### (etc.)". Keyboard input doesn't
> work, but you can still switch vconsoles.

(The oom killer doesn't activate here.. bummer)

The first problem is that high-order allocations (order > 0) can fail
for root just as for any user. It can (and currently does) happen that
root can't fork a task in order to kill the forkbombs. Here, I even
modified __alloc_pages() to never allow root allocations to fail..
that alone isn't enough, because it can (and does) happen that even
though I never give up, the little memory that is free is so fragmented
that I cannot allocate a task struct (an order-1 allocation).
Everything swappable has been swapped out and the freed memory consumed
by the greedy little user processes.. so it never gets better and the
kernel just pages forever.

(A workaround is to lower max_threads so that task structs can take at
most 25% of memory.. it works, but is really cheezy. OTOH, allowing
half of memory to be allocated in task structs looks a bit cheezy too.
That means those tasks can't be big enough to be doing real work.. no?)

The second problem is that even SysRq-E doesn't always kill reliably
when you're very low on memory, so your fork bomb may take off all over
again.. and it does exactly that.

The third problem _appears_ (heavy emphasis required) to be the
scheduler. Even with the allocator never giving up on root allocations
and with max_threads reduced, it happens here that root's killall -9
forkbomb never RUNS despite order-1 memory being available.. unless
root's shell is set SCHED_RR. [maybe I wasn't patient enough, but
minutes seems long enough to wait]

None of these problems seem to be a 'big hairy deal' with real workloads..
but they are smudges on the otherwise perfect ;-) kernel.

> Under 2.4.2-ac18/19, the system doesn't show the error messages, but it
> still hangs after you kill the process. All keyboard input freezes
> eventually (can't switch vconsoles).
>
> I'm not sure if it helps, but the system I'm testing this on is a PIII
> 500mhz, with 196megs of ram, with swap disabled just so I know it's not
> device read/writes.

Running out of memory without swap is guaranteed doom.

-Mike

2001-03-13 13:12:23

by Rik van Riel

[permalink] [raw]
Subject: Re: system hang with "__alloc_page: 1-order allocation failed"

On Tue, 13 Mar 2001, Mike Galbraith wrote:

> (A workaround is to lower max_threads to 25% of memory.. works, but is
> really cheezy. OTOH, allowing half of memory to be allocated in task
> structs is a bit cheezy looking too. That means that these tasks
> can't be big enough to be doing real work.. no?)

If half of memory is allocated for task structures, we won't
even be able to allocate the minimum number of page table
pages needed for each task ...

For a "normal" task we'll need at least 1 page directory and
3 page table pages. We only have space for half of those when
the maximum number of task_structs is allocated.

Maybe it would be good to lower the default threads-max to
about 10% or less of physical memory?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-03-13 18:40:21

by Manfred Spraul

[permalink] [raw]
Subject: Re: system hang with "__alloc_page: 1-order allocation failed"

// $Header$
// Kernel Version:
// VERSION = 2
// PATCHLEVEL = 4
// SUBLEVEL = 0
// EXTRAVERSION = -test12
--- 2.4/kernel/sysctl.c Sun Dec 17 18:04:04 2000
+++ build-2.4/kernel/sysctl.c Sat Dec 23 17:29:52 2000
@@ -45,6 +45,7 @@
extern int bdf_prm[], bdflush_min[], bdflush_max[];
extern int sysctl_overcommit_memory;
extern int max_threads;
+extern int threads_high, threads_low;
extern int nr_queued_signals, max_queued_signals;
extern int sysrq_enabled;

@@ -222,7 +223,8 @@
0644, NULL, &proc_dointvec},
#endif
{KERN_MAX_THREADS, "threads-max", &max_threads, sizeof(int),
- 0644, NULL, &proc_dointvec},
+ 0644, NULL, &proc_dointvec_minmax, &sysctl_intvec, NULL,
+ &threads_low, &threads_high},
{KERN_RANDOM, "random", NULL, 0, 0555, random_table},
{KERN_OVERFLOWUID, "overflowuid", &overflowuid, sizeof(int), 0644, NULL,
&proc_dointvec_minmax, &sysctl_intvec, NULL,
--- 2.4/kernel/fork.c Sun Dec 17 18:04:04 2000
+++ build-2.4/kernel/fork.c Sat Dec 23 16:59:47 2000
@@ -63,6 +63,11 @@
wq_write_unlock_irqrestore(&q->lock, flags);
}

+#define THREADS_HIGH 32000
+#define THREADS_LOW 16
+int threads_high = THREADS_HIGH;
+int threads_low = THREADS_LOW;
+
void __init fork_init(unsigned long mempages)
{
/*
@@ -71,53 +76,79 @@
* of memory.
*/
max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 2;
+ if(max_threads > threads_high)
+ max_threads = threads_high;

init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
}

-/* Protects next_safe and last_pid. */
-spinlock_t lastpid_lock = SPIN_LOCK_UNLOCKED;
+/*
+ * Reserve a few pid values for root, otherwise
+ * the reserved threads might not help him ;-)
+ */
+#define PIDS_FOR_ROOT 60

-static int get_pid(unsigned long flags)
+static int search_pid(int start, int* plimit)
{
- static int next_safe = PID_MAX;
+ int next_safe = *plimit;
struct task_struct *p;
+ int loop = 0;

- if (flags & CLONE_PID)
- return current->pid;
-
- spin_lock(&lastpid_lock);
- if((++last_pid) & 0xffff8000) {
- last_pid = 300; /* Skip daemons etc. */
- goto inside;
- }
- if(last_pid >= next_safe) {
-inside:
- next_safe = PID_MAX;
- read_lock(&tasklist_lock);
- repeat:
- for_each_task(p) {
- if(p->pid == last_pid ||
- p->pgrp == last_pid ||
- p->session == last_pid) {
- if(++last_pid >= next_safe) {
- if(last_pid & 0xffff8000)
- last_pid = 300;
- next_safe = PID_MAX;
+ if(start >= *plimit || start < 300) {
+ loop = 1;
+ start=300;
+ }
+repeat:
+ read_lock(&tasklist_lock);
+ for_each_task(p) {
+ if(p->pid == start ||
+ p->pgrp == start ||
+ p->session == start) {
+ if(++start >= next_safe) {
+ read_unlock(&tasklist_lock);
+ if(start >= *plimit) {
+ if(loop) {
+ next_safe=-1;
+ start=-1;
+ break;
+ }
+ loop=1;
+ start = 300;
}
+ next_safe = *plimit;
goto repeat;
}
- if(p->pid > last_pid && next_safe > p->pid)
- next_safe = p->pid;
- if(p->pgrp > last_pid && next_safe > p->pgrp)
- next_safe = p->pgrp;
- if(p->session > last_pid && next_safe > p->session)
- next_safe = p->session;
}
- read_unlock(&tasklist_lock);
+ if(p->pid > start && next_safe > p->pid)
+ next_safe = p->pid;
+ if(p->pgrp > start && next_safe > p->pgrp)
+ next_safe = p->pgrp;
+ if(p->session > start && next_safe > p->session)
+ next_safe = p->session;
+ }
+ read_unlock(&tasklist_lock);
+ *plimit=next_safe;
+ return start;
+}
+
+static int get_pid(unsigned long flags, int is_root)
+{
+ static int next_safe = PID_MAX-PIDS_FOR_ROOT;
+
+ if (flags & CLONE_PID)
+ return current->pid;
+
+ if(++last_pid < next_safe)
+ return last_pid;
+
+ next_safe = PID_MAX-PIDS_FOR_ROOT;
+ last_pid = search_pid(last_pid, &next_safe);
+
+ if(last_pid==-1 && is_root) {
+ int dummy = PID_MAX;
+ return search_pid(PID_MAX-PIDS_FOR_ROOT, &dummy);
}
- spin_unlock(&lastpid_lock);

return last_pid;
}
@@ -573,8 +604,13 @@
* the kernel lock so nr_threads can't
* increase under us (but it may decrease).
*/
- if (nr_threads >= max_threads)
- goto bad_fork_cleanup_count;
+ {
+ int limit = max_threads;
+ if(current->uid)
+ limit -= MIN_THREADS_LEFT_FOR_ROOT;
+ if (nr_threads >= limit)
+ goto bad_fork_cleanup_count;
+ }

get_exec_domain(p->exec_domain);

@@ -586,7 +622,6 @@
p->state = TASK_UNINTERRUPTIBLE;

copy_flags(clone_flags, p);
- p->pid = get_pid(clone_flags);

p->run_list.next = NULL;
p->run_list.prev = NULL;
@@ -662,6 +697,16 @@
current->counter >>= 1;
if (!current->counter)
current->need_resched = 1;
+
+ /* get the pid value last, we must atomically add the new
+ * thread to the task lists.
+ * Atomicity is guaranteed by lock_kernel().
+ */
+ p->pid = get_pid(clone_flags, !current->uid);
+ if(p->pid==-1) {
+ /* FIXME: cleanup for copy_thread? */
+ goto bad_fork_cleanup_sighand;
+ }

/*
* Ok, add it to the run-queues and make it


Attachments:
patch-pid (4.83 kB)

2001-03-13 21:29:47

by Chris Evans

[permalink] [raw]
Subject: Re: system hang with "__alloc_page: 1-order allocation failed"


On Tue, 13 Mar 2001, Manfred Spraul wrote:

> * bugfixes for get_pid(). This is the longest part of the patch, but
> it's only necessary if you have more than 10,000 threads running. If you
> have enough memory: launch a forkbomb. If ~32760 threads are running the
> kernel enters an endless loop in get_pid() (or around 11,000 threads if
> they intentionally create additional sessions and process groups)

I thought (on Intel) there was a 4092 hard limit?

Cheers,
Chris

2001-03-13 22:46:51

by Manfred Spraul

[permalink] [raw]
Subject: Re: system hang with "__alloc_page: 1-order allocation failed"

From: "Chris Evans" <[email protected]>
>
> I thought (on Intel) there was a 4092 hard limit?
>
That's the 2.2 limit; it's gone.

The new limit is total memory and pid space. The pids are intentionally
limited to 15 bits; the remaining bits are reserved.

In the worst case one running process can block three pid values (one
for the session, one for the process group, one for the process id), so
~11,000 running processes can exhaust the pid space.

--
Manfred