2004-06-10 04:17:19

by j.random.programmer

Subject: Threading behavior in 2.6.5 may be broken ?

Hi all:

I just installed Fedora Core 2 (2.6.5.x SMP
kernel) on a dual 1 GHz P4 server with about
1.5 GB of RAM and about 1.4 GB of swap.

I am primarily a web/database developer, not
a C programmer so I am writing this email from
an end-user's perspective.

I have a program that tries to create as many
threads as possible. This program was written by me
for kicks/testing -- just to see what would happen.
I ssh into the server and run this program as root
under the sun 1.4.2 JDK.

On a 2.4.x kernel, from a Java JVM I could create
about 900 threads before the JVM crapped out with
a "cannot create more threads" type of error. Before
this point, I can create/run - say 700 threads - just
fine. This is good -- a clean failure at some point
and good behavior before then.

On this new kernel, the system gets totally wedged
when I run the same program and try to create
10,000 threads. Instead of getting a "cannot
create more threads" error, I now get an "out of
memory" error. Then the command line freezes in
the existing terminal window, ctrl+c does not work
(no matter how many times it's pressed), I cannot
launch another ssh session and cannot ssh into the
server again (although ping still works).

To recap:

[2.4.x]
700 threads --> fine
10,000 threads --> crap out at 900 something.

[2.6.5]
700 threads --> fine
10,000 threads --> system wedged totally.

I thought NPTL would create/run threads as if there
was no tomorrow ? So why do things seem to be worse
in 2.6.x ?

For right now, I'm going back to Slackware with 2.4.x
but it would be great if someone fixed this problem
in future 2.6 kernels. [As an aside, I can create
as many threads as I want, say 20,000, without
any problems using the same program on my Mac OS X
laptop].

I'd be happy to give known kernel hackers on this
list root access to this box for the next few days
if anyone is interested in seeing/poking around
for themselves (email me if so desired).

Best regards,

--j

------------- The test program is shown below --------

/** Usage: java MaxThreads number-of-threads */
public class MaxThreads extends Thread
{
    // Shared counter of finished threads; read and bumped via nextThreadNum().
    static int threadnum = 0;

    // Synchronized so that concurrent increments are not lost.
    static synchronized int nextThreadNum() {
        return threadnum++;
    }

    public static void main(String args[])
    {
        if (args.length != 1) {
            System.out.println("java MaxThreads num_threads");
            System.exit(1);
        }
        int n = Integer.parseInt(args[0]);
        System.out.println("test " + n + " threads..");
        // Start n threads; each sleeps for 5 seconds and then exits.
        for (int i = 0; i < n; i++) {
            Thread t = new MaxThreads();
            t.start();
        }
    }

    public void run() {
        try {
            Thread.sleep(5000); // 5 sec
            System.out.println(
                "Thread:" + nextThreadNum() + "..done");
        }
        catch (Exception e) { e.printStackTrace(); }
    }
} //~class MaxThreads
--------------------------------------------------

__________________________________
Do you Yahoo!?
Friends. Fun. Try the all-new Yahoo! Messenger.
http://messenger.yahoo.com/


2004-06-10 05:31:11

by j.random.programmer

Subject: Re: Threading behavior in 2.6.5 may be broken ?

Hi all:

This is a followup to my earlier post about
threading in 2.6.5.

After more experimenting, I have found the following
behavior:

Specifying the per-thread maximum stack size to
the JVM allows one to create more threads.
For example, specifying a maximum of 128k of stack
per thread allows me to create 6000 threads. By
default, I think 2 MB per thread is assigned for
that thread's stack space. I don't know if the 2 MB
is the default in the JVM, glibc, or the kernel.

However, even if I am using 2 MB * 10,000 threads
(=20 GB of RAM), the machine _should_ give an
out of memory error but should _not_ get totally
wedged (so that ctrl-c doesn't work in that shell
to break out of the program or ssh is inoperative
so I cannot ssh into another shell and kill the jvm).
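
For anyone who wants to repeat the experiment, here is a minimal sketch
of requesting a smaller stack per thread from inside the program. The
post does not say which JVM option was used; the usual JVM-wide switch
is -Xss (for example -Xss128k). The sketch instead uses the four-argument
Thread constructor, whose stackSize argument is only a hint that the VM
may round up or ignore. The class and method names (SmallStackThreads,
totalStackBytes, startSleepers) are made up for illustration.

```java
/** Sketch: start threads with an explicit per-thread stack size request. */
class SmallStackThreads {
    /** Stack space requested in total if every thread gets stackBytes. */
    static long totalStackBytes(int threads, long stackBytes) {
        return threads * stackBytes;
    }

    /** Starts n sleeping threads with the given stack request; returns how many started. */
    static int startSleepers(int n, long stackBytes, final long sleepMillis) {
        Runnable sleeper = new Runnable() {
            public void run() {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Interrupted while sleeping: just let the thread exit.
                }
            }
        };
        int started = 0;
        try {
            for (int i = 0; i < n; i++) {
                // stackSize is a hint only; the VM is free to adjust or ignore it.
                new Thread(null, sleeper, "sleeper-" + i, stackBytes).start();
                started++;
            }
        } catch (OutOfMemoryError e) {
            // Thread creation failed; report how far we got.
            System.out.println("stopped after " + started + " threads: " + e);
        }
        return started;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        long stack = 128L * 1024; // 128k, the value tried in the post
        System.out.println("requesting " + totalStackBytes(n, stack)
            + " bytes of stack in total");
        System.out.println("started " + startSleepers(n, stack, 1000) + " threads");
    }
}
```

With 10,000 threads at 2 MB each, totalStackBytes gives 20,971,520,000
bytes, which is the "20 GB" figure above. Note that this is address space
requested for stacks, not memory actually touched.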

Best regards,

--j

2004-06-10 18:31:51

by j.random.programmer

Subject: Re: Threading behavior in 2.6.5 may be broken ?

One last followup:

This problem has gone away in 2.6.6 (Fedora
2.6.6-1.424).

Whatever change was made between 2.6.5 and 2.6.6
fixed this. In 2.6.6, I can in fact create 10,000
threads and this does not hang the entire machine
(in 2.6.5, when the machine hung, even the console
was inoperative and I couldn't run vmstat or cat
/proc/meminfo to see what was going on).

Best regards,

--j

2004-06-15 20:42:41

by Joe Korty

Subject: Re: Threading behavior in 2.6.5 may be broken ?

On Wed, Jun 09, 2004 at 09:16:41PM -0700, j.random.programmer wrote:
> Hi all:
>
> I just installed Fedora Core 2 (2.6.5.x smp
> kernel) on a Dual 1 Ghz P4 server with about
> 1.5 GB of RAM and about 1.4 GB of swap.
>
> I am primarily a web/database developer, not
> a C programmer so I am writing this email from
> an end-user's perspective.
>
> I have a program that tries to create as many
> threads as possible. This program was written by me
> for kicks/testing -- just to see what would happen.
> I ssh into the server and run this program as root
> under the sun 1.4.2 JDK.
>
> On a 2.4.x kernel, from a Java JVM I could create
> about 900 threads before the JVM crapped out with
> a "cannot create more threads" type of error. Before
> this point, I can create/run - say 700 threads - just
> fine. This is good -- a clean failure at some point
> and good behavior before then.
>
> On this new kernel, the system gets totally wedged
> when I run the same program and try to create
> 10,000 threads. Instead of getting a "cannot
> create more threads" error, I now get an "out of
> memory" error. Then the command line freezes in
> the existing terminal window, ctrl+c does not work
> (no matter how many times it's pressed), I cannot
> launch another ssh session and cannot ssh into the
> server again (although ping still works).
>
> To recap:
>
> [2.4.x]
> 700 threads --> fine
> 10,000 threads --> crap out at 900 something.
>
> [2.6.5]
> 700 threads --> fine
> 10,000 threads --> system wedged totally.

It's funny you should post this. I encountered the
same thing a few days ago, and found the cause a few
minutes ago.

The problem is that 2.6.x restricts unpinned mmaps (mmaps
where the app does not supply the virtual address) to the
0x40000000-0xc0000000 range, while Red Hat allows all holes
in the user address space to satisfy such requests.

Red Hat implements this by introducing an i386-specific
version of arch_get_unmapped_area(). This has not
yet made it into the official kernels. I have no idea
whether it was ever submitted for consideration, or was
submitted and rejected.
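
To put rough numbers on that window, here is a small sketch of the
arithmetic. It assumes the range quoted above is the usual 32-bit i386
one, 0x40000000 to 0xc0000000, and that each thread stack is a single
address-less mmap of the stack size. It ignores guard pages, the heap
and shared libraries, so the results are upper bounds only. The class
and method names (MmapWindow, windowBytes, maxStacks) are made up for
illustration.

```java
/** Sketch: how many thread stacks fit in a fixed window of virtual addresses. */
class MmapWindow {
    // Assumed i386 bounds for mmaps made without a caller-supplied address.
    static final long LOW = 0x40000000L;
    static final long HIGH = 0xc0000000L;

    /** Size of the window in bytes. */
    static long windowBytes() {
        return HIGH - LOW;
    }

    /** Upper bound on the number of stacks of the given size that fit in the window. */
    static long maxStacks(long stackBytes) {
        return windowBytes() / stackBytes;
    }

    public static void main(String[] args) {
        System.out.println("window: " + windowBytes() + " bytes");
        System.out.println("2 MB stacks:   " + maxStacks(2L * 1024 * 1024));
        System.out.println("128 KB stacks: " + maxStacks(128L * 1024));
    }
}
```

Under these assumptions the window is 2 GB wide, which allows at most
1024 stacks of 2 MB or 16384 stacks of 128 KB. The limits seen in
practice will be lower, since other mappings share the same window.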

Regards,
Joe