2000-11-30 07:45:43

by Arnaud Installe

Subject: high load & poor interactivity on fast thread creation

Hello,

When creating a lot of Java threads per second, Linux slows down to a
crawl. I don't think this happens on NT, probably because NT doesn't
create new threads as fast as Linux does.

Is there a way (a setting?) to solve this problem? Rate-limit the number
of threads created? The problem occurred on Linux 2.2, IBM Java 1.1.8.

Thanks,

Arnaud

--
Arnaud Installe [email protected]

Look, we trade every day out there with hustlers, deal-makers, shysters,
con-men. That's the way businesses get started. That's the way this
country was built.
-- Hubert Allen


2000-11-30 15:17:13

by Ray Bryant

Subject: Re: high load & poor interactivity on fast thread creation

The IBM implementations of the Java language use native threads --
the result is that every time you do a Java thread creation, you
end up with a new cloned process. Now this should be pretty fast,
so I am surprised that it stalls like that. It is possible this
is a scheduler effect. Do you have a program example you can
share with us?

Also, it is a little old now (by Internet standards) but you
might take a look at this paper we did at the beginning of
the year:

http://www-4.ibm.com/software/developer/library/java2/index.html

Arnaud Installe wrote:
>
> Hello,
>
> When creating a lot of Java threads per second linux slows down to a
> crawl. I don't think this happens on NT, probably because NT doesn't
> create new threads as fast as Linux does.
>
> Is there a way (setting ?) to solve this problem ? Rate-limit the number
> of threads created ? The problem occurred on linux 2.2, IBM Java 1.1.8.
>

--

Best Regards,

Ray Bryant
IBM Linux Technology Center
[email protected]
512-838-8538
http://oss.software.ibm.com/developerworks/opensource/linux

We are Linux. Resistance is an indication that you missed the point.

"...the Right Thing is more important than the amount of flamage you need
to go through to get there"
--Eric S. Raymond

2000-11-30 16:45:06

by Alan

Subject: Re: high load & poor interactivity on fast thread creation

> When creating a lot of Java threads per second linux slows down to a
> crawl. I don't think this happens on NT, probably because NT doesn't
> create new threads as fast as Linux does.

Also, the Java implementation on NT is probably not creating a true thread
for each Java thread, as the IBM Java seems to.

> Is there a way (setting ?) to solve this problem ? Rate-limit the number
> of threads created ? The problem occurred on linux 2.2, IBM Java 1.1.8.

The real programming answer is to replace threads with state machines, and
all your stuff runs faster. That's often easy to say and hard to do.
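A minimal C sketch of the state-machine idea, assuming a hypothetical request type with four states: one thread walks a table of requests and advances each by one step per pass, instead of creating a thread per request. In a real server the transitions would be driven by select()/poll() readiness rather than taken unconditionally.

```c
#include <stddef.h>

/* Hypothetical per-request states; a real server would move between
 * them as non-blocking I/O becomes ready (e.g. via select()/poll()). */
enum req_state { READ_REQUEST, PROCESS, WRITE_REPLY, DONE };

struct request {
    enum req_state state;
    int steps;                 /* transitions this request has made */
};

/* Advance one request by a single step; returns 1 once it is finished. */
static int step_request(struct request *r)
{
    switch (r->state) {
    case READ_REQUEST: r->state = PROCESS;     break;
    case PROCESS:      r->state = WRITE_REPLY; break;
    case WRITE_REPLY:  r->state = DONE;        break;
    case DONE:         return 1;
    }
    r->steps++;
    return r->state == DONE;
}

/* Drive every request to completion from a single thread, round-robin,
 * instead of spawning one thread per request. Returns the number of
 * passes made over the request table. */
static int run_all(struct request *reqs, size_t n)
{
    int passes = 0;
    size_t remaining = n;
    while (remaining > 0) {
        remaining = 0;
        for (size_t i = 0; i < n; i++)
            if (!step_request(&reqs[i]))
                remaining++;
        passes++;
    }
    return passes;
}
```

Calling run_all() on a zero-initialised table finishes every request in three passes; the price is that each handler has to be split at every point where a threaded version would simply block.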

2000-11-30 16:55:57

by Arnaud Installe

Subject: Re: high load & poor interactivity on fast thread creation

On Thu, Nov 30, 2000 at 08:47:49AM -0600, Ray Bryant wrote:
> The IBM implementations of the Java language use native threads --
> the result is that every time you do a Java thread creation, you
> end up with a new cloned process. Now this should be pretty fast,

Well, I think the problem is that it is *too* fast. :-/ What I think
happens is that a lot of threads get created at the same time, and they
all run a bit of initialization code. This way a lot of processes are in
the running state, so that the load average gets *very* high, which makes
the system very unresponsive.

Could this be correct? Also, I haven't seen this happen with NT. Could
it be that Java on NT uses user-mode threading and creates threads much
more slowly, resulting in a lower load?

> so I am surprised that it stalls like that. It is possible this
> is a scheduler effect. Do you have a program example you can
> share with us?

So I suppose it is a scheduler effect. Can this be solved on the kernel
side (a /proc/sys setting, perhaps?), or should the software itself check
that no more than a certain number of threads are created per unit of
time?
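If the check is built into the software, a token bucket is one common way to express "no more than N threads per time unit". The C sketch below is an assumption, not something proposed in this thread: rate_limiter, rl_try() and spawn_limited() are hypothetical names, and the 1 ms polling sleep is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Token bucket: refills at 'rate' tokens per second, holds at most 'burst'. */
struct rate_limiter {
    double rate;
    double burst;
    double tokens;
    double last;               /* time of the last refill, in seconds */
};

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void rl_init(struct rate_limiter *rl, double rate, double burst,
                    double now)
{
    rl->rate = rate;
    rl->burst = burst;
    rl->tokens = burst;        /* start full: allow an initial burst */
    rl->last = now;
}

/* Returns 1 and consumes a token if a creation is allowed at time 'now'. */
static int rl_try(struct rate_limiter *rl, double now)
{
    rl->tokens += (now - rl->last) * rl->rate;
    if (rl->tokens > rl->burst)
        rl->tokens = rl->burst;
    rl->last = now;
    if (rl->tokens >= 1.0) {
        rl->tokens -= 1.0;
        return 1;
    }
    return 0;
}

/* Create a thread, sleeping in 1 ms steps until the limiter allows it. */
static int spawn_limited(struct rate_limiter *rl, pthread_t *tid,
                         void *(*fn)(void *), void *arg)
{
    struct timespec pause = { 0, 1000000 };  /* 1 ms */
    while (!rl_try(rl, now_seconds()))
        nanosleep(&pause, NULL);
    return pthread_create(tid, NULL, fn, arg);
}
```

In real use the limiter would be set up with rl_init(&rl, rate, burst, now_seconds()); burst controls how many threads may be created back to back before the rate applies.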

> Also, it is a little old now (by Internet standards) but you
> might take a look at this paper we did at the beginning of
> the year:
>
> http://www-4.ibm.com/software/developer/library/java2/index.html

I've already read this one. I'll have to re-read it to refresh my
memory.

Arnaud

--
Arnaud Installe <[email protected]>

Absence is to love what wind is to fire. It extinguishes the small,
it enkindles the great.

2000-11-30 17:15:30

by James A Sutherland

Subject: Re: high load & poor interactivity on fast thread creation

On Thu, 30 Nov 2000, Arnaud Installe wrote:
> On Thu, Nov 30, 2000 at 08:47:49AM -0600, Ray Bryant wrote:
> > The IBM implementations of the Java language use native threads --
> > the result is that every time you do a Java thread creation, you
> > end up with a new cloned process. Now this should be pretty fast,
>
> Well, I think the problem is that it is *too* fast. :-/ What I think
> happens is that a lot of threads get created at the same time, and they
> all run a bit of initialization code. This way a lot of processes are in
> the running state, so that the load average gets *very* high, which makes
> the system very unresponsive.

Certainly sounds plausible; if the first process is able to create a lot of
runnable processes/threads in a single timeslice, the scheduler is then hit
with a huge queue to plough through once that timeslice ends.

Might it help to make calls to clone() force a schedule()? That way,
hopefully, each thread could run its initialisation code before the next one
is created, avoiding the problem.

> Could this be correct ? Also, I haven't seen this happen with NT. Could
> it be that Java on NT uses user-mode threading and creates threads much
> more slowly, resulting in a lower load ?

Perhaps. Alternatively, NT may schedule the new thread to run before resuming
the parent thread; each thread is then initialised as it is created rather
than building up a huge backlog, which would avoid the problem.
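The behaviour described above (the new thread finishes its initialisation before the parent continues) can be approximated in user space. A minimal sketch with POSIX threads, assuming the child can signal when its setup is done; start_gate, child_main() and create_serialised() are hypothetical names, and init_count stands in for real initialisation work.

```c
#include <pthread.h>

struct start_gate {
    pthread_mutex_t lock;
    pthread_cond_t  ready_cv;
    int             ready;       /* set by a child once it has initialised */
    int             init_count;  /* stands in for real per-thread setup */
};

static void *child_main(void *p)
{
    struct start_gate *g = p;
    /* ... hypothetical per-thread initialisation would run here ... */
    pthread_mutex_lock(&g->lock);
    g->init_count++;
    g->ready = 1;
    pthread_cond_signal(&g->ready_cv);
    pthread_mutex_unlock(&g->lock);
    /* ... the thread's real work would follow ... */
    return NULL;
}

/* Create n threads one at a time, waiting after each pthread_create()
 * until that child reports it has initialised, so no backlog of
 * half-initialised runnable threads builds up. The gate lives on this
 * stack frame, so the sketch joins all children before returning.
 * Returns the number created, or -1 if the counts disagree. */
static int create_serialised(pthread_t *tids, int n)
{
    struct start_gate g;
    int created = 0;
    pthread_mutex_init(&g.lock, NULL);
    pthread_cond_init(&g.ready_cv, NULL);
    g.ready = 0;
    g.init_count = 0;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, child_main, &g) != 0)
            break;
        pthread_mutex_lock(&g.lock);
        while (!g.ready)
            pthread_cond_wait(&g.ready_cv, &g.lock);
        g.ready = 0;             /* re-arm the gate for the next child */
        pthread_mutex_unlock(&g.lock);
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&g.ready_cv);
    pthread_mutex_destroy(&g.lock);
    return created == g.init_count ? created : -1;
}
```

This trades creation throughput for a bounded run queue: the parent never gets more than one uninitialised child ahead.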

> > so I am surprised that it stalls like that. It is possible this
> > is a scheduler effect. Do you have a program example you can
> > share with us?
>
> So I suppose it is a scheduler effect. Can this be solved on the kernel
> side (a /proc/sys setting perhaps ?), or should a check be built-in into
> the software that no more than a certain number of threads are created per
> time unit ?

Either of those could help. Alternatively, have you tried calling
Thread.yield() in the Java code after creating each thread, to make sure the
new thread gets a chance to initialise?
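The C analogue of Thread.yield() is sched_yield(). A minimal sketch, with the hypothetical placeholder_worker() standing in for real thread code; the yield is only a hint to the scheduler, so it reduces the backlog rather than guaranteeing that each child has initialised before the next is created.

```c
#include <pthread.h>
#include <sched.h>

/* Placeholder thread body (hypothetical); real code would initialise
 * itself here and then do its work. */
static void *placeholder_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Create n threads running fn(arg), calling sched_yield() after each
 * pthread_create() so the new thread gets a chance to run before the
 * next one is created. Returns the number of threads created. */
static int create_with_yield(pthread_t *tids, int n,
                             void *(*fn)(void *), void *arg)
{
    int created = 0;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, fn, arg) != 0)
            break;
        created++;
        sched_yield();   /* hint only: the parent may be resumed at once */
    }
    return created;
}
```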


James.

2000-11-30 22:47:20

by David Lang

Subject: Re: high load & poor interactivity on fast thread creation

Try the 2.4 test kernels. I had a situation of poor performance with lots
of processes and saw a dramatic improvement with the 2.4 kernel.

David Lang

On Thu, 30 Nov 2000, Arnaud Installe wrote:

> Date: Thu, 30 Nov 2000 08:14:43 +0100
> From: Arnaud Installe <[email protected]>
> Reply-To: Arnaud Installe <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Subject: high load & poor interactivity on fast thread creation
>
> Hello,
>
> When creating a lot of Java threads per second linux slows down to a
> crawl. I don't think this happens on NT, probably because NT doesn't
> create new threads as fast as Linux does.
>
> Is there a way (setting ?) to solve this problem ? Rate-limit the number
> of threads created ? The problem occurred on linux 2.2, IBM Java 1.1.8.
>
> Thanks,
>
> Arnaud
>
> --
> Arnaud Installe [email protected]
>
> Look, we trade every day out there with hustlers, deal-makers, shysters,
> con-men. That's the way businesses get started. That's the way this
> country was built.
> -- Hubert Allen
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> Please read the FAQ at http://www.tux.org/lkml/
>

2000-12-01 10:19:10

by Arnaud Installe

Subject: Re: high load & poor interactivity on fast thread creation

On Thu, Nov 30, 2000 at 03:00:10PM -0800, David Lang wrote:
> try the 2.4 test kernels. I had a situation of poor performance with lots
> of processes and saw a dramatic improvement with the 2.4 kernel.

So up to what load average should I expect Linux 2.2 and 2.4 to perform
well? I'm wondering what would be the best way to solve this problem:
limit the number of processes created during a certain time span; check
that the load average isn't too high before creating a new thread (and go
to sleep if it is); or something else?
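The load-average option could be sketched with getloadavg(), a glibc/BSD extension that reports the same figures shown by uptime. This is a hypothetical helper, not a recommended threshold: the 100 ms polling interval is arbitrary, and because the 1-minute figure is a damped average it lags well behind a sudden burst of thread creation.

```c
#define _DEFAULT_SOURCE          /* getloadavg() is not in strict ISO C */
#include <stdlib.h>
#include <time.h>

/* Pure decision helper: throttle when the 1-minute load exceeds max_load. */
static int should_throttle(double load1, double max_load)
{
    return load1 > max_load;
}

/* Sleep in 100 ms steps while the 1-minute load average is above
 * max_load. Returns 0 once creation may proceed, or -1 if the load
 * average could not be read. */
static int wait_for_low_load(double max_load)
{
    struct timespec pause = { 0, 100000000 };  /* 100 ms */
    double load[1];
    for (;;) {
        if (getloadavg(load, 1) != 1)
            return -1;
        if (!should_throttle(load[0], max_load))
            return 0;
        nanosleep(&pause, NULL);
    }
}
```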

Thanks very much, BTW! The list has always been very helpful. :-)

Arnaud

> > When creating a lot of Java threads per second linux slows down to a
> > crawl. I don't think this happens on NT, probably because NT doesn't
> > create new threads as fast as Linux does.
> >
> > Is there a way (setting ?) to solve this problem ? Rate-limit the number
> > of threads created ? The problem occurred on linux 2.2, IBM Java 1.1.8.

--
Arnaud Installe <[email protected]>

Man has never reconciled himself to the ten commandments.

2000-12-01 21:07:16

by David Lang

Subject: Re: high load & poor interactivity on fast thread creation

I don't have really good numbers for either, but I can say that I was
really impressed with this firewall yesterday. There were other problems
in the system that caused things to clog up, but a 2.4 system (AMD950,
PC133 RAM) was usable (slow, but usable) with 4000+ processes and a load
average of over 300.

David Lang

On Fri, 1 Dec 2000, Arnaud Installe wrote:

> Date: Fri, 1 Dec 2000 10:47:45 +0100
> From: Arnaud Installe <[email protected]>
> To: David Lang <[email protected]>
> Cc: [email protected]
> Subject: Re: high load & poor interactivity on fast thread creation
>
> On Thu, Nov 30, 2000 at 03:00:10PM -0800, David Lang wrote:
> > try the 2.4 test kernels. I had a situation of poor performance with lots
> > of processes and saw a dramatic improvement with the 2.4 kernel.
>
> So what load average should I expect Linux versions 2.2 and 2.4 to perform
> well under ? I'm wondering what would be the best way to solve this
> problem: limit the number of processes created during a certain time span;
> check if the load average isn't too high before creating a new thread (and
> go to sleep if it isn't); or something else ?
>
> Thanks very much BTW ! The list has always been very helpful. :-)
>
> Arnaud
>
> > > When creating a lot of Java threads per second linux slows down to a
> > > crawl. I don't think this happens on NT, probably because NT doesn't
> > > create new threads as fast as Linux does.
> > >
> > > Is there a way (setting ?) to solve this problem ? Rate-limit the number
> > > of threads created ? The problem occurred on linux 2.2, IBM Java 1.1.8.
>
> --
> Arnaud Installe <[email protected]>
>
> Man has never reconciled himself to the ten commandments.
>

2000-12-27 11:33:02

by Ruth Ivimey-Cook

Subject: Re: high load & poor interactivity on fast thread creation

At 04:11 PM 11/30/00, Arnaud Installe wrote:
>Could this be correct ? Also, I haven't seen this happen with NT. Could
>it be that Java on NT uses user-mode threading and creates threads much
>more slowly, resulting in a lower load ?

No. Java on NT uses proper NT threads. However, a thread on NT is a rather
different beast to a cloned thread on Linux. I don't know whether the
differences are important.

Ruth
--

Ruth Ivimey-Cook [email protected]
Technical Author, ARM Ltd [email protected]

2000-12-27 17:42:06

by Michael Rothwell

Subject: Re: high load & poor interactivity on fast thread creation

Ruth Ivimey-Cook wrote:
> No. Java on NT uses proper NT threads. However, a thread on NT is a rather
> different beast to a cloned thread on Linux. I don't know whether the
> differences are important.

On Linux, threads are processes. On NT, processes are distinct from
threads, and usually have at least one thread within them. On Linux, the
process and the initial (sole) thread are the same thing. On NT, they
are distinct.

CreateProcess() creates a process and its initial thread (because that
is what you would expect to happen). On Linux, clone() creates a new
process; sometimes that process can be construed as a thread, because of
shared memory, file descriptors, or whatever. On NT, because a process
contains threads and is not itself a thread, you can use CreateProcess()
to create a process whose only thread is suspended. I don't know if that
kind of thing is possible with clone(). You might have to clone() then
suspend.
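A minimal sketch of a clone()d child that behaves like a thread because it shares the parent's address space (CLONE_VM), filesystem information (CLONE_FS) and descriptor table (CLONE_FILES); dropping flags makes it more process-like. It assumes glibc's clone() wrapper and a downward-growing stack, as on most architectures Linux runs on, including x86; clone_and_share() and shared_value are hypothetical names.

```c
#define _GNU_SOURCE              /* clone() and the CLONE_* flags */
#include <sched.h>
#include <signal.h>              /* SIGCHLD */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

static int shared_value;         /* the child's write reaches us via CLONE_VM */

static int child_fn(void *arg)
{
    shared_value = *(int *)arg;  /* writes straight into the parent's memory */
    return 0;                    /* child exits when this returns */
}

/* Start a thread-like child with clone(), wait for it, and return the
 * value it stored in shared memory, or -1 on error. */
static int clone_and_share(int value)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    /* Pass the top of the allocation: the stack grows downwards. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, &value);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    /* SIGCHLD as the exit signal lets an ordinary waitpid() reap it. */
    waitpid(pid, NULL, 0);
    free(stack);
    return shared_value;
}
```

Without CLONE_VM the same call gives the child its own copy of memory, and the parent would still read 0.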

You use CreateThread() to create more threads inside an existing
process. CreateRemoteThread() creates new threads inside a different
process. A process is more of a virtual memory environment and container
for threads than a context of execution itself. With clone() on Linux
you can mimic a lot of the behavior of NT processes and threads, but not
all of it.

One notable difference between Linux and NT threads and processes is
that it is more expensive to create new processes on NT than on Linux,
and on NT thread creation is cheaper than process creation. Typically
Windows programs use multiple threads rather than multiple processes,
whereas on Unix the reverse is true.

http://www.byte.com/art/9511/sec11/art3.htm
http://msdn.microsoft.com/library/winresource/dnwinnt/S7FC7.HTM
http://msdn.microsoft.com/library/psdk/winbase/prothred_9dpv.htm
http://msdn.microsoft.com/library/psdk/winbase/prothred_4084.htm

2000-12-27 17:55:48

by Gregory Maxwell

Subject: Re: high load & poor interactivity on fast thread creation

On Wed, Dec 27, 2000 at 12:11:04PM -0500, Michael Rothwell wrote:
[snip]
> One notable difference between Linux and NT threads and processes is
> that it is more expensive to create new processes on NT than on Linux,
> and on NT thread creation is cheaper than process creation. Typically
> Windows programs use multiple threads rather than multiple processes,
> whereas on Unix the reverse is true.

This is the meaty difference. Under Linux, full *process* operations
are faster than NT *thread* operations. The Linux 'threads' (lightweight
processes) are somewhat faster than ordinary heavyweight processes, but
nowhere near the magnitude of difference that NT experiences.

Because of this, lightweight processes are used differently under Linux: They
are treated just like processes and can share variable amounts of state with
other processes.

In Linux, you use threads when it makes sense to code with threads. You can
share as little or as much as makes sense for your design. You almost never
use threads for performance reasons, because regular processes are so fast
that it seldom makes sense (threads can be faster, but the additional
development and debugging difficulty usually makes that a non-issue).
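One way to check this claim on a given machine is to time both operations. This C sketch averages fork()+waitpid() against pthread_create()+pthread_join() for children that exit immediately; the function names are hypothetical, no figures are claimed here, and results will depend on kernel, hardware, libc and the size of the parent's address space.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static double elapsed_us(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
}

static void *empty_thread(void *arg) { return arg; }

/* Average microseconds per fork()+waitpid() of a child that exits at once. */
static double time_fork(int iterations)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);            /* child: exit without flushing stdio */
        if (pid < 0)
            return -1.0;
        waitpid(pid, NULL, 0);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_us(t0, t1) / iterations;
}

/* Average microseconds per pthread_create()+pthread_join(). */
static double time_thread(int iterations)
{
    struct timespec t0, t1;
    pthread_t tid;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        if (pthread_create(&tid, NULL, empty_thread, NULL) != 0)
            return -1.0;
        pthread_join(tid, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_us(t0, t1) / iterations;
}
```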

In Windows NT, you MUST use threads for decent performance in many places
where processes (or other semi-lightweight structures) might make more
sense. Threads are the largest construct capable of really good
performance, so you don't have the flexibility to choose what you share:
what is shared is not a programming design decision but an OS performance
decision.

2000-12-27 18:03:19

by Larry McVoy

Subject: Re: high load & poor interactivity on fast thread creation

Great post. If you are trying to distill it down to one sentence, Rob Pike
said it best:

"If you think you need threads, your processes are too fat"

Steve Kleiman had a somewhat more cryptic comment ("somewhat" is an
understatement; it took me years to let it sink in) in reference to
a popular DB technology from a West Coast university:

"They didn't use mmap"

I don't have anything as catchy, but for years I've felt that plain old
processes combined with mmap give you all the sharing you need. You do
pay a price for not sharing TLB entries if the OS is stupid (Linux is
not).
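A minimal sketch of "plain old processes combined with mmap": an anonymous MAP_SHARED mapping created before fork() is shared by every child and sits at the same virtual address in each of them. sum_in_children() is a hypothetical example; the children run one at a time so no lock is needed, whereas concurrent writers would need one (for example a process-shared pthread mutex placed inside the mapping).

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Each of n children adds (its index + 1) to a counter that lives in a
 * shared anonymous mapping; the parent reads the total afterwards.
 * Returns the sum, or -1 on error. */
static long sum_in_children(int n)
{
    long *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            munmap(shared, sizeof *shared);
            return -1;
        }
        if (pid == 0) {
            *shared += i + 1;    /* lands in the pages the parent sees */
            _exit(0);
        }
        waitpid(pid, NULL, 0);   /* serialise the children */
    }
    long result = *shared;
    munmap(shared, sizeof *shared);
    return result;
}
```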

On Wed, Dec 27, 2000 at 12:25:09PM -0500, Gregory Maxwell wrote:
> On Wed, Dec 27, 2000 at 12:11:04PM -0500, Michael Rothwell wrote:
> [snip]
> > One notable difference between Linux and NT threads and processes is
> > that it is more expensive to create new processes on NT than on Linux,
> > and on NT thread creation is cheaper than process creation. Typically
> > Windows programs use multiple threads rather than multiple processes,
> > whereas on Unix the reverse is true.
>
> This is the meaty difference. Under Linux, full *process* operations
> are faster then NT *thread* operations. The Linux 'threads' (lightweight
> processes) are somewhat faster then unlightweight processes, but nowhere
> near the magnitude of difference that NT experiences.
>
> Because of this, lightweight processes are used differently under Linux: They
> are treated just like processes and can share variable amounts of state with
> other processes.
>
> In Linux, you use threads when it makes sense to code with threads. You can
> share as little or as much makes sense with your design. You almost never use
> threads for performance reasons, because regular processes are so fast that
> it seldom makes sense to use threads for performance (they can be faster but
> usually the additional development/debugging difficulty makes it a non-issue).
>
> In Windows NT, you MUST use threads for decent performance in many places
> where processes (or other different semi-lightweight structures) might make
> more sense. Threads are the largest construction capable of really good
> performance, so you don't have the flexibility to chose what you share:
> What is shared is not a programming design decision but an OS performance
> decision.
>

--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2000-12-27 20:14:43

by Andrea Arcangeli

Subject: Re: high load & poor interactivity on fast thread creation

On Wed, Dec 27, 2000 at 09:32:36AM -0800, Larry McVoy wrote:
> [..] You do
> pay a price for not sharing TLB entries if the OS is stupid (Linux' is
> not).

Even assuming all segments are attached at the same virtual address in all MMs
(this can be enforced with MAP_FIXED, of course), we can't use the same TLB
entries for accessing the same shared segment from different MMs. That's not
possible even on hardware with address space numbers (on x86 it's obviously not
possible, even with future x86 chips that can tag the TLB entries with the
physical address of the pgd to skip the full TLB flush during switch_mm).

I think the main point of using threads instead of shared mappings is
performance.

Andrea