Dear List(s),
as part of my project I need to run a very high number of processes/threads on a
linux machine. Right now I have a Dual-PIII 1.4G w/ 8GB RAM -- I am running
4000 processes w/ 2-3 threads each totaling in a process count of 15000+
processes (since Linux doesn't really distinguish between threads and
processes...).
Once I pass the 10000 (+/-) processes load increases drastically (on startup,
although it returns to normal), however the system time (on one processor)
reaches for 54% (12061 procs) while the only non sleeping process is top -- the
system is basically doing nothing (except scheduling the "nothing" which
consumes significant system time).
Is there anything I can do to reduce that system load/time? (I haven't been
able to exactly define the "line" but it definitely gets worse the more processes
need to be handled.)
Does any of the patchsets address this particular problem?
BTW: The processes are all alike...
Thanks for your help!
Immanuel
forgot the kernel version (2.4.20aa1)...
Till Immanuel Patzschke wrote:
> Dear List(s),
>
> as part of my project I need to run a very high number of processes/threads on a
> linux machine. Right now I have a Dual-PIII 1.4G w/ 8GB RAM -- I am running
> 4000 processes w/ 2-3 threads each totaling in a process count of 15000+
> processes (since Linux doesn't really distinguish between threads and
> processes...).
> Once I pass the 10000 (+/-) processes load increases drastically (on startup,
> although it returns to normal), however the system time (on one processor)
> reaches for 54% (12061 procs) while the only non sleeping process is top -- the
> system is basically doing nothing (except scheduling the "nothing" which
> consumes significant system time).
> Is there anything I can do to reduce that system load/time? (I haven't been
> able to exactly define the "line" but it definitely gets worse the more processes
> need to be handled.)
> Does any of the patchsets address this particular problem?
> BTW: The processes are all alike...
>
> Thanks for your help!
>
> Immanuel
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
> as part of my project I need to run a very high number of processes/threads on a
> linux machine. Right now I have a Dual-PIII 1.4G w/ 8GB RAM -- I am running
> 4000 processes w/ 2-3 threads each totaling in a process count of 15000+
> processes (since Linux doesn't really distinguish between threads and
> processes...).
> Once I pass the 10000 (+/-) processes load increases drastically (on startup,
> although it returns to normal), however the system time (on one processor)
> reaches for 54% (12061 procs) while the only non sleeping process is top -- the
> system is basically doing nothing (except scheduling the "nothing" which
> consumes significant system time).
> Is there anything I can do to reduce that system load/time? (I haven't been
> able to exactly define the "line" but it definitely gets worse the more processes
> need to be handled.)
You don't even specify what kernel you're using ...
> Does any of the patchsets address this particular problem?
Read the linux-kernel archives.
M.
On Wed, Dec 18, 2002 at 04:46:15PM -0800, Till Immanuel Patzschke wrote:
> Dear List(s),
>
> as part of my project I need to run a very high number of processes/threads on a
> linux machine. Right now I have a Dual-PIII 1.4G w/ 8GB RAM -- I am running
> 4000 processes w/ 2-3 threads each totaling in a process count of 15000+
> processes (since Linux doesn't really distinguish between threads and
> processes...).
> Once I pass the 10000 (+/-) processes load increases drastically (on startup,
> although it returns to normal), however the system time (on one processor)
> reaches for 54% (12061 procs) while the only non sleeping process is top -- the
> system is basically doing nothing (except scheduling the "nothing" which
> consumes significant system time).
> Is there anything I can do to reduce that system load/time? (I haven't been
> able to exactly define the "line" but it definitely gets worse the more processes
> need to be handled.)
Redesign your program to not do silly things like this.
Unless you have hardware with 5000 or more CPUs...
Jeff
>
> forgot the kernel version (2.4.20aa1)...
You need the O(1) scheduler; not sure if aa has it or not; if not, lots of
processes will suck your machine. I think -ac has the O(1) scheduler, or try
2.5. The old scheduler is pretty cool but not as scalable as the new one.
If it has it ... well, I have no idea - maybe Robert Love would know.
Inaky Perez-Gonzalez -- Not speaking for Intel - opinions are my own [or my
fault]
On Wed, Dec 18, 2002 at 04:46:15PM -0800, Till Immanuel Patzschke wrote:
> as part of my project I need to run a very high number of
> processes/threads on a linux machine. Right now I have a Dual-PIII
> 1.4G w/ 8GB RAM -- I am running 4000 processes w/ 2-3 threads each
> totaling in a process count of 15000+ processes (since Linux doesn't
> really distinguish between threads and processes...).
You're for the most part SOL unless you can either hack the support or
can wait for it to be finished. More details below.
On Wed, Dec 18, 2002 at 04:46:15PM -0800, Till Immanuel Patzschke wrote:
> Once I pass the 10000 (+/-) processes load increases drastically (on
> startup, although it returns to normal), however the system time (on
> one processor) reaches for 54% (12061 procs) while the only non
> sleeping process is top -- the system is basically doing nothing
> (except scheduling the "nothing" which
> consumes significant system time).
> Is there anything I can do to reduce that system load/time? (I
> haven't been able to exactly define the "line" but it definitely gets
> worse the more processes need to be handled.)
> Does any of the patchsets address this particular problem?
> BTW: The processes are all alike...
> Thanks for your help!
Try 2.5.52-mm1 + 2.5.52-wli-1. The -wli bits are orthogonal but they do
a small bit to reduce the cpu inefficiencies of many task loads.
-wli is actually maintenance and follow-through on various early 2.5
promises.
proc_pid_readdir() is the cpu culprit, which I have not yet addressed.
You are also going to have severe memory management problems due to the
number of L2 and L3 pagetables created as well as kernel stacks.
2.5.52-mm1 will have 2 of 3 possible things that can be done about L3
pagetables. L2 pagetables limit you to 64K processes with more practical
limits around 16K. As 16K is feasible here, you are running the wrong
kernel version(s).
Bill
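The lowmem pressure described above can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic. The per-process figures (3 user pmd pages per address space under PAE, 4 KiB pages, 8 KiB kernel stack per task) are assumptions about a 32-bit x86 kernel of that era, not numbers given in this thread, and lowmem_estimate is a hypothetical helper name.

```python
KIB = 1024
MIB = 1024 * KIB


def lowmem_estimate(num_mms, num_tasks,
                    pmd_pages_per_mm=3,      # assumption: 3 user pmd pages per mm (PAE, 3G/1G split)
                    page_size=4 * KIB,       # assumption: 4 KiB pages
                    kernel_stack=8 * KIB):   # assumption: 8 KiB kernel stack per task
    """Return (pmd_bytes, stack_bytes) pinned in lowmem under the assumptions above.

    num_mms   -- number of distinct address spaces (processes)
    num_tasks -- number of schedulable tasks (processes plus threads)
    """
    pmd_bytes = num_mms * pmd_pages_per_mm * page_size
    stack_bytes = num_tasks * kernel_stack
    return pmd_bytes, stack_bytes


if __name__ == "__main__":
    for mms, tasks in ((4000, 15000), (16384, 16384), (65536, 65536)):
        pmd, stacks = lowmem_estimate(mms, tasks)
        print(f"{mms:>6} mms, {tasks:>6} tasks: "
              f"pmd {pmd / MIB:7.1f} MiB, stacks {stacks / MIB:7.1f} MiB")
```

Under these assumptions, 64K separate address spaces would pin about 768 MiB in pmd pages alone, which is close to all of lowmem on a 32-bit box. Threads sharing an address space add kernel stacks but no extra pmd pages.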
On Wed, Dec 18, 2002 at 04:53:45PM -0800, Till Immanuel Patzschke wrote:
> forgot the kernel version (2.4.20aa1)...
2.4.20aa1 is missing some of the infrastructure to reduce the cpu
consumption under high process count loads, but that's not going to
help you anyway. 150K processes is not going to be feasible in the
immediate future (months or longer away) so you'll have to figure out
how to take that into account.
Bill
On Wed, 2002-12-18 at 20:04, Perez-Gonzalez, Inaky wrote:
> >
> > forgot the kernel version (2.4.20aa1)...
>
> You need the O(1) scheduler; not sure if aa has it or not; if not, lots of
> processes will suck your machine. I think -ac has the O(1) scheduler, or try
> 2.5. The old scheduler is pretty cool but not as scalable as the new one.
>
> If it has it ... well, I have no idea - maybe Robert Love would
> know.
2.4-aa has the O(1) scheduler, yes.
I think 15,000 processes may always suck, though :)
Robert Love
On Wed, Dec 18, 2002 at 05:12:41PM -0800, David Lang wrote:
> also top is very inefficient with large numbers of processes. use vmstat
> or cat out the files in /proc to get the info more efficiently (it won't
> get you per process info, but it won't cause the interference with your
> desired load that top gives you.)
It's mostly just the fact top(1) doesn't scan /proc/ incrementally and
that proc_pid_readdir() is quadratic in the number of tasks.
Bill
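To illustrate why a directory scan that cannot resume in place ends up quadratic, here is a toy model. It is not the kernel's proc_pid_readdir() code. It assumes each readdir-style call restarts at the head of the task list, walks to its saved offset, and then emits a fixed number of entries; the function name and the entries_per_call parameter are hypothetical.

```python
def steps_to_list_all(num_tasks, entries_per_call):
    """Count list nodes visited while listing every task, when each call
    restarts at the head of the list and walks forward to its offset."""
    steps = 0
    offset = 0
    while offset < num_tasks:
        # Walk from the head of the list to the resume point (wasted work).
        steps += offset
        # Emit up to entries_per_call entries from there.
        emitted = min(entries_per_call, num_tasks - offset)
        steps += emitted
        offset += emitted
    return steps


if __name__ == "__main__":
    for n in (1000, 2000, 4000, 8000, 16000):
        print(n, steps_to_list_all(n, entries_per_call=100))
```

In this model, doubling the number of tasks roughly quadruples the work once the task count is much larger than the per-call batch size.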
also top is very inefficient with large numbers of processes. use vmstat
or cat out the files in /proc to get the info more efficiently (it won't
get you per process info, but it won't cause the interference with your
desired load that top gives you.)
David Lang
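As a rough sketch of the "cat out the files in /proc" approach, the snippet below reads only the system-wide files /proc/loadavg and /proc/stat, so it never walks the per-process directories. The helper names are hypothetical, and only the first four counters of the aggregate cpu line are parsed, since later kernels append more fields.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total_tasks": int(total),
    }


def parse_cpu_totals(stat_text):
    """Return the user/nice/system/idle counters from the aggregate
    'cpu' line of /proc/stat (values are in clock ticks)."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            user, nice, system, idle = (int(x) for x in line.split()[1:5])
            return {"user": user, "nice": nice, "system": system, "idle": idle}
    raise ValueError("no aggregate 'cpu' line found")


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
    with open("/proc/stat") as f:
        print(parse_cpu_totals(f.read()))
```

Sampling parse_cpu_totals() twice and taking the difference gives the system-time share over the interval at the cost of two file reads, independent of the number of processes.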
On Wed, 18 Dec 2002, William Lee Irwin III wrote:
> Date: Wed, 18 Dec 2002 17:15:41 -0800
> From: William Lee Irwin III <[email protected]>
> To: Till Immanuel Patzschke <[email protected]>
> Cc: lse-tech <[email protected]>,
> "[email protected]" <[email protected]>
> Subject: Re: 15000+ processes -- poor performance ?!
>
> On Wed, Dec 18, 2002 at 04:53:45PM -0800, Till Immanuel Patzschke wrote:
> > forgot the kernel version (2.4.20aa1)...
>
> 2.4.20aa1 is missing some of the infrastructure to reduce the cpu
> consumption under high process count loads, but that's not going to
> help you anyway. 150K processes is not going to be feasible in the
> immediate future (months or longer away) so you'll have to figure out
> how to take that into account.
>
>
> Bill
On Wed, Dec 18, 2002 at 04:53:45PM -0800, Till Immanuel Patzschke wrote:
>> forgot the kernel version (2.4.20aa1)...
On Wed, Dec 18, 2002 at 05:15:41PM -0800, William Lee Irwin III wrote:
> 2.4.20aa1 is missing some of the infrastructure to reduce the cpu
> consumption under high process count loads, but that's not going to
> help you anyway. 150K processes is not going to be feasible in the
> immediate future (months or longer away) so you'll have to figure out
> how to take that into account.
Er, sorry, on a brief rereading my eyes deceived me and I thought an
extra zero got in there. 15K is fine on 2.5 + patches.
Bill
Ok, I wasn't sure of the cause, but I've seen this as far back as 2.2. I
had a machine trying to run 2000 processes under 2.2 and 2.4.0 (after
upping the 2.2 kernel limit) and top would cost me ~40% throughput on the
machine (while claiming it was using ~5% of the CPU).
David Lang
On Wed, 18 Dec 2002, William Lee Irwin III wrote:
> Date: Wed, 18 Dec 2002 17:25:49 -0800
> From: William Lee Irwin III <[email protected]>
> To: David Lang <[email protected]>
> Cc: Till Immanuel Patzschke <[email protected]>,
> lse-tech <[email protected]>,
> "[email protected]" <[email protected]>
> Subject: Re: 15000+ processes -- poor performance ?!
>
> On Wed, Dec 18, 2002 at 05:12:41PM -0800, David Lang wrote:
> > also top is very inefficient with large numbers of processes. use vmstat
> > or cat out the files in /proc to get the info more efficiently (it won't
> > get you per process info, but it won't cause the interference with your
> > desired load that top gives you.)
>
> It's mostly just the fact top(1) doesn't scan /proc/ incrementally and
> that proc_pid_readdir() is quadratic in the number of tasks.
>
>
> Bill
>
On Wed, Dec 18, 2002 at 05:20:02PM -0800, David Lang wrote:
> Ok, I wasn't sure of the cause, but I've seen this as far back as 2.2. I
> had a machine trying to run 2000 processes under 2.2 and 2.4.0 (after
> upping the 2.2 kernel limit) and top would cost me ~40% throughput on the
> machine (while claiming it was using ~5% of the CPU).
> David Lang
It wasn't really lying to you. The issue is that the kernel samples at
regular intervals to avoid timer reprogramming overhead. Now top(1) is
isochronous in nature as it's trying to periodically refresh, and so
it runs in lockstep with the clock interrupt, and the kernel hands back
bad numbers to top(1).
Bill
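The sampling effect described above can be reproduced with a toy model. This is an illustration only, not the kernel's accounting code. It assumes the kernel charges one whole tick to whatever task is running at each tick instant, and that the task runs in periodic bursts; all names and parameters are hypothetical, with times in microseconds.

```python
def sampled_vs_actual(tick_us, period_us, run_us, phase_us, duration_us):
    """Return (actual_us, charged_us) for a task that runs for run_us every
    period_us, starting phase_us after time zero, over duration_us.

    actual_us  -- CPU time the task really consumed
    charged_us -- CPU time tick-based sampling would attribute to it
    """
    intervals = []
    actual = 0
    start = phase_us
    while start < duration_us:
        end = min(start + run_us, duration_us)
        intervals.append((start, end))
        actual += end - start
        start += period_us

    charged = 0
    for t in range(0, duration_us, tick_us):
        # Charge a full tick if the task is on the CPU at the tick instant.
        if any(s <= t < e for s, e in intervals):
            charged += tick_us
    return actual, charged


if __name__ == "__main__":
    # Burst starts just after each tick and ends before the next one.
    print(sampled_vs_actual(10_000, 10_000, 8_000, 1_000, 1_000_000))
    # Same burst length, shifted so that it straddles every tick.
    print(sampled_vs_actual(10_000, 10_000, 8_000, 5_000, 1_000_000))
```

In the first case the task uses 80% of the CPU but is never caught running at a tick, so it is charged nothing; in the second, nearly the same workload is charged almost 100%. A task whose wakeups are driven by the clock interrupt tends to sit in one of these lockstep phases.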
On Wed, 2002-12-18 at 20:20, David Lang wrote:
> Ok, I wasn't sure of the cause, but I've seen this as far back as 2.2. I
> had a machine trying to run 2000 processes under 2.2 and 2.4.0 (after
> upping the 2.2 kernel limit) and top would cost me ~40% throughput on the
> machine (while claiming it was using ~5% of the CPU).
Yah a lot of it is like William is saying... you just do not want to
read multiple files for each process in /proc when you have a kajillion
processes, and that is what top does. Over and over.
Work has gone into 2.5 to make this a lot better. If you use threads
with NPTL in 2.5, a lot of this is resolved, since the sub-threads will
not show up as /proc/#/ entries.
Robert Love
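For a feel of the scale involved, here is a rough count of system calls per screen refresh. The numbers are assumptions for illustration, not measurements of any particular top version: three /proc files read per task, and an open, a read and a close for each file. The function name is hypothetical.

```python
def syscalls_per_refresh(num_tasks, files_per_task=3, syscalls_per_file=3):
    """Estimate system calls issued per refresh when a monitoring tool
    opens, reads and closes several /proc files for every task.

    files_per_task and syscalls_per_file are illustrative assumptions.
    """
    return num_tasks * files_per_task * syscalls_per_file


if __name__ == "__main__":
    for n in (2000, 15000):
        print(n, "tasks ->", syscalls_per_refresh(n), "syscalls per refresh")
```

The estimate leaves out the directory reads needed to enumerate /proc in the first place, so the real cost per refresh is higher.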
On Thu, 2002-12-19 at 01:04, Perez-Gonzalez, Inaky wrote:
>
> >
> > forgot the kernel version (2.4.20aa1)...
>
> You need the O(1) scheduler; not sure if aa has it or not; if not, lots of
> processes will suck your machine. I think -ac has the O(1) scheduler, or try
> 2.5. The old scheduler is pretty cool but not as scalable as the new one.
>
> If it has it ... well, I have no idea - maybe Robert Love would know.
He's running the -aa kernel, which has all the right bits for this too.
In fact in some ways for very large memory boxes it's probably the better
variant.
In my case I will still be running thousands of processes, so I have to
just teach everyone not to use top instead.
David Lang
On 18 Dec 2002, Robert Love wrote:
> Date: 18 Dec 2002 20:42:58 -0500
> From: Robert Love <[email protected]>
> To: David Lang <[email protected]>
> Cc: William Lee Irwin III <[email protected]>,
> Till Immanuel Patzschke <[email protected]>,
> lse-tech <[email protected]>,
> "[email protected]" <[email protected]>
> Subject: Re: 15000+ processes -- poor performance ?!
>
> On Wed, 2002-12-18 at 20:20, David Lang wrote:
> > Ok, I wasn't sure of the cause, but I've seen this as far back as 2.2. I
> > had a machine trying to run 2000 processes under 2.2 and 2.4.0 (after
> > upping the 2.2 kernel limit) and top would cost me ~40% throughput on the
> > machine (while claiming it was using ~5% of the CPU).
>
> Yah a lot of it is like William is saying... you just do not want to
> read multiple files for each process in /proc when you have a kajillion
> processes, and that is what top does. Over and over.
>
> Work has gone into 2.5 to make this a lot better. If you use threads
> with NPTL in 2.5, a lot of this is resolved, since the sub-threads will
> not show up as /proc/#/ entries.
>
> Robert Love
>
On Thu, 19 Dec 2002, Alan Cox wrote:
> He's running the -aa kernel, which has all the right bits for this too.
> In fact in some ways for very large memory boxes it's probably the better
> variant.
If you're willing to merge a patch to take -ac up to rmap15b
I'll integrate some large memory stuff into my tree for 15c
and 2.4-ac should be able to handle large boxes too within a
month or so.
(keeping the speed of merging slow, deliberately)
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://guru.conectiva.com/
Current spamtrap: [email protected]
On Thu, 2002-12-19 at 01:04, Perez-Gonzalez, Inaky wrote:
>> If it has it ... well, I have no idea - maybe Robert Love would know.
On Thu, Dec 19, 2002 at 02:31:28AM +0000, Alan Cox wrote:
> He's running the -aa kernel, which has all the right bits for this too.
> In fact in some ways for very large memory boxes it's probably the better
> variant.
In my experience the most critical issues running 16K processes are:
(1) the highmem footprint of the pte's is significant
(2) the lowmem footprint of pmd's
and most of the rest is in the noise. It's probably a bad idea to run
top(1) or perhaps even mount /proc/ at all until top itself,
proc_pid_readdir(), and the tasklist_lock are all fixed.
Pretty much all he needs to "stay alive" is highpte of some flavor or
another. Performance etc. is addressed somewhat more by 2.5.x than -aa,
at least in the context of not degrading with this kind of multitasking.
i.e. shpte and pidhash. I've been randomly shooting down do_each_thread()
and for_each_process() loops in -wli, which is why I recommended it.
Bill
On Wed, Dec 18, 2002 at 05:44:46PM -0800, David Lang wrote:
> In my case I will still be running thousands of processes, so I have to
> just teach everyone not to use top instead.
> David Lang
Well, a better solution would be a userspace free of /proc/ dependency.
Or actually fixing the kernel. proc_pid_readdir() wants an efficiently
indexable linear list, e.g. TAOCP's 6.2.3 "Linear List Representation".
At that point its expense is proportional to the buffer size and
"seeking" about the list as it is wont to do is O(lg(processes)).
Bill
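A minimal sketch of the "efficiently indexable list" idea follows. It keeps the pids sorted, so a directory scan can resume from the last pid it returned with a binary search instead of re-walking the list. A sorted Python list with bisect stands in here for the balanced tree with rank fields that the TAOCP reference describes, so insertion and removal are O(n) in this sketch rather than O(lg n). The class and method names are hypothetical.

```python
import bisect


class PidIndex:
    """Sorted collection of pids supporting O(lg n) resume of a scan."""

    def __init__(self, pids=()):
        self._pids = sorted(pids)

    def add(self, pid):
        # O(n) in this sketch; a balanced tree would make it O(lg n).
        bisect.insort(self._pids, pid)

    def remove(self, pid):
        i = bisect.bisect_left(self._pids, pid)
        if i < len(self._pids) and self._pids[i] == pid:
            del self._pids[i]

    def read_chunk(self, after_pid, count):
        """Return up to count pids strictly greater than after_pid.

        The seek is a binary search, so its cost is O(lg n); the rest of
        the call is proportional to the chunk (buffer) size.
        """
        start = bisect.bisect_right(self._pids, after_pid)
        return self._pids[start:start + count]


if __name__ == "__main__":
    index = PidIndex(range(1, 16001))
    cursor, seen = 0, 0
    while True:
        chunk = index.read_chunk(cursor, 100)
        if not chunk:
            break
        seen += len(chunk)
        cursor = chunk[-1]   # resume after the last pid returned
    print("listed", seen, "pids")
```

Using the last returned pid as the cursor, rather than a positional offset, also keeps the scan consistent when tasks exit between calls.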
On 18 December 2002 22:46, Till Immanuel Patzschke wrote:
> Dear List(s),
>
> as part of my project I need to run a very high number of
> processes/threads on a linux machine. Right now I have a Dual-PIII
> 1.4G w/ 8GB RAM -- I am running 4000 processes w/ 2-3 threads each
> totaling in a process count of 15000+ processes (since Linux doesn't
> really distinguish between threads and processes...).
BTW, can you say _what_ are you trying to do?
> Once I pass the 10000 (+/-) processes load increases drastically (on
> startup, although it returns to normal), however the system time (on
> one processor) reaches for 54% (12061 procs) while the only non
> sleeping process is top -- the system is basically doing nothing
> (except scheduling the "nothing" which consumes significant system
> time).
> Is there anything I can do to reduce that system load/time? (I
> haven't been able to exactly define the "line" but it definitely gets
> worse the more processes need to be handled.)
> Does any of the patchsets address this particular problem?
> BTW: The processes are all alike...
You need to collect memory info (especially lowmem and highmem situation)
and maybe profile your kernel to find out where does it spend that time
doing "nothing".
BTW, your .config?
--
vda
On 19 December 2002 00:05, William Lee Irwin III wrote:
> On Wed, Dec 18, 2002 at 05:44:46PM -0800, David Lang wrote:
> > In my case I will still be running thousands of processes, so I
> > have to just teach everyone not to use top instead.
> > David Lang
>
> Well, a better solution would be a userspace free of /proc/
> dependency.
>
> Or actually fixing the kernel. proc_pid_readdir() wants an
> efficiently indexable linear list, e.g. TAOCP's 6.2.3 "Linear List
> Representation". At that point its expense is proportional to the
> buffer size and "seeking" about the list as it is wont to do is
> O(lg(processes)).
A short-time solution: run top d 30 to make it refresh only every 30 seconds.
This will greatly reduce top's own load skew.
--
vda
On 19 December 2002 00:05, William Lee Irwin III wrote:
>> Well, a better solution would be a userspace free of /proc/
>> dependency.
>> Or actually fixing the kernel. proc_pid_readdir() wants an
>> efficiently indexable linear list, e.g. TAOCP's 6.2.3 "Linear List
>> Representation". At that point its expense is proportional to the
>> buffer size and "seeking" about the list as it is wont to do is
>> O(lg(processes)).
On Thu, Dec 19, 2002 at 01:05:03PM -0200, Denis Vlasenko wrote:
> A short-time solution: run top d 30 to make it refresh only every 30 seconds.
> This will greatly reduce top's own load skew.
As userspace solutions go your suggestion is just as good. The kernel
still needs to get its act together and with some urgency.
Bill
On 19 December 2002 08:27, William Lee Irwin III wrote:
> On 19 December 2002 00:05, William Lee Irwin III wrote:
> >> Well, a better solution would be a userspace free of /proc/
> >> dependency.
> >> Or actually fixing the kernel. proc_pid_readdir() wants an
> >> efficiently indexable linear list, e.g. TAOCP's 6.2.3 "Linear List
> >> Representation". At that point its expense is proportional to the
> >> buffer size and "seeking" about the list as it is wont to do is
> >> O(lg(processes)).
>
> On Thu, Dec 19, 2002 at 01:05:03PM -0200, Denis Vlasenko wrote:
> > A short-time solution: run top d 30 to make it refresh only every
> > 30 seconds. This will greatly reduce top's own load skew.
>
> As userspace solutions go your suggestion is just as good. The
> kernel still needs to get its act together and with some urgency.
That was just a suggestion as to how to get realistic picture
of system load for Till Immanuel Patzschke <[email protected]>.
--
vda
>>>>> William Lee Irwin (WLI) writes:
WLI> On 19 December 2002 00:05, William Lee Irwin III wrote:
>>> Well, a better solution would be a userspace free of /proc/
>>> dependency. Or actually fixing the kernel. proc_pid_readdir()
>>> wants an efficiently indexable linear list, e.g. TAOCP's 6.2.3
>>> "Linear List Representation". At that point its expense is
>>> proportional to the buffer size and "seeking" about the list as
>>> it is wont to do is O(lg(processes)).
WLI> On Thu, Dec 19, 2002 at 01:05:03PM -0200, Denis Vlasenko wrote:
>> A short-time solution: run top d 30 to make it refresh only every
>> 30 seconds. This will greatly reduce top's own load skew.
WLI> As userspace solutions go your suggestion is just as good. The
WLI> kernel still needs to get its act together and with some
WLI> urgency.
what about retrieving info from /proc/kmem or something like that? just to
avoid binary -> text(proc) -> binary
William Lee Irwin (WLI) writes:
WLI> As userspace solutions go your suggestion is just as good. The
WLI> kernel still needs to get its act together and with some
WLI> urgency.
On Thu, Dec 19, 2002 at 01:37:30PM +0300, Alex Tomas wrote:
> what about retrieving info from /proc/kmem or something like that? just to
> avoid binary -> text(proc) -> binary
That would also be an excellent userspace solution to this local DoS.
Bill
> WLI> As userspace solutions go your suggestion is just as good. The
> WLI> kernel still needs to get its act together and with some
> WLI> urgency.
>
> what about retrieving info from /proc/kmem or something like that? just to
> avoid binary -> text(proc) -> binary
The binary <-> text translation problem is less of an issue than all the
syscall traffic, dcache hits, etc. Search linux-kernel archives for a
recent thread entitled "ps performance sucks" or something similar.
M.