2015-07-02 08:50:24

by Ulrich Windl

Subject: Lower bound 0.05 on 15-Minute load?

Hi!

I'm not subscribed, so please CC: me for your replies.

When graphing the CPU load, I noticed that the 15-minute average never drops below 0.05, while the 5-minute load and the 1-minute load do
(Kernel 3.0.101-0.47.52-xen of SLES11 on x86_64).

Is that a known bug? An interactive call of "uptime" seems to confirm my suspicion:
windl> uptime
10:41am up 23 days 18:49, 1 user, load average: 0.08, 0.05, 0.05
windl> uptime
10:48am up 23 days 18:56, 1 user, load average: 0.00, 0.04, 0.05
windl> cat /proc/loadavg
0.00 0.04 0.05 1/108 9704

I'll attach a sample graph.

Regards,
Ulrich



Attachments:
(No filename) (596.00 B)
Load.png (45.80 kB)

2015-07-02 09:27:29

by Martin Steigerwald

Subject: Re: Lower bound 0.05 on 15-Minute load?

On Thursday 02 July 2015 10:50:13 Ulrich Windl wrote:
> Hi!

Hi Ulrich,

> I'm not subscribed, so please CC: me for your replies.
>
> When graphing the CPU load, I noticed that the 15-minute average never
> drops below 0.05, while the 5-minute load and the 1-minute load do
> (Kernel 3.0.101-0.47.52-xen of SLES11 on x86_64).

Load average is *NOT* the CPU load although this is a very common
misconception.

Load average indicates the number of processes that are running or waiting to
be scheduled (which is CPU saturation) *and* those that are in
uninterruptible sleep. You can have a high load average without much CPU
utilization, for example by running 20 find processes on a /home on NFS.

A high load can be CPU-bound but it doesn't need to be.

So a high load can only indicate that things are running more slowly, but
not why: there are at least two possible causes, and it does not need to be
the CPU.

Also the load is normalized to CPU cores.

> Is that a known bug? An interactive call of "uptime" seems to confirm my
> suspicion:
> windl> uptime
> 10:41am up 23 days 18:49, 1 user, load average: 0.08, 0.05, 0.05
> windl> uptime
> 10:48am up 23 days 18:56, 1 user, load average: 0.00, 0.04, 0.05
> windl> cat /proc/loadavg
> 0.00 0.04 0.05 1/108 9704
>
> I'll attach a sample graph.

Why should it be? As you can see in the graph, you have higher spikes with the
1-minute average. As it's just an average over one minute, it more easily
drops below 0.05. But the 5-minute and 15-minute averages need more time to
drop lower, so for them to become lower, you need longer times without spikes
in load average.

So it's natural that you get "flatter" curves with longer averaging windows.
Averages easily hide things like spikes.
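
The smoothing effect can be illustrated with a small sketch. It assumes the
load averages behave like exponentially damped averages sampled every 5
seconds, computed here in floating point; the names SAMPLE_SECONDS and
decay() are hypothetical and only serve this illustration:

```python
import math

SAMPLE_SECONDS = 5  # assumed sampling interval


def decay(load, window_seconds, idle_seconds):
    """Decay `load` over an idle period for a given averaging window."""
    factor = math.exp(-SAMPLE_SECONDS / window_seconds)
    for _ in range(idle_seconds // SAMPLE_SECONDS):
        load *= factor  # nothing runnable, so nothing is added back
    return load


# Start each average at 1.0 and leave the machine idle for five minutes.
for minutes in (1, 5, 15):
    print(minutes, round(decay(1.0, minutes * 60, 300), 4))
```

Under these assumptions the 1-minute value is close to zero after five idle
minutes, while the 15-minute value has lost less than a third of its height.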

Thanks,
--
Martin

2015-07-03 06:12:53

by Ulrich Windl

Subject: Antw: Re: Lower bound 0.05 on 15-Minute load?

>>> Martin Steigerwald <[email protected]> wrote on 02.07.2015 at 11:26 in
message <1479160.a5Vb4cJSSF@merkaba>:
> On Thursday 02 July 2015 10:50:13 Ulrich Windl wrote:
>> Hi!
>
> Hi Ulrich,
>
>> I'm not subscribed, so please CC: me for your replies.
>>
>> When graphing the CPU load, I noticed that the 15-minute average never
>> drops below 0.05, while the 5-minute load and the 1-minute load do
>> (Kernel 3.0.101-0.47.52-xen of SLES11 on x86_64).
>
> Load average is *NOT* the CPU load although this is a very common
> misconception.

I think the correlation of 1-min, 5-min and 15-min values is independent of the actual meaning of the value.

>
> Load average indicates the number of processes that are running or waiting to
> be scheduled (which is CPU saturation) *and* those that are in
> uninterruptible sleep. You can have a high load average without much CPU
> utilization, for example by running 20 find processes on a /home on NFS.
>
> A high load can be CPU-bound but it doesn't need to be.

I know.

>
> So a high load can only indicate that things are running more slowly, but
> not why: there are at least two possible causes, and it does not need to be
> the CPU.

How is that related to my complaint/question?

>
> Also the load is normalized to CPU cores.

Actually I don't think so, but that's also not related to the issue I reported. I know that on HP-UX the load was the average over all CPUs, while on Linux the load seems to be the sum of all CPU loads, meaning a load of 4 is low for a 12-CPU machine. But that's all unrelated...
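
To make the per-CPU reading concrete, here is a minimal sketch that parses a
/proc/loadavg line (the sample from this thread) and divides a load value by a
CPU count. The helper names parse_loadavg() and per_cpu() are hypothetical,
and the division is something the reader applies, not something the kernel
reports:

```python
def parse_loadavg(line):
    """Split a /proc/loadavg line into its five fields."""
    one, five, fifteen, entities, last_pid = line.split()
    running, total = entities.split("/")
    return {
        "load": (float(one), float(five), float(fifteen)),
        "running": int(running),    # currently runnable scheduling entities
        "total": int(total),        # all scheduling entities
        "last_pid": int(last_pid),  # most recently created PID
    }


def per_cpu(load, ncpu):
    """Scale a system-wide load value to a per-CPU figure."""
    return load / ncpu


sample = parse_loadavg("0.00 0.04 0.05 1/108 9704")
print(sample["load"])
print(round(per_cpu(4.0, 12), 2))  # a load of 4 on a 12-CPU machine
```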

>
>> Is that a known bug? An interactive call of "uptime" seems to confirm my
>> suspicion:
>> windl> uptime
>> 10:41am up 23 days 18:49, 1 user, load average: 0.08, 0.05, 0.05
>> windl> uptime
>> 10:48am up 23 days 18:56, 1 user, load average: 0.00, 0.04, 0.05
>> windl> cat /proc/loadavg
>> 0.00 0.04 0.05 1/108 9704
>>
>> I'll attach a sample graph.
>
> Why should it be? As you can see in the graph, you have higher spikes with the
> 1-minute average. As it's just an average over one minute, it more easily
> drops below 0.05. But the 5-minute and 15-minute averages need more time to
> drop lower, so for them to become lower, you need longer times without spikes
> in load average.
>
> So it's natural that you get "flatter" curves with longer averaging windows.
> Averages easily hide things like spikes.

Actually it seems my "mathematical eye" is better than yours: I have another graph that shows the problem even more clearly (same kernel and hardware, just another machine).
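
One possible explanation for such a floor can be sketched in a few lines. The
sketch assumes that the kernel keeps each load average as an 11-bit fixed-point
number, multiplies it by a per-window decay constant at every sample, and
rounds to nearest before shifting back, and that /proc/loadavg adds a small
offset before printing two decimals. The constants are the well-known
FSHIFT/EXP_* values; whether this particular kernel rounds in exactly this way
is an assumption, and the helpers idle_floor() and displayed() are
hypothetical:

```python
FSHIFT = 11                          # bits of fractional precision
FIXED_1 = 1 << FSHIFT                # 1.0 in fixed point (2048)
EXP = {1: 1884, 5: 2014, 15: 2037}   # decay constants per window (minutes)


def calc_load(load, exp, active):
    """One fixed-point update step with round-to-nearest (assumed)."""
    load = load * exp + active * (FIXED_1 - exp)
    load += 1 << (FSHIFT - 1)        # the rounding term
    return load >> FSHIFT


def displayed(load):
    """Format a fixed-point value the way /proc/loadavg is assumed to."""
    load += FIXED_1 // 200           # display rounding offset
    frac = ((load & (FIXED_1 - 1)) * 100) >> FSHIFT
    return f"{load >> FSHIFT}.{frac:02d}"


def idle_floor(minutes, start=FIXED_1):
    """Decay from `start` on an idle system until the value stops changing."""
    load = start
    while True:
        nxt = calc_load(load, EXP[minutes], 0)
        if nxt == load:
            return load
        load = nxt


for minutes in (1, 5, 15):
    floor = idle_floor(minutes)
    print(minutes, floor, displayed(floor))
```

Under these assumptions the rounding term outweighs the decay once the value
is small enough, so an idle system gets stuck at a displayed 0.00 for the
1-minute average, 0.01 for the 5-minute average and 0.05 for the 15-minute
average, which would match the graphs.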

Regards,
Ulrich



Attachments:
(No filename) (2.51 kB)
Load-15.png (41.12 kB)