2017-08-04 15:41:30

by Joel Fernandes

Subject: [PATCH] sched/fair: Make PELT signal more accurate

The PELT signals (sa->load_avg and sa->util_avg) are not updated if the amount
accumulated during a single update doesn't cross a period boundary. This is
fine in cases where the amount accrued is much smaller than the size of a
single PELT window (1ms); however, if the amount accrued is high, then the
relative error (calculated against what the signal would actually be had we
updated the averages) can be quite high - as much as 3-6% in my testing. On
plotting the signals, I found that the errors are especially high when we update
just before the period boundary is hit. These errors can be significantly
reduced if we update the averages more often.

In order to fix this, the patch additionally checks how much time has elapsed
since the last update, and updates the averages if it has been long enough
(I chose 128us as the threshold).

In order to compare the signals with/without the patch I created a synthetic
test (20ms runtime, 100ms period) and analyzed the signals and created a report
on the analysis data/plots both with and without the fix:
http://www.linuxinternals.org/misc/pelt-error.pdf

With the patch, the error in the signal is significantly reduced and becomes
negligible.

Cc: Vincent Guittot <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
---
kernel/sched/fair.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4f1825d60937..1347643737f3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2882,6 +2882,7 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
unsigned long weight, int running, struct cfs_rq *cfs_rq)
{
u64 delta;
+ int periods;

delta = now - sa->last_update_time;
/*
@@ -2908,9 +2909,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
* accrues by two steps:
*
* Step 1: accumulate *_sum since last_update_time. If we haven't
- * crossed period boundaries, finish.
+ * crossed period boundaries and the time since last update is small
+ * enough, we're done.
*/
- if (!accumulate_sum(delta, cpu, sa, weight, running, cfs_rq))
+ periods = accumulate_sum(delta, cpu, sa, weight, running, cfs_rq);
+
+ if (!periods && delta < 128)
return 0;

/*
--
2.14.0.rc1.383.gd1ce394fe2-goog


2017-08-07 13:24:39

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate

Hi Joel,

On 4 August 2017 at 17:40, Joel Fernandes <[email protected]> wrote:
> The PELT signal (sa->load_avg and sa->util_avg) are not updated if the amount
> accumulated during a single update doesn't cross a period boundary. This is
> fine in cases where the amount accrued is much smaller than the size of a
> single PELT window (1ms) however if the amount accrued is high then the
> relative error (calculated against what the actual signal would be had we
> updated the averages) can be quite high - as much 3-6% in my testing. On
> plotting signals, I found that there are errors especially high when we update
> just before the period boundary is hit. These errors can be significantly
> reduced if we update the averages more often.
>
> Inorder to fix this, this patch does the average update by also checking how
> much time has elapsed since the last update and update the averages if it has
> been long enough (as a threshold I chose 128us).

Why 128us and not, for example, 512us?

A 128us threshold means that util/load_avg can be computed 8 times more
often, which means up to 16 times more calls to div_u64.

>
> In order to compare the signals with/without the patch I created a synthetic
> test (20ms runtime, 100ms period) and analyzed the signals and created a report
> on the analysis data/plots both with and without the fix:
> http://www.linuxinternals.org/misc/pelt-error.pdf

The glitch described on page 2 shows a decrease of util_avg which is not
linked to the accuracy of the calculation but is due to the use of the
wrong range when computing util_avg.
Commit 625ed2bf049d ("sched/cfs: Make util/load_avg more stable") fixes
this glitch.
The lower peak value on page 3 is probably linked to the inaccuracy.
I agree that there is an inaccuracy (a max absolute value of 22), but
that's in favor of less overhead. Have you seen wrong behavior because
of this inaccuracy?

>
> With the patch, the error in the signal is significantly reduced, and is
> non-existent beyond a small negligible amount.
>
> Cc: Vincent Guittot <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: Brendan Jackman <[email protected]>
> Cc: Dietmar Eggeman <[email protected]>
> Signed-off-by: Joel Fernandes <[email protected]>
> ---
> kernel/sched/fair.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4f1825d60937..1347643737f3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2882,6 +2882,7 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> unsigned long weight, int running, struct cfs_rq *cfs_rq)
> {
> u64 delta;
> + int periods;
>
> delta = now - sa->last_update_time;
> /*
> @@ -2908,9 +2909,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> * accrues by two steps:
> *
> * Step 1: accumulate *_sum since last_update_time. If we haven't
> - * crossed period boundaries, finish.
> + * crossed period boundaries and the time since last update is small
> + * enough, we're done.
> */
> - if (!accumulate_sum(delta, cpu, sa, weight, running, cfs_rq))
> + periods = accumulate_sum(delta, cpu, sa, weight, running, cfs_rq);
> +
> + if (!periods && delta < 128)
> return 0;
>
> /*
> --
> 2.14.0.rc1.383.gd1ce394fe2-goog
>

2017-08-07 13:40:33

by Peter Zijlstra

Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate

On Fri, Aug 04, 2017 at 08:40:23AM -0700, Joel Fernandes wrote:
> The PELT signal (sa->load_avg and sa->util_avg) are not updated if the
> amount accumulated during a single update doesn't cross a period
> boundary.

> This is fine in cases where the amount accrued is much smaller than
> the size of a single PELT window (1ms) however if the amount accrued
> is high then the relative error (calculated against what the actual
> signal would be had we updated the averages) can be quite high - as
> much 3-6% in my testing.

The maximum we can accumulate without crossing a boundary is 1023*1024 ns,
at which point we get a divisor of LOAD_AVG_MAX - 1024 + 1023.

So for util_sum we'd have an increase of 1023*1024/(47742-1) = ~22, which
on the total signal for util (1024) is ~2.1%.

Where does the 3-6% come from?

> Inorder to fix this, this patch does the average update by also
> checking how much time has elapsed since the last update and update
> the averages if it has been long enough (as a threshold I chose
> 128us).

This of course does the divisions more often; anything on performance
impact?

2017-08-08 23:11:21

by Joel Fernandes

Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate

Hi Vincent,

On Mon, Aug 7, 2017 at 6:24 AM, Vincent Guittot
<[email protected]> wrote:
> Hi Joel,
>
> On 4 August 2017 at 17:40, Joel Fernandes <[email protected]> wrote:
>> The PELT signal (sa->load_avg and sa->util_avg) are not updated if the amount
>> accumulated during a single update doesn't cross a period boundary. This is
>> fine in cases where the amount accrued is much smaller than the size of a
>> single PELT window (1ms) however if the amount accrued is high then the
>> relative error (calculated against what the actual signal would be had we
>> updated the averages) can be quite high - as much 3-6% in my testing. On
>> plotting signals, I found that there are errors especially high when we update
>> just before the period boundary is hit. These errors can be significantly
>> reduced if we update the averages more often.
>>
>> Inorder to fix this, this patch does the average update by also checking how
>> much time has elapsed since the last update and update the averages if it has
>> been long enough (as a threshold I chose 128us).
>
> Why 128us and not 512us as an example ?

I picked it because it reduces the error to far fewer occurrences.

> 128us threshold means that util/load_avg can be computed 8 times more
> often and this means up to 16 times more call to div_u64

Yes, this is true. However, since I'm using 'delta' instead of
period_contrib, it only does the update every 128us; if several updates
fall within a 128us window, they will be rate-limited. So even with a
flood of updates, the updates would have to be spaced exactly 128us
apart to reach the maximum number of divisions - I don't know whether
that is a likely situation or would happen very often. I am planning to
run some benchmarks and check that there is no regression, as Peter
also mentioned regarding the performance aspect.

>> In order to compare the signals with/without the patch I created a synthetic
>> test (20ms runtime, 100ms period) and analyzed the signals and created a report
>> on the analysis data/plots both with and without the fix:
>> http://www.linuxinternals.org/misc/pelt-error.pdf
>
> The glitch described in page 2 shows a decrease of the util_avg which
> is not linked to accuracy of the calculation but due to the use of the
> wrong range when computing util_avg.

Yes, and I corrected the graphs this time to show what it's like after
your patch, and I confirm that there is STILL a glitch. You are right
that there isn't a reduction after your patch; however, in my updated
graphs there is a glitch, and it's not a downward peak but a stall in
the update. The error is still quite high and can be as high as the
absolute 2% error; in my updated graphs I show an example where it's
~1.8% (18 / 1024).

Could you please take a look at my updated document? I have included
new graphs and traces and color-coded them so it's easy to correlate
the trace lines to the error in the graph. Here's the updated link:
https://github.com/joelagnel/joelagnel.github.io/blob/master/misc/pelt-error-rev2.pdf

> commit 625ed2bf049d "sched/cfs: Make util/load_avg more stable" fixes
> this glitch.
> And the lower peak value in page 3 is probably linked to the inaccuracy

This is not true. The reduction in peak in my tests, which happens even
after your patch, is because of the dequeue that happens just before
the period boundary is hit. Could you please take a look at the updated
document in the link above? There, in the second example, I show with a
trace that the reduction in peak during the dequeue corresponds to the
delay in the update. These errors go away with my patch.

> I agree that there is an inaccuracy (the max absolute value of 22) but
> that's in favor of less overhead. Have you seen wrong behavior because
> of this inaccuracy ?

I haven't tried to tie this to a wrong behavior; however, since other
patches have been posted to fix inaccuracy, and since I do see us reach
the theoretical maximum error on quite a few occasions, I think it's
justifiable. Also, the overhead is minimal if updates aren't happening
several times within a 128us interval, and on the few occasions when
the update does happen, the division is performed only then. So in
cases where it does fix the error, it does so with minimal overhead. I
do agree with the overhead point, and I'm planning to do more tests
with hackbench to confirm the overhead is minimal. I'll post some
updates about it soon.

Thanks!

-Joel


>
>>
>> With the patch, the error in the signal is significantly reduced, and is
>> non-existent beyond a small negligible amount.
>>
>> Cc: Vincent Guittot <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Juri Lelli <[email protected]>
>> Cc: Brendan Jackman <[email protected]>
>> Cc: Dietmar Eggeman <[email protected]>
>> Signed-off-by: Joel Fernandes <[email protected]>
>> ---
>> kernel/sched/fair.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 4f1825d60937..1347643737f3 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2882,6 +2882,7 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>> unsigned long weight, int running, struct cfs_rq *cfs_rq)
>> {
>> u64 delta;
>> + int periods;
>>
>> delta = now - sa->last_update_time;
>> /*
>> @@ -2908,9 +2909,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>> * accrues by two steps:
>> *
>> * Step 1: accumulate *_sum since last_update_time. If we haven't
>> - * crossed period boundaries, finish.
>> + * crossed period boundaries and the time since last update is small
>> + * enough, we're done.
>> */
>> - if (!accumulate_sum(delta, cpu, sa, weight, running, cfs_rq))
>> + periods = accumulate_sum(delta, cpu, sa, weight, running, cfs_rq);
>> +
>> + if (!periods && delta < 128)
>> return 0;
>>
>> /*
>> --
>> 2.14.0.rc1.383.gd1ce394fe2-goog
>>

2017-08-08 23:38:01

by Joel Fernandes

Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate

Hi Peter,

On Mon, Aug 7, 2017 at 6:40 AM, Peter Zijlstra <[email protected]> wrote:
> On Fri, Aug 04, 2017 at 08:40:23AM -0700, Joel Fernandes wrote:
>> The PELT signal (sa->load_avg and sa->util_avg) are not updated if the
>> amount accumulated during a single update doesn't cross a period
>> boundary.
>
>> This is fine in cases where the amount accrued is much smaller than
>> the size of a single PELT window (1ms) however if the amount accrued
>> is high then the relative error (calculated against what the actual
>> signal would be had we updated the averages) can be quite high - as
>> much 3-6% in my testing.
>
> The max accumulate we can have and not cross a boundary is 1023*1024 ns.
> At which point we get a divisor of LOAD_AVG_MAX - 1024 + 1023.
>
> So for util_sum we'd have a increase of 1023*1024/(47742-1) = ~22. Which
> on the total signal for util (1024) is ~2.1%
>
> Where does the 3-6% come from?

Sorry, I should have been more clear. This error (3-6%) I measured is
relative to what the signal could have been had we done the division.
Indeed, I don't see any cases where the absolute error is more than the
~22 / 1024 you mentioned.

>
>> Inorder to fix this, this patch does the average update by also
>> checking how much time has elapsed since the last update and update
>> the averages if it has been long enough (as a threshold I chose
>> 128us).
>
> This of course does the divisions more often; anything on performance
> impact?

Sure, I am working on those and will post an update soon.

Thanks!

-Joel



2017-08-09 10:24:00

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate

On 9 August 2017 at 01:11, Joel Fernandes <[email protected]> wrote:
> Hi Vincent,
>
> On Mon, Aug 7, 2017 at 6:24 AM, Vincent Guittot
> <[email protected]> wrote:
>> Hi Joel,
>>
>> On 4 August 2017 at 17:40, Joel Fernandes <[email protected]> wrote:
>>> The PELT signal (sa->load_avg and sa->util_avg) are not updated if the amount
>>> accumulated during a single update doesn't cross a period boundary. This is
>>> fine in cases where the amount accrued is much smaller than the size of a
>>> single PELT window (1ms) however if the amount accrued is high then the
>>> relative error (calculated against what the actual signal would be had we
>>> updated the averages) can be quite high - as much 3-6% in my testing. On
>>> plotting signals, I found that there are errors especially high when we update
>>> just before the period boundary is hit. These errors can be significantly
>>> reduced if we update the averages more often.
>>>
>>> Inorder to fix this, this patch does the average update by also checking how
>>> much time has elapsed since the last update and update the averages if it has
>>> been long enough (as a threshold I chose 128us).
>>
>> Why 128us and not 512us as an example ?
>
> I picked it because I see it shows a good reduction in the error to
> fewer occurrences.
>
>> 128us threshold means that util/load_avg can be computed 8 times more
>> often and this means up to 16 times more call to div_u64
>
> Yes this is true, however since I'm using the 'delta' instead of
> period_contrib, its only does the update every 128us, however if
> several updates fall within a 128us boundary then those will be rate
> limited. So say we have a flood of updates, then the updates have to
> be spaced every 128us to reach the maximum number of division, I don't
> know whether this is a likely situation or would happen very often? I
> am planning to run some benchmarks and check that there is no
> regression as well as Peter mentioned about the performance aspect.
>
>>> In order to compare the signals with/without the patch I created a synthetic
>>> test (20ms runtime, 100ms period) and analyzed the signals and created a report
>>> on the analysis data/plots both with and without the fix:
>>> http://www.linuxinternals.org/misc/pelt-error.pdf
>>
>> The glitch described in page 2 shows a decrease of the util_avg which
>> is not linked to accuracy of the calculation but due to the use of the
>> wrong range when computing util_avg.
>
> Yes, and I corrected the graphs this time to show what its like after
> your patch and confirm that there is STILL a glitch. You are right
> that there isn't a reduction after your patch, however in my updated
> graphs there is a glitch and its not a downward peak but a stall in
> the update, the error is still quite high and can be as high as the
> absolute 2% error, in my update graphs I show an example where its ~
> 1.8% (18 / 1024).
>
> Could you please take a look at my updated document? I have included
> new graph and traces there and color coded them so its easy to
> correlate the trace lines to the error in the graph: Here's the
> updated new link:
> https://github.com/joelagnel/joelagnel.github.io/blob/master/misc/pelt-error-rev2.pdf

I see strange behavior in your rev2 document:
At timestamp 9.235635, we have util_avg=199, acc_util_avg=199,
util_err=0. Everything looks fine, but I don't see this point on the graph.
Then, at 9.235636 (the red colored faulty trace), we have
util_avg=182, acc_util_avg=200, util_err=18.
Firstly, this means that util_avg has been updated (199 -> 182), so the
error is not a problem of util_avg not being updated often enough :-)
Then, util_avg decreases (199 -> 182) whereas it must increase because
the task is still running. This should not happen, and it is exactly
what commit 625ed2bf049d should fix. So either the patch is not
applied or it doesn't completely fix the problem.

It would be interesting to also display the last_update_time of sched_avg.

>
>> commit 625ed2bf049d "sched/cfs: Make util/load_avg more stable" fixes
>> this glitch.
>> And the lower peak value in page 3 is probably linked to the inaccuracy
>
> This is not true. The reduction in peak in my tests which happen even
> after your patch is because of the dequeue that happens just before
> the period boundary is hit. Could you please take a look at the
> updated document in the link above? In there I show in the second
> example with a trace that corresponds the reduction in peak during the
> dequeue and is because of the delay in update. These errors go away
> with my patch.

There is the same strange behavior here:
When the reduction in peak happens, util_avg is updated, whereas your
concern is that util_avg is not updated often enough.
At timestamp 10.656683, we have util_avg=389, acc_util_avg=389, util_err=0.
At timestamp 10.657420, we have util_avg=396, acc_util_avg=396,
util_err=0. I don't see this point on the graph.
At timestamp 10.657422, we have util_avg=389, acc_util_avg=399,
util_err=10. This is the colored faulty trace, but util_avg has been
updated from 369 to 389.

Regards,
Vincent

>
>> I agree that there is an inaccuracy (the max absolute value of 22) but
>> that's in favor of less overhead. Have you seen wrong behavior because
>> of this inaccuracy ?
>
> I haven't tried to nail this to a wrong behavior however since other
> patches have been posted to fix inaccuracy and I do see we reach the
> theoretical maximum error on quite a few occassions, I think its
> justifiable. Also the overhead is minimal if updates aren't happening
> several times in a window, and at 128us interval, and the few times
> that the update does happen, the division is performed only during
> those times. So incases where it does fix the error, it does so with
> minimal overhead. I do agree with the overhead point and I'm planning
> to do more tests with hackbench to confirm overhead is minimal. I'll
> post some updates about it soon.
>
> Thanks!
>
> -Joel
>
>
>>
>>>
>>> With the patch, the error in the signal is significantly reduced, and is
>>> non-existent beyond a small negligible amount.
>>>
>>> Cc: Vincent Guittot <[email protected]>
>>> Cc: Peter Zijlstra <[email protected]>
>>> Cc: Juri Lelli <[email protected]>
>>> Cc: Brendan Jackman <[email protected]>
>>> Cc: Dietmar Eggeman <[email protected]>
>>> Signed-off-by: Joel Fernandes <[email protected]>
>>> ---
>>> kernel/sched/fair.c | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 4f1825d60937..1347643737f3 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -2882,6 +2882,7 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>> unsigned long weight, int running, struct cfs_rq *cfs_rq)
>>> {
>>> u64 delta;
>>> + int periods;
>>>
>>> delta = now - sa->last_update_time;
>>> /*
>>> @@ -2908,9 +2909,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>> * accrues by two steps:
>>> *
>>> * Step 1: accumulate *_sum since last_update_time. If we haven't
>>> - * crossed period boundaries, finish.
>>> + * crossed period boundaries and the time since last update is small
>>> + * enough, we're done.
>>> */
>>> - if (!accumulate_sum(delta, cpu, sa, weight, running, cfs_rq))
>>> + periods = accumulate_sum(delta, cpu, sa, weight, running, cfs_rq);
>>> +
>>> + if (!periods && delta < 128)
>>> return 0;
>>>
>>> /*
>>> --
>>> 2.14.0.rc1.383.gd1ce394fe2-goog
>>>

2017-08-09 17:51:30

by Joel Fernandes

Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate

Hi Vincent,

On Wed, Aug 9, 2017 at 3:23 AM, Vincent Guittot
<[email protected]> wrote:
<snip>
>>
>> Yes this is true, however since I'm using the 'delta' instead of
>> period_contrib, its only does the update every 128us, however if
>> several updates fall within a 128us boundary then those will be rate
>> limited. So say we have a flood of updates, then the updates have to
>> be spaced every 128us to reach the maximum number of division, I don't
>> know whether this is a likely situation or would happen very often? I
>> am planning to run some benchmarks and check that there is no
>> regression as well as Peter mentioned about the performance aspect.
>>
>>>> In order to compare the signals with/without the patch I created a synthetic
>>>> test (20ms runtime, 100ms period) and analyzed the signals and created a report
>>>> on the analysis data/plots both with and without the fix:
>>>> http://www.linuxinternals.org/misc/pelt-error.pdf
>>>
>>> The glitch described in page 2 shows a decrease of the util_avg which
>>> is not linked to accuracy of the calculation but due to the use of the
>>> wrong range when computing util_avg.
>>
>> Yes, and I corrected the graphs this time to show what its like after
>> your patch and confirm that there is STILL a glitch. You are right
>> that there isn't a reduction after your patch, however in my updated
>> graphs there is a glitch and its not a downward peak but a stall in
>> the update, the error is still quite high and can be as high as the
>> absolute 2% error, in my update graphs I show an example where its ~
>> 1.8% (18 / 1024).
>>
>> Could you please take a look at my updated document? I have included
>> new graph and traces there and color coded them so its easy to
>> correlate the trace lines to the error in the graph: Here's the
>> updated new link:
>> https://github.com/joelagnel/joelagnel.github.io/blob/master/misc/pelt-error-rev2.pdf
>
> I see strange behavior in your rev2 document:
> At timestamp 9.235635, we have util_avg=199 acc_util_avg=199
> util_err=0. Everything looks fine but I don't this point on the graph
> Then, at 9.235636 (which is the red colored faulty trace), we have
> util_avg=182 acc_util_avg=200 util_err=18.
> Firstly, this means that util_avg has been updated (199 -> 182) so the
> error is not a problem of util_avg not been updated often enough :-)
> Then, util_avg decreases (199 -> 182) whereas it must increase because
> the task is still running. This should not happen and this is exactly
> what commit 625ed2bf049d should fix. So either the patch is not
> applied or it doesn't fix completely the problem.

I think you are looking at the wrong trace lines. The graph is
generated for rq util only (cfs_rq == 1), so the lines in the traces
you should look at are the ones with cfs_rq == 1; only those lines were
used to generate the graphs.

In them you will see the rq util_avg change as follows: 165 -> 182 -> 182
(a missed update causing the error). This is also reflected in the
graph, where you see the flat green line.

>
> That would be interesting to also display the last_update_time of sched_avg
>
>>
>>> commit 625ed2bf049d "sched/cfs: Make util/load_avg more stable" fixes
>>> this glitch.
>>> And the lower peak value in page 3 is probably linked to the inaccuracy
>>
>> This is not true. The reduction in peak in my tests which happen even
>> after your patch is because of the dequeue that happens just before
>> the period boundary is hit. Could you please take a look at the
>> updated document in the link above? In there I show in the second
>> example with a trace that corresponds the reduction in peak during the
>> dequeue and is because of the delay in update. These errors go away
>> with my patch.
>
> There is the same strange behavior there:
> When the reduction in peak happens, the util_avg is updated whereas
> your concerns is that util_avg is not update often enough.
> At timestamp 10.656683, we have util_avg=389 acc_util_avg=389 util_err=0
> At timestamp 10.657420, we have util_avg=396 acc_util_avg=396
> util_err=0. I don't see this point on the graph
> At timestamp 10.657422, we have util_avg=389 acc_util_avg=399
> util_err=10. This is the colored faulty trace but util_avg has been
> updated from 369 to 389

Yeah, same thing here: you should look at the lines with cfs_rq == 1.
The util changes as: 363 -> 376 -> 389 -> 389 (missed update).


thanks,

-Joel


>
> Regards,
> Vincent
>
>>
>>> I agree that there is an inaccuracy (the max absolute value of 22) but
>>> that's in favor of less overhead. Have you seen wrong behavior because
>>> of this inaccuracy ?
>>
>> I haven't tried to nail this to a wrong behavior however since other
>> patches have been posted to fix inaccuracy and I do see we reach the
>> theoretical maximum error on quite a few occassions, I think its
>> justifiable. Also the overhead is minimal if updates aren't happening
>> several times in a window, and at 128us interval, and the few times
>> that the update does happen, the division is performed only during
>> those times. So incases where it does fix the error, it does so with
>> minimal overhead. I do agree with the overhead point and I'm planning
>> to do more tests with hackbench to confirm overhead is minimal. I'll
>> post some updates about it soon.
>>
>> Thanks!
>>
>> -Joel
>>
>>
>>>
>>>>
>>>> With the patch, the error in the signal is significantly reduced, and is
>>>> non-existent beyond a small negligible amount.
>>>>
>>>> Cc: Vincent Guittot <[email protected]>
>>>> Cc: Peter Zijlstra <[email protected]>
>>>> Cc: Juri Lelli <[email protected]>
>>>> Cc: Brendan Jackman <[email protected]>
>>>> Cc: Dietmar Eggeman <[email protected]>
>>>> Signed-off-by: Joel Fernandes <[email protected]>
>>>> ---
>>>> kernel/sched/fair.c | 8 ++++++--
>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 4f1825d60937..1347643737f3 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -2882,6 +2882,7 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>> unsigned long weight, int running, struct cfs_rq *cfs_rq)
>>>> {
>>>> u64 delta;
>>>> + int periods;
>>>>
>>>> delta = now - sa->last_update_time;
>>>> /*
>>>> @@ -2908,9 +2909,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>> * accrues by two steps:
>>>> *
>>>> * Step 1: accumulate *_sum since last_update_time. If we haven't
>>>> - * crossed period boundaries, finish.
>>>> + * crossed period boundaries and the time since last update is small
>>>> + * enough, we're done.
>>>> */
>>>> - if (!accumulate_sum(delta, cpu, sa, weight, running, cfs_rq))
>>>> + periods = accumulate_sum(delta, cpu, sa, weight, running, cfs_rq);
>>>> +
>>>> + if (!periods && delta < 128)
>>>> return 0;
>>>>
>>>> /*
>>>> --
>>>> 2.14.0.rc1.383.gd1ce394fe2-goog
>>>>

2017-08-10 07:17:57

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate

On 9 August 2017 at 19:51, Joel Fernandes <[email protected]> wrote:
> Hi Vincent,
>
> On Wed, Aug 9, 2017 at 3:23 AM, Vincent Guittot
> <[email protected]> wrote:
> <snip>
>>>
>>> Yes this is true, however since I'm using the 'delta' instead of
>>> period_contrib, its only does the update every 128us, however if
>>> several updates fall within a 128us boundary then those will be rate
>>> limited. So say we have a flood of updates, then the updates have to
>>> be spaced every 128us to reach the maximum number of division, I don't
>>> know whether this is a likely situation or would happen very often? I
>>> am planning to run some benchmarks and check that there is no
>>> regression as well as Peter mentioned about the performance aspect.
>>>
>>>>> In order to compare the signals with/without the patch I created a synthetic
>>>>> test (20ms runtime, 100ms period) and analyzed the signals and created a report
>>>>> on the analysis data/plots both with and without the fix:
>>>>> http://www.linuxinternals.org/misc/pelt-error.pdf
>>>>
>>>> The glitch described in page 2 shows a decrease of the util_avg which
>>>> is not linked to accuracy of the calculation but due to the use of the
>>>> wrong range when computing util_avg.
>>>
>>> Yes, and I corrected the graphs this time to show what its like after
>>> your patch and confirm that there is STILL a glitch. You are right
>>> that there isn't a reduction after your patch, however in my updated
>>> graphs there is a glitch and its not a downward peak but a stall in
>>> the update, the error is still quite high and can be as high as the
>>> absolute 2% error, in my update graphs I show an example where its ~
>>> 1.8% (18 / 1024).
>>>
>>> Could you please take a look at my updated document? I have included
>>> new graph and traces there and color coded them so its easy to
>>> correlate the trace lines to the error in the graph: Here's the
>>> updated new link:
>>> https://github.com/joelagnel/joelagnel.github.io/blob/master/misc/pelt-error-rev2.pdf
>>
>> I see strange behavior in your rev2 document:
>> At timestamp 9.235635, we have util_avg=199 acc_util_avg=199
>> util_err=0. Everything looks fine, but I don't see this point on
>> the graph.
>> Then, at 9.235636 (which is the red colored faulty trace), we have
>> util_avg=182 acc_util_avg=200 util_err=18.
>> Firstly, this means that util_avg has been updated (199 -> 182), so
>> the error is not a problem of util_avg not being updated often
>> enough :-)
>> Then, util_avg decreases (199 -> 182) whereas it must increase,
>> because the task is still running. This should not happen, and this
>> is exactly what commit 625ed2bf049d should fix. So either the patch
>> is not applied or it doesn't completely fix the problem.
>
> I think you are looking at the wrong trace lines. The graph is
> generated for the rq util only (cfs_rq == 1), so the trace lines you
> should look at are the ones with cfs_rq == 1. Only the cfs_rq == 1
> lines were used to generate the graphs.

Ah! This is quite confusing; it was not obvious that the trace is not
for 1 signal but that in fact 2 signals are interleaved, only 1 is
displayed, and we have to filter them.

So OK, I can see that the trace with cfs_rq=1 is not updated. In
contrast, we can see that the other trace (for the se, I assume) is
updated normally, whereas they are normally synced on the same clock.

>
> In this you will see rq util_avg change as follows: 165 -> 182 -> 182
> (missed an update, causing error). This is also reflected in the
> graph, where you see the flat green line.
>
>>
>> That would be interesting to also display the last_update_time of sched_avg
>>
>>>
>>>> commit 625ed2bf049d "sched/cfs: Make util/load_avg more stable" fixes
>>>> this glitch.
>>>> And the lower peak value in page 3 is probably linked to the inaccuracy
>>>
>>> This is not true. The reduction in peak in my tests, which happens
>>> even after your patch, is because of the dequeue that happens just
>>> before the period boundary is hit. Could you please take a look at
>>> the updated document in the link above? There, in the second
>>> example, I show a trace that corresponds to the reduction in peak
>>> during the dequeue; it is caused by the delay in the update. These
>>> errors go away with my patch.
>>
>> There is the same strange behavior there:
>> When the reduction in peak happens, the util_avg is updated, whereas
>> your concern is that util_avg is not updated often enough.
>> At timestamp 10.656683, we have util_avg=389 acc_util_avg=389 util_err=0
>> At timestamp 10.657420, we have util_avg=396 acc_util_avg=396
>> util_err=0. I don't see this point on the graph.
>> At timestamp 10.657422, we have util_avg=389 acc_util_avg=399
>> util_err=10. This is the colored faulty trace, but util_avg has been
>> updated from 369 to 389.
>
> Yeah, same thing here, you should look at the lines with cfs_rq == 1.
> The util changes as: 363 -> 376 -> 389 -> 389 (missed update).
>
>
> thanks,
>
> -Joel
>
>
>>
>> Regards,
>> Vincent
>>
>>>
>>>> I agree that there is an inaccuracy (a max absolute value of 22),
>>>> but that's in favor of less overhead. Have you seen wrong behavior
>>>> because of this inaccuracy?
>>>
>>> I haven't tried to nail this down to a wrong behavior; however,
>>> since other patches have been posted to fix inaccuracy, and I do
>>> see we reach the theoretical maximum error on quite a few
>>> occasions, I think it's justifiable. Also, the overhead is minimal
>>> if updates aren't happening several times in a window: at a 128us
>>> interval, the division is performed only during the few times that
>>> the update does happen. So in cases where it does fix the error, it
>>> does so with minimal overhead. I do agree with the overhead point,
>>> and I'm planning to do more tests with hackbench to confirm the
>>> overhead is minimal. I'll post some updates about it soon.
>>>
>>> Thanks!
>>>
>>> -Joel
>>>
>>>
>>>>
>>>>>
>>>>> With the patch, the error in the signal is significantly reduced, and is
>>>>> non-existent beyond a small negligible amount.
>>>>>
>>>>> Cc: Vincent Guittot <[email protected]>
>>>>> Cc: Peter Zijlstra <[email protected]>
>>>>> Cc: Juri Lelli <[email protected]>
>>>>> Cc: Brendan Jackman <[email protected]>
>>>>> Cc: Dietmar Eggeman <[email protected]>
>>>>> Signed-off-by: Joel Fernandes <[email protected]>
>>>>> ---
>>>>> kernel/sched/fair.c | 8 ++++++--
>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 4f1825d60937..1347643737f3 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -2882,6 +2882,7 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>>> unsigned long weight, int running, struct cfs_rq *cfs_rq)
>>>>> {
>>>>> u64 delta;
>>>>> + int periods;
>>>>>
>>>>> delta = now - sa->last_update_time;
>>>>> /*
>>>>> @@ -2908,9 +2909,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>>> * accrues by two steps:
>>>>> *
>>>>> * Step 1: accumulate *_sum since last_update_time. If we haven't
>>>>> - * crossed period boundaries, finish.
>>>>> + * crossed period boundaries and the time since last update is small
>>>>> + * enough, we're done.
>>>>> */
>>>>> - if (!accumulate_sum(delta, cpu, sa, weight, running, cfs_rq))
>>>>> + periods = accumulate_sum(delta, cpu, sa, weight, running, cfs_rq);
>>>>> +
>>>>> + if (!periods && delta < 128)
>>>>> return 0;
>>>>>
>>>>> /*
>>>>> --
>>>>> 2.14.0.rc1.383.gd1ce394fe2-goog
>>>>>

2017-08-10 10:37:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate

On Thu, Aug 10, 2017 at 09:17:34AM +0200, Vincent Guittot wrote:
> Ah! This is quite confusing; it was not obvious that the trace is
> not for 1 signal but that in fact 2 signals are interleaved, only 1
> is displayed, and we have to filter them.
>
> So OK, I can see that the trace with cfs_rq=1 is not updated. In
> contrast, we can see that the other trace (for the se, I assume) is
> updated normally, whereas they are normally synced on the same clock.

So I'm still sitting on those patches that sync up the PELT window
between group-cfs_rq and its corresponding group-se.

I was hoping to get around to looking at all that again and adding
Josef's last few patches and promoting the lot from /experimental to
/core before going on holidays next week, but we'll see.

2017-08-10 15:22:42

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate

On Thu, Aug 10, 2017 at 12:17 AM, Vincent Guittot
<[email protected]> wrote:
> On 9 August 2017 at 19:51, Joel Fernandes <[email protected]> wrote:
>> Hi Vincent,
>>
>> On Wed, Aug 9, 2017 at 3:23 AM, Vincent Guittot
>> <[email protected]> wrote:
>> <snip>
>>>>
>>>> Yes this is true; however, since I'm using the 'delta' instead of
>>>> period_contrib, it only does the update every 128us, and if several
>>>> updates fall within a 128us boundary then those will be rate
>>>> limited. So with a flood of updates, the updates would have to be
>>>> spaced 128us apart to reach the maximum number of divisions, and I
>>>> don't know whether that is a likely situation or would happen very
>>>> often. I am planning to run some benchmarks and check that there is
>>>> no regression, as well as the performance aspect Peter mentioned.
>>>>
>>>>>> In order to compare the signals with/without the patch I created a synthetic
>>>>>> test (20ms runtime, 100ms period) and analyzed the signals and created a report
>>>>>> on the analysis data/plots both with and without the fix:
>>>>>> http://www.linuxinternals.org/misc/pelt-error.pdf
>>>>>
>>>>> The glitch described in page 2 shows a decrease of the util_avg which
>>>>> is not linked to accuracy of the calculation but due to the use of the
>>>>> wrong range when computing util_avg.
>>>>
>>>> Yes, and I corrected the graphs this time to show what it's like
>>>> after your patch, and they confirm that there is STILL a glitch.
>>>> You are right that there isn't a reduction after your patch;
>>>> however, in my updated graphs there is a glitch, and it's not a
>>>> downward peak but a stall in the update. The error is still quite
>>>> high and can approach the absolute 2% maximum; in my updated graphs
>>>> I show an example where it's ~1.8% (18 / 1024).
>>>>
>>>> Could you please take a look at my updated document? I have
>>>> included a new graph and traces there and color-coded them so it's
>>>> easy to correlate the trace lines to the error in the graph. Here's
>>>> the updated link:
>>>> https://github.com/joelagnel/joelagnel.github.io/blob/master/misc/pelt-error-rev2.pdf
>>>
>>> I see strange behavior in your rev2 document:
>>> At timestamp 9.235635, we have util_avg=199 acc_util_avg=199
>>> util_err=0. Everything looks fine, but I don't see this point on
>>> the graph.
>>> Then, at 9.235636 (which is the red colored faulty trace), we have
>>> util_avg=182 acc_util_avg=200 util_err=18.
>>> Firstly, this means that util_avg has been updated (199 -> 182), so
>>> the error is not a problem of util_avg not being updated often
>>> enough :-)
>>> Then, util_avg decreases (199 -> 182) whereas it must increase,
>>> because the task is still running. This should not happen, and this
>>> is exactly what commit 625ed2bf049d should fix. So either the patch
>>> is not applied or it doesn't completely fix the problem.
>>
>> I think you are looking at the wrong trace lines. The graph is
>> generated for the rq util only (cfs_rq == 1), so the trace lines you
>> should look at are the ones with cfs_rq == 1. Only the cfs_rq == 1
>> lines were used to generate the graphs.
>
> Ah! This is quite confusing; it was not obvious that the trace is
> not for 1 signal but that in fact 2 signals are interleaved, only 1
> is displayed, and we have to filter them.

Sorry, its my fault that I didn't make this fact clear enough in the
doc :-/. Thanks for your patience.

> So OK, I can see that the trace with cfs_rq=1 is not updated. In
> contrast, we can see that the other trace (for the se, I assume) is
> updated normally, whereas they are normally synced on the same clock.

Ah, ok.
I also checked that the error can occur for the se as well, in the
following example for cfs_rq=0:

Lines filtered:

task_tick_fair: pelt_update: util_avg=359 load_avg=364
acc_load_avg=364 acc_util_avg=359 util_err=0 load_err=0
load_sum=17041923 sum_err=0 delta_us=977 cfs_rq=0 ret=1

task_tick_fair: pelt_update: util_avg=373 load_avg=377
acc_load_avg=377 acc_util_avg=373 util_err=0 load_err=0
load_sum=17656717 sum_err=0 delta_us=978 cfs_rq=0 ret=1

task_tick_fair: pelt_update: util_avg=373 load_avg=377
acc_load_avg=390 acc_util_avg=386 util_err=13 load_err=13
load_sum=18651021 sum_err=-994304 delta_us=971 cfs_rq=0

dequeue_task_fair: pelt_update: util_avg=396 load_avg=400
acc_load_avg=400 acc_util_avg=396 util_err=0 load_err=0
load_sum=18987624 sum_err=0 delta_us=720 cfs_rq=0 ret=1

So here we have 359, 373, 373, 396.


Lines unfiltered:

task_tick_fair: pelt_update: util_avg=349 load_avg=413
acc_load_avg=413 acc_util_avg=349 util_err=0 load_err=0
load_sum=19460993 sum_err=0 delta_us=980 cfs_rq=1 ret=1

task_tick_fair: pelt_update: util_avg=359 load_avg=364
acc_load_avg=364 acc_util_avg=359 util_err=0 load_err=0
load_sum=17041923 sum_err=0 delta_us=977 cfs_rq=0 ret=1

task_tick_fair: pelt_update: util_avg=363 load_avg=426
acc_load_avg=426 acc_util_avg=363 util_err=0 load_err=0
load_sum=20029072 sum_err=0 delta_us=977 cfs_rq=1 ret=1

task_tick_fair: pelt_update: util_avg=373 load_avg=377
acc_load_avg=377 acc_util_avg=373 util_err=0 load_err=0
load_sum=17656717 sum_err=0 delta_us=978 cfs_rq=0 ret=1

task_tick_fair: pelt_update: util_avg=376 load_avg=438
acc_load_avg=438 acc_util_avg=376 util_err=0 load_err=0
load_sum=20584978 sum_err=0 delta_us=978 cfs_rq=1 ret=1

task_tick_fair: pelt_update: util_avg=373 load_avg=377
acc_load_avg=390 acc_util_avg=386 util_err=13 load_err=13
load_sum=18651021 sum_err=-994304 delta_us=971 cfs_rq=0 ret=0

task_tick_fair: pelt_update: util_avg=389 load_avg=450
acc_load_avg=450 acc_util_avg=389 util_err=0 load_err=0
load_sum=21120780 sum_err=0 delta_us=971 cfs_rq=1 ret=1

dequeue_task_fair: pelt_update: util_avg=396 load_avg=400
acc_load_avg=400 acc_util_avg=396 util_err=0 load_err=0
load_sum=18987624 sum_err=0 delta_us=720 cfs_rq=0 ret=1


I hope to finish the perf tests that Peter discussed today and provide
an update,

thanks!

-Joel


>> In this you will see rq util_avg change as follows: 165 -> 182 -> 182
>> (missed an update, causing error). This is also reflected in the
>> graph, where you see the flat green line.
>>
>>>
>>> That would be interesting to also display the last_update_time of sched_avg
>>>
>>>>
>>>>> commit 625ed2bf049d "sched/cfs: Make util/load_avg more stable" fixes
>>>>> this glitch.
>>>>> And the lower peak value in page 3 is probably linked to the inaccuracy
>>>>
>>>> This is not true. The reduction in peak in my tests, which happens
>>>> even after your patch, is because of the dequeue that happens just
>>>> before the period boundary is hit. Could you please take a look at
>>>> the updated document in the link above? There, in the second
>>>> example, I show a trace that corresponds to the reduction in peak
>>>> during the dequeue; it is caused by the delay in the update. These
>>>> errors go away with my patch.
>>>
>>> There is the same strange behavior there:
>>> When the reduction in peak happens, the util_avg is updated, whereas
>>> your concern is that util_avg is not updated often enough.
>>> At timestamp 10.656683, we have util_avg=389 acc_util_avg=389 util_err=0
>>> At timestamp 10.657420, we have util_avg=396 acc_util_avg=396
>>> util_err=0. I don't see this point on the graph.
>>> At timestamp 10.657422, we have util_avg=389 acc_util_avg=399
>>> util_err=10. This is the colored faulty trace, but util_avg has been
>>> updated from 369 to 389.
>>
>> Yeah, same thing here, you should look at the lines with cfs_rq == 1.
>> The util changes as: 363 -> 376 -> 389 -> 389 (missed update).
>>
>>
>> thanks,
>>
>> -Joel
>>
>>
>>>
>>> Regards,
>>> Vincent
>>>
>>>>
>>>>> I agree that there is an inaccuracy (a max absolute value of 22),
>>>>> but that's in favor of less overhead. Have you seen wrong behavior
>>>>> because of this inaccuracy?
>>>>
>>>> I haven't tried to nail this down to a wrong behavior; however,
>>>> since other patches have been posted to fix inaccuracy, and I do
>>>> see we reach the theoretical maximum error on quite a few
>>>> occasions, I think it's justifiable. Also, the overhead is minimal
>>>> if updates aren't happening several times in a window: at a 128us
>>>> interval, the division is performed only during the few times that
>>>> the update does happen. So in cases where it does fix the error, it
>>>> does so with minimal overhead. I do agree with the overhead point,
>>>> and I'm planning to do more tests with hackbench to confirm the
>>>> overhead is minimal. I'll post some updates about it soon.
>>>>
>>>> Thanks!
>>>>
>>>> -Joel
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> With the patch, the error in the signal is significantly reduced, and is
>>>>>> non-existent beyond a small negligible amount.
>>>>>>
>>>>>> Cc: Vincent Guittot <[email protected]>
>>>>>> Cc: Peter Zijlstra <[email protected]>
>>>>>> Cc: Juri Lelli <[email protected]>
>>>>>> Cc: Brendan Jackman <[email protected]>
>>>>>> Cc: Dietmar Eggeman <[email protected]>
>>>>>> Signed-off-by: Joel Fernandes <[email protected]>
>>>>>> ---
>>>>>> kernel/sched/fair.c | 8 ++++++--
>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>>> index 4f1825d60937..1347643737f3 100644
>>>>>> --- a/kernel/sched/fair.c
>>>>>> +++ b/kernel/sched/fair.c
>>>>>> @@ -2882,6 +2882,7 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>>>> unsigned long weight, int running, struct cfs_rq *cfs_rq)
>>>>>> {
>>>>>> u64 delta;
>>>>>> + int periods;
>>>>>>
>>>>>> delta = now - sa->last_update_time;
>>>>>> /*
>>>>>> @@ -2908,9 +2909,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>>>> * accrues by two steps:
>>>>>> *
>>>>>> * Step 1: accumulate *_sum since last_update_time. If we haven't
>>>>>> - * crossed period boundaries, finish.
>>>>>> + * crossed period boundaries and the time since last update is small
>>>>>> + * enough, we're done.
>>>>>> */
>>>>>> - if (!accumulate_sum(delta, cpu, sa, weight, running, cfs_rq))
>>>>>> + periods = accumulate_sum(delta, cpu, sa, weight, running, cfs_rq);
>>>>>> +
>>>>>> + if (!periods && delta < 128)
>>>>>> return 0;
>>>>>>
>>>>>> /*
>>>>>> --
>>>>>> 2.14.0.rc1.383.gd1ce394fe2-goog
>>>>>>
>

2017-08-10 23:11:28

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate

Hi Peter,

On Mon, Aug 7, 2017 at 6:40 AM, Peter Zijlstra <[email protected]> wrote:
> On Fri, Aug 04, 2017 at 08:40:23AM -0700, Joel Fernandes wrote:
>> The PELT signal (sa->load_avg and sa->util_avg) are not updated if the
>> amount accumulated during a single update doesn't cross a period
>> boundary.
>
>> This is fine in cases where the amount accrued is much smaller than
>> the size of a single PELT window (1ms) however if the amount accrued
>> is high then the relative error (calculated against what the actual
>> signal would be had we updated the averages) can be quite high - as
>> much as 3-6% in my testing.
>
> The max accumulate we can have and not cross a boundary is 1023*1024 ns.
> At which point we get a divisor of LOAD_AVG_MAX - 1024 + 1023.
>
> So for util_sum we'd have an increase of 1023*1024/(47742-1) = ~22,
> which on the total signal for util (1024) is ~2.1%.
>
> Where does the 3-6% come from?
>
>> In order to fix this, this patch does the average update by also
>> checking how much time has elapsed since the last update, and it
>> updates the averages if it has been long enough (as a threshold I
>> chose 128us).
>
> This of course does the divisions more often; anything on performance
> impact?

I ran hackbench, and I don't see any degradation in performance.

# while [ 1 ]; do hackbench 5 thread 500; done
Running with 5*40 (== 200) tasks.

without:
Time: 0.742
Time: 0.770
Time: 0.857
Time: 0.809
Time: 0.721
Time: 0.725
Time: 0.717
Time: 0.699

with:
Time: 0.787
Time: 0.816
Time: 0.744
Time: 0.832
Time: 0.798
Time: 0.785
Time: 0.714
Time: 0.721

If there's any other benchmark or anything different in this test
you'd like me to run, let me know, thanks.

Regards,
Joel

2017-08-11 02:49:10

by kernel test robot

[permalink] [raw]
Subject: [lkp-robot] [sched/fair] dca93994f6: unixbench.score -8.1% regression


Greeting,

FYI, we noticed a -8.1% regression of unixbench.score due to commit:


commit: dca93994f61becdd8d224155643a44ba284970f6 ("sched/fair: Make PELT signal more accurate")
url: https://github.com/0day-ci/linux/commits/Joel-Fernandes/sched-fair-Make-PELT-signal-more-accurate/20170805-084820


in testcase: unixbench
on test machine: 8 threads Ivy Bridge with 16G memory
with following parameters:

runtime: 300s
nr_task: 100%
test: shell8
cpufreq_governor: performance

test-description: UnixBench is the original BYTE UNIX benchmark suite aims to test performance of Unix-like system.
test-url: https://github.com/kdlucas/byte-unixbench


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

testcase/path_params/tbox_group/run: unixbench/300s-100%-shell8-performance/lkp-ivb-d01

dbe04493eddfaa89 dca93994f61becdd8d22415564
---------------- --------------------------
%stddev change %stddev
\ | \
13509 -8% 12415 unixbench.score
5.186e+08 -8% 4.765e+08 unixbench.time.minor_page_faults
16164186 -9% 14694969 unixbench.time.voluntary_context_switches
3100 -13% 2708 unixbench.time.user_time
12755309 -8% 11770274 unixbench.time.involuntary_context_switches
704 -13% 612 unixbench.time.percent_of_cpu_this_job_got
1368 -15% 1157 unixbench.time.system_time
72414 127% 164292 interrupts.CAL:Function_call_interrupts
71606 5% 75101 vmstat.system.cs
14967 15288 vmstat.system.in
45523474 4% 47526554 perf-stat.context-switches
0.76 6% 0.81 perf-stat.ipc
10697570 -4% 10257128 perf-stat.cpu-migrations
74.79 -3% 72.40 perf-stat.iTLB-load-miss-rate%
3.282e+12 -8% 3.032e+12 perf-stat.dTLB-loads
2246 9% 2446 perf-stat.instructions-per-iTLB-miss
1.748e+11 -9% 1.596e+11 perf-stat.cache-references
1.908e+09 -4% 1.827e+09 perf-stat.iTLB-loads
2.534e+12 -8% 2.337e+12 perf-stat.branch-instructions
1.272e+13 -8% 1.173e+13 perf-stat.instructions
2.178e+12 -8% 1.999e+12 perf-stat.dTLB-stores
5.082e+08 -8% 4.67e+08 perf-stat.minor-faults
5.082e+08 -8% 4.67e+08 perf-stat.page-faults
1.31 -6% 1.24 perf-stat.cpi
6.454e+10 -8% 5.915e+10 perf-stat.branch-misses
13.23 -4% 12.71 perf-stat.cache-miss-rate%
1.671e+13 -13% 1.451e+13 perf-stat.cpu-cycles
2.313e+10 -12% 2.028e+10 perf-stat.cache-misses
2.093e+09 -10% 1.882e+09 ± 6% perf-stat.dTLB-store-misses
5.66e+09 -15% 4.794e+09 perf-stat.iTLB-load-misses



unixbench.score

13800 ++------------------------------------------------------------------+
| *..*. |
13600 ++*..*. + *.*.. .*..*.*.. .*.*..*. |
* *..* * * *..*.*..*.*.*..*.*..*.*..*.*
13400 ++ |
| |
13200 ++ |
| |
13000 ++ |
| |
12800 ++ |
O O O O O |
12600 ++ O O O O O O O O |
| |
12400 ++------------------------------O-O-O--O-O--O-O--O------------------+


perf-stat.cpu-cycles

1.7e+13 ++------------*-*--*-*-*--*-*-*------*---------------------------+
*.*..*.*.*..* *.* *..*.*.*..*.*.*..*.*.*..*.*
1.65e+13 ++ |
| |
| |
1.6e+13 ++ |
| |
1.55e+13 ++ |
| |
1.5e+13 ++ O |
O O O O O O O O O O O O |
| |
1.45e+13 ++ O O O O O O O O |
| |
1.4e+13 ++---------------------------------------------------------------+


perf-stat.instructions

1.3e+13 ++---------------------------------------------------------------+
| *.*. .*. .* |
1.28e+13 ++*..*. .. *..*.*.*. *.*..*.* + |
* *.* *..*.*.*..*.*.*..*.*.*..*.*
1.26e+13 ++ |
| |
1.24e+13 ++ |
| |
1.22e+13 ++ |
| |
1.2e+13 O+O O O O O O O O O O O |
| O |
1.18e+13 ++ O O |
| O O O O O O |
1.16e+13 ++---------------------------------------------------------------+


perf-stat.cache-references

1.78e+11 ++---------------------------------------------------------------+
1.76e+11 ++ .*. *. .*. |
|.*.. .* *..*. .. *.*..*.* *..*.*.*.. .*.*..*.*.*..*.*
1.74e+11 *+ *.*.*. *.* * |
1.72e+11 ++ |
| |
1.7e+11 ++ |
1.68e+11 ++ |
1.66e+11 ++ |
| |
1.64e+11 ++ O O |
1.62e+11 O+O O O O O O O O O O |
| |
1.6e+11 ++ O O O O O O O O |
1.58e+11 ++---------------------------------------------------------------+


perf-stat.branch-instructions

2.6e+12 ++---------------------------------------------------------------+
| .*. .* |
2.55e+12 ++ .*.*. .*.*.*. *.*..*.* : |
*.*..*.*.*. *. : .*. .*..*.*.*..*.*.*..*.*
| *. * |
2.5e+12 ++ |
| |
2.45e+12 ++ |
| |
2.4e+12 O+O O O |
| O O O O O O O O O |
| |
2.35e+12 ++ O O O O O O O O |
| |
2.3e+12 ++---------------------------------------------------------------+


perf-stat.branch-misses

6.6e+10 ++----------------------------------------------------------------+
| .*.*..*.*.*..*.*. .*.*..* |
6.5e+10 *+*..*.*.*. *..* + .*. |
| *.*..*.*..* *..*.*.*..*.*
6.4e+10 ++ |
| |
6.3e+10 ++ |
| |
6.2e+10 ++ |
| |
6.1e+10 ++ |
| O O |
6e+10 O+O O O O O O O O O |
| O O O |
5.9e+10 ++-----------------------------O-O----O-O-O----O------------------+


perf-stat.dTLB-loads

3.35e+12 ++---------------------------------------------------------------+
| *.*. .*. .*.*.* |
3.3e+12 *+*..*. .. *..*.*.*. *.*. + .*. |
| *.* *..*.*.*..* *..*.*.*..*.*
3.25e+12 ++ |
| |
3.2e+12 ++ |
| |
3.15e+12 ++ |
| |
3.1e+12 O+O O O O O O O O O O O O |
| |
3.05e+12 ++ |
| O O O O O O O O |
3e+12 ++---------------------------------------------------------------+


perf-stat.dTLB-stores

2.25e+12 ++---------------------------------------------------------------+
| |
2.2e+12 ++ *.*. .*..*. .*.*.* |
*.*..*.*. .. *..*.* *.*. + .*. .*.. .*.*..*. .*..*.*
| * *. * * * |
2.15e+12 ++ |
| |
2.1e+12 ++ |
| |
2.05e+12 O+O O O O O O O O O O O |
| O |
| O O O O O O |
2e+12 ++ |
| O O |
1.95e+12 ++---------------------------------------------------------------+


perf-stat.iTLB-load-misses

5.8e+09 ++----------------------------------------------------------------+
5.7e+09 ++ .*.*.. .*. .*.. |
*.*..*.*.*. *.*.*..*.*.*.. .*.*..*.*.*..*.*..*.*.*. * *.*
5.6e+09 ++ * |
5.5e+09 ++ |
5.4e+09 ++ |
5.3e+09 ++ |
| |
5.2e+09 ++ |
5.1e+09 ++ |
5e+09 ++ |
4.9e+09 O+O O O O O O O O O O O O |
| |
4.8e+09 ++ O O O O O O O O |
4.7e+09 ++----------------------------------------------------------------+


perf-stat.page-faults

5.2e+08 ++---------------------------------------------------------------+
5.15e+08 ++ *.*. |
|.*.. .. *..*. .*..*.*. .*.*.*. |
5.1e+08 *+ *.*.* * *. *..*.*.*..*.*.*..*.*.*..*.*
5.05e+08 ++ |
5e+08 ++ |
4.95e+08 ++ |
| |
4.9e+08 ++ |
4.85e+08 ++ |
4.8e+08 ++ |
4.75e+08 O+O O O O O O O O O O O O |
| |
4.7e+08 ++ O O O O |
4.65e+08 ++-------------------------------O-O------O-O--------------------+


perf-stat.minor-faults

5.2e+08 ++---------------------------------------------------------------+
5.15e+08 ++ *.*. |
|.*.. .. *..*. .*..*.*. .*.*.*. |
5.1e+08 *+ *.*.* * *. *..*.*.*..*.*.*..*.*.*..*.*
5.05e+08 ++ |
5e+08 ++ |
4.95e+08 ++ |
| |
4.9e+08 ++ |
4.85e+08 ++ |
4.8e+08 ++ |
4.75e+08 O+O O O O O O O O O O O O |
| |
4.7e+08 ++ O O O O |
4.65e+08 ++-------------------------------O-O------O-O--------------------+


perf-stat.iTLB-load-miss-rate_

75.5 ++-------------------------------------------------------------------+
| *..*. |
75 *+*.. + *.. .*.*.. .*.. .*.. .*
74.5 ++ * * .*.*..*. .*. .*..*.*..*.*. * * * |
| : *. *. * |
74 ++ : + |
| *..* |
73.5 ++ |
| |
73 ++ |
72.5 ++ |
| O O O O O O O O O O O O O O O O |
72 O+O O O O |
| |
71.5 ++-------------------------------------------------------------------+


perf-stat.ipc

0.81 ++------------------------------O----O-O--O----O-O-------------------+
O O O O O |
0.8 ++ O O O O O O O O O |
| O |
| |
0.79 ++ |
| |
0.78 ++ |
| |
0.77 ++ |
| |
*.*..*. .*..* .*. .*.. .*..*.*..*.|
0.76 ++ *..* + .*. .*.*..*. .* *..*.*..*.*..* * *
| *. *. *. |
0.75 ++-------------------------------------------------------------------+


perf-stat.instructions-per-iTLB-miss

2500 ++-------------------------------------------------------------------+
| |
2450 ++ O O O O |
O O O O O O O O O O O O |
| O O O O O |
2400 ++ |
| |
2350 ++ |
| |
2300 ++ |
| .*..*.*. |
*. .*. .*.*..*.*..*.*..* *.. .*.|
2250 ++*. *..*.*. *.*..*.*..*.*..*.*..*.*. *
| |
2200 ++-------------------------------------------------------------------+


unixbench.time.user_time

3150 ++-------------------------------------------------------------------+
*.*..*. .*..*.*..*.*..*.*.. .*.*.*.. .*.. .*.. .*.. .*
3100 ++ *..* *.*. *.*..*.*..* * * * |
3050 ++ |
| |
3000 ++ |
2950 ++ |
| |
2900 ++ |
2850 ++ |
| |
2800 ++ |
2750 O+O O O O O O O O O O O O |
| |
2700 ++------------------------------O--O-O-O--O-O--O-O-------------------+


unixbench.time.system_time

1450 ++-------------------------------------------------------------------+
| .*..*.*..*.*.. .*.. |
1400 ++ .*.*..* *.* |
| .*.. .*. *. .*. .*. |
*.*..* * *. *. *..*.*..*.*..*.*
1350 ++ |
| |
1300 ++ |
| |
1250 ++ |
| O O O |
O O O O O O O O O O |
1200 ++ |
| O O |
1150 ++------------------------------O--O---O--O-O----O-------------------+


unixbench.time.minor_page_faults

5.3e+08 ++---------------------------------------------------------------+
5.25e+08 ++ *.*. |
|.*..*. .. *..*.*.*..*.*. .*.*.*. .*. |
5.2e+08 *+ *.* *. *..*.*.*..* *..*.*.*..*.*
5.15e+08 ++ |
5.1e+08 ++ |
5.05e+08 ++ |
| |
5e+08 ++ |
4.95e+08 ++ |
4.9e+08 ++ |
4.85e+08 O+O O O O O O O O O O |
| O O |
4.8e+08 ++ |
4.75e+08 ++----------------------------O--O-O-O-O--O-O-O------------------+


unixbench.time.voluntary_context_switches

1.64e+07 ++---------------------------------------------------------------+
*.*..*. .*.* .*. |
1.62e+07 ++ *.*. + .*. .*..*. .*.*.*..*.*.*..* *..*.*.*..*.*
1.6e+07 ++ *. * *.*..* |
| |
1.58e+07 ++ |
1.56e+07 ++ |
| |
1.54e+07 ++ |
1.52e+07 ++ |
| |
1.5e+07 ++ |
1.48e+07 ++ |
O O O O O O O O O O O O O O O O O O O |
1.46e+07 ++---------------------O----O------------------------------------+

[*] bisect-good sample
[O] bisect-bad sample


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong



2017-08-11 15:57:47

by Joel Fernandes

[permalink] [raw]
Subject: Re: [lkp-robot] [sched/fair] dca93994f6: unixbench.score -8.1% regression

On Thu, Aug 10, 2017 at 7:47 PM, kernel test robot
<[email protected]> wrote:
>
> Greeting,
>
> FYI, we noticed a -8.1% regression of unixbench.score due to commit:
>
>
> commit: dca93994f61becdd8d224155643a44ba284970f6 ("sched/fair: Make PELT signal more accurate")
> url: https://github.com/0day-ci/linux/commits/Joel-Fernandes/sched-fair-Make-PELT-signal-more-accurate/20170805-084820
>
>
> in testcase: unixbench
> on test machine: 8 threads Ivy Bridge with 16G memory
> with following parameters:
>
> runtime: 300s
> nr_task: 100%
> test: shell8
> cpufreq_governor: performance

Sorry! I really didn't see this overhead in hackbench (I guess it was
also very hard to see because of the variation). I will try
LKP/unixbench and measure again (I haven't used LKP before, so I can't
wait to try it!). I suspect this test is doing several updates within
a short time period. Most of the errors in my use case are during
infrequent updates, so I'm happy to just fix the error for those.

I'm glad to have a use case now to measure this (assuming the
unixbench regression is real) :-)

A better patch is coming soon! thanks everyone!

-Joel


>
> test-description: UnixBench is the original BYTE UNIX benchmark suite aims to test performance of Unix-like system.
> test-url: https://github.com/kdlucas/byte-unixbench
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/01org/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> testcase/path_params/tbox_group/run: unixbench/300s-100%-shell8-performance/lkp-ivb-d01
>
> dbe04493eddfaa89 dca93994f61becdd8d22415564
> ---------------- --------------------------
> %stddev change %stddev
> \ | \
> 13509 -8% 12415 unixbench.score
> 5.186e+08 -8% 4.765e+08 unixbench.time.minor_page_faults
> 16164186 -9% 14694969 unixbench.time.voluntary_context_switches
> 3100 -13% 2708 unixbench.time.user_time
> 12755309 -8% 11770274 unixbench.time.involuntary_context_switches
> 704 -13% 612 unixbench.time.percent_of_cpu_this_job_got
> 1368 -15% 1157 unixbench.time.system_time
> 72414 127% 164292 interrupts.CAL:Function_call_interrupts
> 71606 5% 75101 vmstat.system.cs
> 14967 15288 vmstat.system.in
> 45523474 4% 47526554 perf-stat.context-switches
> 0.76 6% 0.81 perf-stat.ipc
> 10697570 -4% 10257128 perf-stat.cpu-migrations
> 74.79 -3% 72.40 perf-stat.iTLB-load-miss-rate%
> 3.282e+12 -8% 3.032e+12 perf-stat.dTLB-loads
> 2246 9% 2446 perf-stat.instructions-per-iTLB-miss
> 1.748e+11 -9% 1.596e+11 perf-stat.cache-references
> 1.908e+09 -4% 1.827e+09 perf-stat.iTLB-loads
> 2.534e+12 -8% 2.337e+12 perf-stat.branch-instructions
> 1.272e+13 -8% 1.173e+13 perf-stat.instructions
> 2.178e+12 -8% 1.999e+12 perf-stat.dTLB-stores
> 5.082e+08 -8% 4.67e+08 perf-stat.minor-faults
> 5.082e+08 -8% 4.67e+08 perf-stat.page-faults
> 1.31 -6% 1.24 perf-stat.cpi
> 6.454e+10 -8% 5.915e+10 perf-stat.branch-misses
> 13.23 -4% 12.71 perf-stat.cache-miss-rate%
> 1.671e+13 -13% 1.451e+13 perf-stat.cpu-cycles
> 2.313e+10 -12% 2.028e+10 perf-stat.cache-misses
> 2.093e+09 -10% 1.882e+09 ± 6% perf-stat.dTLB-store-misses
> 5.66e+09 -15% 4.794e+09 perf-stat.iTLB-load-misses
>
>
>
> unixbench.score
>
> 13800 ++------------------------------------------------------------------+
> | *..*. |
> 13600 ++*..*. + *.*.. .*..*.*.. .*.*..*. |
> * *..* * * *..*.*..*.*.*..*.*..*.*..*.*
> 13400 ++ |
> | |
> 13200 ++ |
> | |
> 13000 ++ |
> | |
> 12800 ++ |
> O O O O O |
> 12600 ++ O O O O O O O O |
> | |
> 12400 ++------------------------------O-O-O--O-O--O-O--O------------------+
>
>
> perf-stat.cpu-cycles
>
> 1.7e+13 ++------------*-*--*-*-*--*-*-*------*---------------------------+
> *.*..*.*.*..* *.* *..*.*.*..*.*.*..*.*.*..*.*
> 1.65e+13 ++ |
> | |
> | |
> 1.6e+13 ++ |
> | |
> 1.55e+13 ++ |
> | |
> 1.5e+13 ++ O |
> O O O O O O O O O O O O |
> | |
> 1.45e+13 ++ O O O O O O O O |
> | |
> 1.4e+13 ++---------------------------------------------------------------+
>
>
> perf-stat.instructions
>
> 1.3e+13 ++---------------------------------------------------------------+
> | *.*. .*. .* |
> 1.28e+13 ++*..*. .. *..*.*.*. *.*..*.* + |
> * *.* *..*.*.*..*.*.*..*.*.*..*.*
> 1.26e+13 ++ |
> | |
> 1.24e+13 ++ |
> | |
> 1.22e+13 ++ |
> | |
> 1.2e+13 O+O O O O O O O O O O O |
> | O |
> 1.18e+13 ++ O O |
> | O O O O O O |
> 1.16e+13 ++---------------------------------------------------------------+
>
>
> perf-stat.cache-references
>
> 1.78e+11 ++---------------------------------------------------------------+
> 1.76e+11 ++ .*. *. .*. |
> |.*.. .* *..*. .. *.*..*.* *..*.*.*.. .*.*..*.*.*..*.*
> 1.74e+11 *+ *.*.*. *.* * |
> 1.72e+11 ++ |
> | |
> 1.7e+11 ++ |
> 1.68e+11 ++ |
> 1.66e+11 ++ |
> | |
> 1.64e+11 ++ O O |
> 1.62e+11 O+O O O O O O O O O O |
> | |
> 1.6e+11 ++ O O O O O O O O |
> 1.58e+11 ++---------------------------------------------------------------+
>
>
> perf-stat.branch-instructions
>
> 2.6e+12 ++---------------------------------------------------------------+
> | .*. .* |
> 2.55e+12 ++ .*.*. .*.*.*. *.*..*.* : |
> *.*..*.*.*. *. : .*. .*..*.*.*..*.*.*..*.*
> | *. * |
> 2.5e+12 ++ |
> | |
> 2.45e+12 ++ |
> | |
> 2.4e+12 O+O O O |
> | O O O O O O O O O |
> | |
> 2.35e+12 ++ O O O O O O O O |
> | |
> 2.3e+12 ++---------------------------------------------------------------+
>
>
> perf-stat.branch-misses
>
> 6.6e+10 ++----------------------------------------------------------------+
> | .*.*..*.*.*..*.*. .*.*..* |
> 6.5e+10 *+*..*.*.*. *..* + .*. |
> | *.*..*.*..* *..*.*.*..*.*
> 6.4e+10 ++ |
> | |
> 6.3e+10 ++ |
> | |
> 6.2e+10 ++ |
> | |
> 6.1e+10 ++ |
> | O O |
> 6e+10 O+O O O O O O O O O |
> | O O O |
> 5.9e+10 ++-----------------------------O-O----O-O-O----O------------------+
>
>
> perf-stat.dTLB-loads
>
> 3.35e+12 ++---------------------------------------------------------------+
> | *.*. .*. .*.*.* |
> 3.3e+12 *+*..*. .. *..*.*.*. *.*. + .*. |
> | *.* *..*.*.*..* *..*.*.*..*.*
> 3.25e+12 ++ |
> | |
> 3.2e+12 ++ |
> | |
> 3.15e+12 ++ |
> | |
> 3.1e+12 O+O O O O O O O O O O O O |
> | |
> 3.05e+12 ++ |
> | O O O O O O O O |
> 3e+12 ++---------------------------------------------------------------+
>
>
> perf-stat.dTLB-stores
>
> 2.25e+12 ++---------------------------------------------------------------+
> | |
> 2.2e+12 ++ *.*. .*..*. .*.*.* |
> *.*..*.*. .. *..*.* *.*. + .*. .*.. .*.*..*. .*..*.*
> | * *. * * * |
> 2.15e+12 ++ |
> | |
> 2.1e+12 ++ |
> | |
> 2.05e+12 O+O O O O O O O O O O O |
> | O |
> | O O O O O O |
> 2e+12 ++ |
> | O O |
> 1.95e+12 ++---------------------------------------------------------------+
>
>
> perf-stat.iTLB-load-misses
>
> 5.8e+09 ++----------------------------------------------------------------+
> 5.7e+09 ++ .*.*.. .*. .*.. |
> *.*..*.*.*. *.*.*..*.*.*.. .*.*..*.*.*..*.*..*.*.*. * *.*
> 5.6e+09 ++ * |
> 5.5e+09 ++ |
> 5.4e+09 ++ |
> 5.3e+09 ++ |
> | |
> 5.2e+09 ++ |
> 5.1e+09 ++ |
> 5e+09 ++ |
> 4.9e+09 O+O O O O O O O O O O O O |
> | |
> 4.8e+09 ++ O O O O O O O O |
> 4.7e+09 ++----------------------------------------------------------------+
>
>
> perf-stat.page-faults
>
> 5.2e+08 ++---------------------------------------------------------------+
> 5.15e+08 ++ *.*. |
> |.*.. .. *..*. .*..*.*. .*.*.*. |
> 5.1e+08 *+ *.*.* * *. *..*.*.*..*.*.*..*.*.*..*.*
> 5.05e+08 ++ |
> 5e+08 ++ |
> 4.95e+08 ++ |
> | |
> 4.9e+08 ++ |
> 4.85e+08 ++ |
> 4.8e+08 ++ |
> 4.75e+08 O+O O O O O O O O O O O O |
> | |
> 4.7e+08 ++ O O O O |
> 4.65e+08 ++-------------------------------O-O------O-O--------------------+
>
>
> perf-stat.minor-faults
>
> 5.2e+08 ++---------------------------------------------------------------+
> 5.15e+08 ++ *.*. |
> |.*.. .. *..*. .*..*.*. .*.*.*. |
> 5.1e+08 *+ *.*.* * *. *..*.*.*..*.*.*..*.*.*..*.*
> 5.05e+08 ++ |
> 5e+08 ++ |
> 4.95e+08 ++ |
> | |
> 4.9e+08 ++ |
> 4.85e+08 ++ |
> 4.8e+08 ++ |
> 4.75e+08 O+O O O O O O O O O O O O |
> | |
> 4.7e+08 ++ O O O O |
> 4.65e+08 ++-------------------------------O-O------O-O--------------------+
>
>
> perf-stat.iTLB-load-miss-rate%
>
> 75.5 ++-------------------------------------------------------------------+
> | *..*. |
> 75 *+*.. + *.. .*.*.. .*.. .*.. .*
> 74.5 ++ * * .*.*..*. .*. .*..*.*..*.*. * * * |
> | : *. *. * |
> 74 ++ : + |
> | *..* |
> 73.5 ++ |
> | |
> 73 ++ |
> 72.5 ++ |
> | O O O O O O O O O O O O O O O O |
> 72 O+O O O O |
> | |
> 71.5 ++-------------------------------------------------------------------+
>
>
> perf-stat.ipc
>
> 0.81 ++------------------------------O----O-O--O----O-O-------------------+
> O O O O O |
> 0.8 ++ O O O O O O O O O |
> | O |
> | |
> 0.79 ++ |
> | |
> 0.78 ++ |
> | |
> 0.77 ++ |
> | |
> *.*..*. .*..* .*. .*.. .*..*.*..*.|
> 0.76 ++ *..* + .*. .*.*..*. .* *..*.*..*.*..* * *
> | *. *. *. |
> 0.75 ++-------------------------------------------------------------------+
>
>
> perf-stat.instructions-per-iTLB-miss
>
> 2500 ++-------------------------------------------------------------------+
> | |
> 2450 ++ O O O O |
> O O O O O O O O O O O O |
> | O O O O O |
> 2400 ++ |
> | |
> 2350 ++ |
> | |
> 2300 ++ |
> | .*..*.*. |
> *. .*. .*.*..*.*..*.*..* *.. .*.|
> 2250 ++*. *..*.*. *.*..*.*..*.*..*.*..*.*. *
> | |
> 2200 ++-------------------------------------------------------------------+
>
>
> unixbench.time.user_time
>
> 3150 ++-------------------------------------------------------------------+
> *.*..*. .*..*.*..*.*..*.*.. .*.*.*.. .*.. .*.. .*.. .*
> 3100 ++ *..* *.*. *.*..*.*..* * * * |
> 3050 ++ |
> | |
> 3000 ++ |
> 2950 ++ |
> | |
> 2900 ++ |
> 2850 ++ |
> | |
> 2800 ++ |
> 2750 O+O O O O O O O O O O O O |
> | |
> 2700 ++------------------------------O--O-O-O--O-O--O-O-------------------+
>
>
> unixbench.time.system_time
>
> 1450 ++-------------------------------------------------------------------+
> | .*..*.*..*.*.. .*.. |
> 1400 ++ .*.*..* *.* |
> | .*.. .*. *. .*. .*. |
> *.*..* * *. *. *..*.*..*.*..*.*
> 1350 ++ |
> | |
> 1300 ++ |
> | |
> 1250 ++ |
> | O O O |
> O O O O O O O O O O |
> 1200 ++ |
> | O O |
> 1150 ++------------------------------O--O---O--O-O----O-------------------+
>
>
> unixbench.time.minor_page_faults
>
> 5.3e+08 ++---------------------------------------------------------------+
> 5.25e+08 ++ *.*. |
> |.*..*. .. *..*.*.*..*.*. .*.*.*. .*. |
> 5.2e+08 *+ *.* *. *..*.*.*..* *..*.*.*..*.*
> 5.15e+08 ++ |
> 5.1e+08 ++ |
> 5.05e+08 ++ |
> | |
> 5e+08 ++ |
> 4.95e+08 ++ |
> 4.9e+08 ++ |
> 4.85e+08 O+O O O O O O O O O O |
> | O O |
> 4.8e+08 ++ |
> 4.75e+08 ++----------------------------O--O-O-O-O--O-O-O------------------+
>
>
> unixbench.time.voluntary_context_switches
>
> 1.64e+07 ++---------------------------------------------------------------+
> *.*..*. .*.* .*. |
> 1.62e+07 ++ *.*. + .*. .*..*. .*.*.*..*.*.*..* *..*.*.*..*.*
> 1.6e+07 ++ *. * *.*..* |
> | |
> 1.58e+07 ++ |
> 1.56e+07 ++ |
> | |
> 1.54e+07 ++ |
> 1.52e+07 ++ |
> | |
> 1.5e+07 ++ |
> 1.48e+07 ++ |
> O O O O O O O O O O O O O O O O O O O |
> 1.46e+07 ++---------------------O----O------------------------------------+
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Xiaolong
>