2008-02-11 13:38:27

by Carlos R. Mafra

Subject: [2.6.25-rc1] Strange regression with CONFIG_HZ_300=y

I apologize in advance if I am crazy about this, but I noticed
a strange regression wrt 2.6.24 in cpufreq (I think) in 2.6.25-rc1, which
goes away if I revert the following commit:

commit bdc807871d58285737d50dc6163d0feb72cb0dc2
Author: H. Peter Anvin <[email protected]>
Date: Fri Feb 8 04:21:26 2008 -0800

avoid overflows in kernel/time.c

When the conversion factor between jiffies and milli- or microseconds is
not a single multiply or divide, as for the case of HZ == 300, we currently
do a multiply followed by a divide. The intervening result, however, is
subject to overflows, especially since the fraction is not simplified (for
HZ == 300, we multiply by 300 and divide by 1000).

This is exposed to the user when passing a large timeout to poll(), for
example.

This patch replaces the multiply-divide with a reciprocal multiplication on
32-bit platforms. When the input is an unsigned long, there is no portable
way to do this on 64-bit platforms,
since it requires a 128-bit intermediate result (which gcc does support on
64-bit platforms but may generate libgcc calls, e.g. on 64-bit s390), but
since the output is a 32-bit integer in the cases affected, just simplify
the multiply-divide (*3/10 instead of *300/1000).

The reciprocal multiply used can have off-by-one errors in the upper half
of the valid output range. This could be avoided at the expense of having
to deal with a potential 65-bit intermediate result. Since the intent is
to avoid overflow problems and most of the other time conversions are only
semiexact, the off-by-one errors were considered an acceptable tradeoff.

[...]
[more text follows]

The problem in vanilla 2.6.25-rc1 happens with CONFIG_HZ_300=y (and doesn't
with CONFIG_HZ_250=y or with the above commit reverted). The CPU frequency no longer
changes regardless of the load, and it stays high (2.0 GHz or 1.2 GHz) even
when idle (I checked with 'top'), whereas it usually drops to 800 MHz when idle (I
always use the ondemand governor, compiled in and set as the default governor).

The laptop is a Vaio VGN-FZ240E, core 2 duo T7250 @ 2.0 GHz and the kernel is x86_64.

If anyone needs more information about this, I will be happy to provide it.

Carlos R. Mafra
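
To make the overflow described in the quoted commit message concrete, here is a
minimal user-space sketch in C. It is not the kernel's code: the function names
are hypothetical, rounding is simplified to truncation, and it assumes that
unsigned int is 32 bits wide. It compares the unsimplified multiply-divide
(*300/1000), the simplified fraction (*3/10), and a reciprocal multiply with a
64-bit intermediate result.

```c
#include <stdint.h>

/* Hypothetical helpers for HZ == 300; none of these are kernel functions. */

/* Unsimplified form: m * 300 / 1000, evaluated in 32-bit arithmetic.
 * The product m * 300 wraps as soon as m exceeds UINT32_MAX / 300. */
static uint32_t msecs_to_jiffies_naive(uint32_t m)
{
    uint32_t product = m * 300u;   /* may wrap modulo 2^32 */
    return product / 1000u;
}

/* Simplified fraction: m * 3 / 10. The product only wraps once m exceeds
 * UINT32_MAX / 3, i.e. 100 times later than the unsimplified form. */
static uint32_t msecs_to_jiffies_simplified(uint32_t m)
{
    uint32_t product = m * 3u;
    return product / 10u;
}

/* Reciprocal multiply: multiply by ceil(2^32 * 3 / 10) = 0x4CCCCCCD using a
 * 64-bit intermediate, then shift right by 32. No 32-bit overflow, but the
 * rounded-up constant can make the result one too large for inputs in the
 * upper part of the range. */
static uint32_t msecs_to_jiffies_reciprocal(uint32_t m)
{
    const uint64_t mul = 0x4CCCCCCDull;
    return (uint32_t)(((uint64_t)m * mul) >> 32);
}
```

For a timeout of 20000000 ms, the unsimplified form returns 1705032 because
20000000 * 300 no longer fits in 32 bits, while the other two return the
expected 6000000.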



2008-02-12 08:52:57

by Éric Piel

Subject: Re: [2.6.25-rc1] Strange regression with CONFIG_HZ_300=y

Carlos R. Mafra wrote:
> I apologize in advance if I am crazy about this, but I noticed
> a strange regression wrt 2.6.24 in cpufreq (I think) in 2.6.25-rc1, which
> goes away if I revert the following commit:
>
> commit bdc807871d58285737d50dc6163d0feb72cb0dc2
> Author: H. Peter Anvin <[email protected]>
> Date: Fri Feb 8 04:21:26 2008 -0800
>
> avoid overflows in kernel/time.c
>
> When the conversion factor between jiffies and milli- or microseconds is
> not a single multiply or divide, as for the case of HZ == 300, we currently
> do a multiply followed by a divide. The intervening result, however, is
> subject to overflows, especially since the fraction is not simplified (for
> HZ == 300, we multiply by 300 and divide by 1000).
>
> This is exposed to the user when passing a large timeout to poll(), for
> example.
>
> This patch replaces the multiply-divide with a reciprocal multiplication on
> 32-bit platforms. When the input is an unsigned long, there is no portable
> way to do this on 64-bit platforms,
> since it requires a 128-bit intermediate result (which gcc does support on
> 64-bit platforms but may generate libgcc calls, e.g. on 64-bit s390), but
> since the output is a 32-bit integer in the cases affected, just simplify
> the multiply-divide (*3/10 instead of *300/1000).
>
> The reciprocal multiply used can have off-by-one errors in the upper half
> of the valid output range. This could be avoided at the expense of having
> to deal with a potential 65-bit intermediate result. Since the intent is
> to avoid overflow problems and most of the other time conversions are only
> semiexact, the off-by-one errors were considered an acceptable tradeoff.
>
> [...]
> [more text follows]
>
> The problem in vanilla 2.6.25-rc1 happens with CONFIG_HZ_300=y (and doesn't
> with CONFIG_HZ_250=y or with the above commit reverted). The CPU frequency no longer
> changes regardless of the load, and it stays high (2.0 GHz or 1.2 GHz) even
> when idle (I checked with 'top'), whereas it usually drops to 800 MHz when idle (I
> always use the ondemand governor, compiled in and set as the default governor).
>
> The laptop is a Vaio VGN-FZ240E, core 2 duo T7250 @ 2.0 GHz and the kernel is x86_64.

Hi, it's great you found out the culprit commit because I was really
wondering where this bug was coming from...
As a data point, my machine has a core 2 duo @ 1.2GHz and x86_64 arch.
Do you also have the tickless option activated? (it could play a role)

See you,
Eric



2008-02-12 11:38:26

by Carlos R. Mafra

Subject: Re: [2.6.25-rc1] Strange regression with CONFIG_HZ_300=y

Eric Piel wrote:
> Carlos R. Mafra wrote:
>> I apologize in advance if I am crazy about this, but I noticed
>> a strange regression wrt 2.6.24 in cpufreq (I think) in 2.6.25-rc1, which
>> goes away if I revert the following commit:
>>
>> commit bdc807871d58285737d50dc6163d0feb72cb0dc2
>> Author: H. Peter Anvin <[email protected]>
>> Date: Fri Feb 8 04:21:26 2008 -0800
>>
>> avoid overflows in kernel/time.c
>>
>> When the conversion factor between jiffies and milli- or microseconds is
>> not a single multiply or divide, as for the case of HZ == 300, we currently
>> do a multiply followed by a divide. The intervening result, however, is
>> subject to overflows, especially since the fraction is not simplified (for
>> HZ == 300, we multiply by 300 and divide by 1000).
>>
>> This is exposed to the user when passing a large timeout to poll(), for
>> example.
>>
>> This patch replaces the multiply-divide with a reciprocal multiplication on
>> 32-bit platforms. When the input is an unsigned long, there is no portable
>> way to do this on 64-bit platforms,
>> since it requires a 128-bit intermediate result (which gcc does support on
>> 64-bit platforms but may generate libgcc calls, e.g. on 64-bit s390), but
>> since the output is a 32-bit integer in the cases affected, just simplify
>> the multiply-divide (*3/10 instead of *300/1000).
>>
>> The reciprocal multiply used can have off-by-one errors in the upper half
>> of the valid output range. This could be avoided at the expense of having
>> to deal with a potential 65-bit intermediate result. Since the intent is
>> to avoid overflow problems and most of the other time conversions are only
>> semiexact, the off-by-one errors were considered an acceptable tradeoff.
>>
>> [...]
>> [more text follows]
>>
>> The problem in vanilla 2.6.25-rc1 happens with CONFIG_HZ_300=y (and doesn't
>> with CONFIG_HZ_250=y or with the above commit reverted). The CPU frequency no longer
>> changes regardless of the load, and it stays high (2.0 GHz or 1.2 GHz) even
>> when idle (I checked with 'top'), whereas it usually drops to 800 MHz when idle (I
>> always use the ondemand governor, compiled in and set as the default governor).
>>
>> The laptop is a Vaio VGN-FZ240E, core 2 duo T7250 @ 2.0 GHz and the
>> kernel is x86_64.
>
> Hi, it's great you found out the culprit commit because I was really
> wondering where this bug was coming from...

Nice!

> As a data point, my machine has a core 2 duo @ 1.2GHz and x86_64 arch.
> Do you also have the tickless option activated? (it could play a role)

Yes, I have tickless enabled.

> See you,
> Eric

2008-02-12 19:23:36

by H. Peter Anvin

Subject: Re: [2.6.25-rc1] Strange regression with CONFIG_HZ_300=y

diff --git a/kernel/timeconst.pl b/kernel/timeconst.pl
index 62b1287..4146803 100644
--- a/kernel/timeconst.pl
+++ b/kernel/timeconst.pl
@@ -339,7 +339,7 @@ sub output($@)
         print "\n";
 
         foreach $pfx ('HZ_TO_MSEC','MSEC_TO_HZ',
-                      'USEC_TO_HZ','HZ_TO_USEC') {
+                      'HZ_TO_USEC','USEC_TO_HZ') {
                 foreach $bit (32, 64) {
                     foreach $suf ('MUL', 'ADJ', 'SHR') {
                         printf "#define %-23s %s\n",


Attachments:
diff (409.00 B)
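
One possible reading of the patch above is that the generated constant sets
came out in one order while the names were taken from this list in another, so
that the USEC_TO_HZ and HZ_TO_USEC constants ended up attached to each other's
names. The C sketch below illustrates what such a swap would do to a
microseconds-to-jiffies conversion at HZ == 300. Everything in it is an
assumption made for illustration: the struct, the function name, the round-up
numerator/denominator form, and the 20000 usec period are hypothetical and not
taken from the kernel source.

```c
#include <stdint.h>

/* Hypothetical model of one generated constant set: a reduced fraction. */
struct ratio {
    uint64_t num;
    uint64_t den;
};

/* For HZ == 300, one jiffy is 1000000/300 = 10000/3 microseconds. */
static const struct ratio hz_to_usec = { 10000, 3 };   /* jiffies -> usec */
static const struct ratio usec_to_hz = { 3, 10000 };   /* usec -> jiffies */

/* Convert microseconds to jiffies, rounding up, with whichever constant
 * set the caller was handed under the name "USEC_TO_HZ". */
static uint64_t usecs_to_jiffies_with(struct ratio r, uint64_t usecs)
{
    return (usecs * r.num + r.den - 1) / r.den;
}
```

With the intended constants, a 20000 usec period becomes 6 jiffies. With the
two sets swapped it becomes 66666667 jiffies, which is more than two days at
300 ticks per second, so a periodic job scheduled that way would effectively
never run again. With HZ == 250 a jiffy is exactly 4000 usec, a single divide,
which would be consistent with that configuration not showing the problem.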

2008-02-12 19:49:57

by Carlos R. Mafra

Subject: Re: [2.6.25-rc1] Strange regression with CONFIG_HZ_300=y

H. Peter Anvin wrote:
> Eric Piel wrote:
>>>
>>> The laptop is a Vaio VGN-FZ240E, core 2 duo T7250 @ 2.0 GHz and the
>>> kernel is x86_64.
>>
>> Hi, it's great you found out the culprit commit because I was really
>> wondering where this bug was coming from...
>> As a data point, my machine has a core 2 duo @ 1.2GHz and x86_64 arch.
>> Do you also have the tickless option activated? (it could play a role)
>>
>
> I believe this patch should fix the problem. Can you please verify?

I've just tested your patch on top of 2.6.25-rc1 and it fixes the
problem!

> Thanks,
>
> -hpa

Thank you very much for solving this so fast,
Carlos