2002-08-22 09:46:24

by Yoann Vandoorselaere

[permalink] [raw]
Subject: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

Hi,

The "low_part * mult" multiplication of the old function may overflow a
32bits integer...

This patch both fix the overflow issue (tested with frequencies up to
20Ghz), and make the result of the function lose less precision.

Please apply,

--
Yoann Vandoorselaere, http://www.prelude-ids.org

"Programming is a race between programmers, who try and make more and
more idiot-proof software, and universe, which produces more and more
remarkable idiots. Until now, universe leads the race" -- R. Cook


Attachments:
cpufreq-overflow.diff (720.00 B)

2002-08-22 10:44:16

by Gabriel Paubert

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

Yoann Vandoorselaere wrote:
> Hi,
>
> The "low_part * mult" multiplication of the old function may overflow a
> 32bits integer...
>
> This patch both fix the overflow issue (tested with frequencies up to
> 20Ghz), and make the result of the function lose less precision.
>
> Please apply,
>
>
>
> ------------------------------------------------------------------------
>
> --- linux-benh/kernel/cpufreq.c 2002-08-21 17:27:52.000000000 +0200
> +++ linux-yoann/kernel/cpufreq.c 2002-08-22 11:27:09.000000000 +0200
> @@ -78,14 +78,16 @@ static unsigned int cpufreq_
> */
> static unsigned long scale(unsigned long old, u_int div, u_int mult)
> {
> - unsigned long low_part, high_part;
> -
> - high_part = old / div;
> - low_part = (old % div) / 100;
> - high_part *= mult;
> - low_part = low_part * mult / div;
> -
> - return high_part + low_part * 100;
> + unsigned long val, carry = 0;
> +
> + mult /= 100;
> + div /= 100;

if(abs(div)<100) div=0;

> + val = old / div * mult;

Now happily divide by zero.

> +
> + carry = old % div;

Again.

> + carry = carry * mult / div;

Again.

> +
> + return val + carry;
> }
>

And I can't see how it can be more precise, you divide the numerator and
denominator of the fraction by 100 and then proceed forgetting
everything about the rest. Basically this looses about 7 bits of precision.

Now altogether I believe that such a function pertains to a per arch
optimized routine.






2002-08-22 11:00:09

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

Hi Gabriel !

>if(abs(div)<100) div=0;
>
>> + val = old / div * mult;
>
>Now happily divide by zero.
>
>> +
>> + carry = old % div;
>
>Again.
>
>> + carry = carry * mult / div;
>
>Again.
>
>> +
>> + return val + carry;
>> }

None of the above can happen in the domain of application of this
function. It's used to scale up/down the loops_per_jiffy value when
scaling the CPU frequency. Anyway, the above isn't worse than the
original function. Ideally, we would want 64 bits arithmetics, but
we decided long ago not to bring the libcc support routines for that
in the kernel.
>
>And I can't see how it can be more precise, you divide the numerator and
>denominator of the fraction by 100 and then proceed forgetting
>everything about the rest. Basically this looses about 7 bits of precision.

Which is mostly ok for what we need. I think Yoann didn't mean it's
more precise that what it replace, but rather more precise than his
original implementation that divided by 1000 ;) Anyway, it's not
significantly worse than what we had and won't overflow as easily
which is all we want for this routine now.

>Now altogether I believe that such a function pertains to a per arch
>optimized routine.

Maybe... though in the context of cpufreq, it may not make that much
sense.

Ben.


2002-08-22 10:58:30

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

Hi Gabriel !

>if(abs(div)<100) div=0;
>
>> + val = old / div * mult;
>
>Now happily divide by zero.
>
>> +
>> + carry = old % div;
>
>Again.
>
>> + carry = carry * mult / div;
>
>Again.
>
>> +
>> + return val + carry;
>> }

None of the above can happen in the domain of application of this
function. It's used to scale up/down the loops_per_jiffy value when
scaling the CPU frequency. Anyway, the above isn't worse than the
original function. Ideally, we would want 64 bits arithmetics, but
we decided long ago not to bring the libcc support routines for that
in the kernel.
>
>And I can't see how it can be more precise, you divide the numerator and
>denominator of the fraction by 100 and then proceed forgetting
>everything about the rest. Basically this looses about 7 bits of precision.

Which is mostly ok for what we need. I think Yoann didn't mean it's
more precise that what it replace, but rather more precise than his
original implementation that divided by 1000 ;) Anyway, it's not
significantly worse than what we had and won't overflow as easily
which is all we want for this routine now.

>Now altogether I believe that such a function pertains to a per arch
>optimized routine.

Maybe... though in the context of cpufreq, it may not make that much
sense.

Ben.


2002-08-22 12:08:28

by Gabriel Paubert

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

Hi Ben,

> None of the above can happen in the domain of application of this
> function. It's used to scale up/down the loops_per_jiffy value when
> scaling the CPU frequency. Anyway, the above isn't worse than the
> original function. Ideally, we would want 64 bits arithmetics, but we
> decided long ago not to bring the libcc support routines for that in
> the kernel.

Well, first on sane archs which have an easily accessible, fixed
frequency time counter, loops_per_jiffy should never have existed :-)

Second, putting this code there means that one day somebody will
inevitably try to use it outside of its domain of operation (like it
happened for div64 a few months ago when I pointed out that it would not
work for divisors above 65535 or so).

Finally, I agree that we should not import libgcc, but for example on
PPC32 the double lengths shifts (__ashrdi3, __ashldi3, and __lshsldi3)
are implemented somewhere, and the assembly implementation (directly
taken from some appendix in PPC documentation, I just slightly twisted
__ashrdi3 to make it branchless AFAIR) is actually way faster than the
one in libgcc ;-), and less than half the size.

Adding a few subroutines that implement a subset of libgcc's
functionality is necessary for most archs (which functions are needed is
arch, and sometimes compiler's, dependent).

In this case a generic scaling function, while not a standard libgcc/C
library feature has potentially more applications than this simple
cpufreq approximation. But I don't see very much the need for scaling a
long (64 bit on 64 bit archs) value, 32 bit would be sufficient.

Regards,
Gabriel.

2002-08-22 12:26:00

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

>Well, first on sane archs which have an easily accessible, fixed
>frequency time counter, loops_per_jiffy should never have existed :-)
>
>Second, putting this code there means that one day somebody will
>inevitably try to use it outside of its domain of operation (like it
>happened for div64 a few months ago when I pointed out that it would not
>work for divisors above 65535 or so).

Well... it's clearly located inside kernel/cpufreq.c, so there is
little risk, though it may be worth a big bold comment

>Finally, I agree that we should not import libgcc, but for example on
>PPC32 the double lengths shifts (__ashrdi3, __ashldi3, and __lshsldi3)
>are implemented somewhere, and the assembly implementation (directly
>taken from some appendix in PPC documentation, I just slightly twisted
>__ashrdi3 to make it branchless AFAIR) is actually way faster than the
>one in libgcc ;-), and less than half the size.
>
> Adding a few subroutines that implement a subset of libgcc's
>functionality is necessary for most archs (which functions are needed is
>arch, and sometimes compiler's, dependent).
>
>In this case a generic scaling function, while not a standard libgcc/C
>library feature has potentially more applications than this simple
>cpufreq approximation. But I don't see very much the need for scaling a
>long (64 bit on 64 bit archs) value, 32 bit would be sufficient.

Well... if you can write one, go on then ;) In my case, I'm happy
with Yoann implementation for cpufreq right now. Though I agree that
could ultimately be moved to arch code.

Ben.


2002-08-22 15:20:16

by Gabriel Paubert

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

Benjamin Herrenschmidt wrote:
>>Well, first on sane archs which have an easily accessible, fixed
>>frequency time counter, loops_per_jiffy should never have existed :-)
>>
>>Second, putting this code there means that one day somebody will
>>inevitably try to use it outside of its domain of operation (like it
>>happened for div64 a few months ago when I pointed out that it would not
>>work for divisors above 65535 or so).
>
>
> Well... it's clearly located inside kernel/cpufreq.c, so there is
> little risk, though it may be worth a big bold comment

Hmm, in my experience people hardly ever read detailed comments even
when they are well-written. Perhaps if you called the function
imprecise_scale or coarse_scale, it might ring a bell.

Besides that functions should do one thing and do that *well*[1]. Well,
I'm usually not too dogmatic, but this function breaks the second rule
beyond what I find acceptable.

>>In this case a generic scaling function, while not a standard libgcc/C
>>library feature has potentially more applications than this simple
>>cpufreq approximation. But I don't see very much the need for scaling a
>>long (64 bit on 64 bit archs) value, 32 bit would be sufficient.
>
>
> Well... if you can write one, go on then ;) In my case, I'm happy
> with Yoann implementation for cpufreq right now. Though I agree that
> could ultimately be moved to arch code.

Ok, I'll give it a try this week-end (PPC, i386 and all 64 bit should
archs should be trivial).

Gabriel.

[1] Documentation/CodingStyle, which also claims that functions should
be short and *sweet*. Well, I found the patch far too bitter ;-).


2002-08-22 15:55:09

by Yoann Vandoorselaere

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

On Thu, 2002-08-22 at 17:23, Gabriel Paubert wrote:
> Benjamin Herrenschmidt wrote:
> >>Well, first on sane archs which have an easily accessible, fixed
> >>frequency time counter, loops_per_jiffy should never have existed :-)
> >>
> >>Second, putting this code there means that one day somebody will
> >>inevitably try to use it outside of its domain of operation (like it
> >>happened for div64 a few months ago when I pointed out that it would not
> >>work for divisors above 65535 or so).
> >
> >
> > Well... it's clearly located inside kernel/cpufreq.c, so there is
> > little risk, though it may be worth a big bold comment
>
> Hmm, in my experience people hardly ever read detailed comments even
> when they are well-written. Perhaps if you called the function
> imprecise_scale or coarse_scale, it might ring a bell.
>
> Besides that functions should do one thing and do that *well*[1]. Well,
> I'm usually not too dogmatic, but this function breaks the second rule
> beyond what I find acceptable.

At least it report *correct* result (when the old one was returning BS
because of the 32 bits integer overflow). Doing it well require per
architecture support.


> >>In this case a generic scaling function, while not a standard libgcc/C
> >>library feature has potentially more applications than this simple
> >>cpufreq approximation. But I don't see very much the need for scaling a
> >>long (64 bit on 64 bit archs) value, 32 bit would be sufficient.
> >
> >
> > Well... if you can write one, go on then ;) In my case, I'm happy
> > with Yoann implementation for cpufreq right now. Though I agree that
> > could ultimately be moved to arch code.

[...]

> [1] Documentation/CodingStyle, which also claims that functions should
> be short and *sweet*. Well, I found the patch far too bitter ;-).

No wonder why you're loosing contributor with such comportment.

--
Yoann Vandoorselaere, http://www.prelude-ids.org

"Programming is a race between programmers, who try and make more and
more idiot-proof software, and universe, which produces more and more
remarkable idiots. Until now, universe leads the race" -- R. Cook

2002-08-22 17:01:13

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

Hi,

> > Well... it's clearly located inside kernel/cpufreq.c, so there is
> > little risk, though it may be worth a big bold comment
>
> Hmm, in my experience people hardly ever read detailed comments even
> when they are well-written. Perhaps if you called the function
> imprecise_scale or coarse_scale, it might ring a bell.

First of all, it's located in include/linux/cpufreq.h [to be accessible for
arch/i386/kernel/time.c, called cpufreq_scale() which should mean that it is
only meant for CPUFreq and nothing else.

> >>In this case a generic scaling function, while not a standard libgcc/C
> >>library feature has potentially more applications than this simple
> >>cpufreq approximation. But I don't see very much the need for scaling a
> >>long (64 bit on 64 bit archs) value, 32 bit would be sufficient.
> >
> >
> > Well... if you can write one, go on then ;) In my case, I'm happy
> > with Yoann implementation for cpufreq right now. Though I agree that
> > could ultimately be moved to arch code.
>
> Ok, I'll give it a try this week-end (PPC, i386 and all 64 bit should
> archs should be trivial).

IMHO per-arch functions are really not needed. The only architectures which
have CPUFreq drivers by now are ARM and i386. This will change, hopefully;
IMHO it should be enough to include some basic limit checking in
cpufreq_scale().

Dominik

2002-08-22 17:31:53

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

>IMHO per-arch functions are really not needed. The only architectures which
>have CPUFreq drivers by now are ARM and i386. This will change, hopefully;
>IMHO it should be enough to include some basic limit checking in
>cpufreq_scale().

In this specific case, we were talking about PPC since the problem
occured when I implemented cpufreq support to switch the speed
of the latest powerbooks between 667 and 800Mhz

Ben.


2002-08-22 17:45:52

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

On Thu, Aug 22, 2002 at 09:35:16PM +0200, Benjamin Herrenschmidt wrote:
> >IMHO per-arch functions are really not needed. The only architectures which
> >have CPUFreq drivers by now are ARM and i386. This will change, hopefully;
> >IMHO it should be enough to include some basic limit checking in
> >cpufreq_scale().
>
> In this specific case, we were talking about PPC since the problem
> occured when I implemented cpufreq support to switch the speed
> of the latest powerbooks between 667 and 800Mhz

And the patch from Yoann solves this? Then it might be easiest to use this
for the time being, and switch to George Anzinger's sc_math.h once
high-res-timer is merged.

Dominik

2002-08-22 17:55:50

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

>> In this specific case, we were talking about PPC since the problem
>> occured when I implemented cpufreq support to switch the speed
>> of the latest powerbooks between 667 and 800Mhz
>
>And the patch from Yoann solves this? Then it might be easiest to use this
>for the time being, and switch to George Anzinger's sc_math.h once
>high-res-timer is merged.

Provided Yoann patch doesn't break other machines, that's what I would
do, yes.

Ben.


2002-08-22 17:57:58

by Yoann Vandoorselaere

[permalink] [raw]
Subject: Re: [PATCH]: fix 32bits integer overflow in loops_per_jiffy calculation

On Thu, 2002-08-22 at 19:46, Dominik Brodowski wrote:
> On Thu, Aug 22, 2002 at 09:35:16PM +0200, Benjamin Herrenschmidt wrote:
> > >IMHO per-arch functions are really not needed. The only architectures which
> > >have CPUFreq drivers by now are ARM and i386. This will change, hopefully;
> > >IMHO it should be enough to include some basic limit checking in
> > >cpufreq_scale().
> >
> > In this specific case, we were talking about PPC since the problem
> > occured when I implemented cpufreq support to switch the speed
> > of the latest powerbooks between 667 and 800Mhz
>
> And the patch from Yoann solves this?

Yep, the integer overflow resulted in an incorrectly computed
loops_per_jiffy :

Aug 21 19:50:41 titane kernel: adjust_jiffies: prechange cur=667000, new=800000
Aug 21 19:50:41 titane kernel: old loop_per_jiffy = 665.19 (cpufreq_ref_loops=3325952, cpufreq_ref_freq=667000).
Aug 21 19:50:41 titane kernel: new loop_per_jiffy = 669.02 (cpufreq_ref_loops=3325952, cpufreq_ref_freq=667000).

With the patch applied, it work fine :

Aug 22 11:33:40 titane kernel: adjust_jiffies: prechange cur=667000, new=800000
Aug 22 11:33:40 titane kernel: old loop_per_jiffy = 665.19 (cpufreq_ref_loops=3325952, cpufreq_ref_freq=667000).
Aug 22 11:33:40 titane kernel: new loop_per_jiffy = 797.82 (cpufreq_ref_loops=3325952, cpufreq_ref_freq=667000).

--
Yoann Vandoorselaere, http://www.prelude-ids.org

"Programming is a race between programmers, who try and make more and
more idiot-proof software, and universe, which produces more and more
remarkable idiots. Until now, universe leads the race" -- R. Cook