2017-12-20 14:20:12

by Crt Mori

[permalink] [raw]
Subject: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

There is no option to perform 64bit integer sqrt on 32bit platform.
Added stronger typed int_sqrt64 enables the 64bit calculations to
be performed on 32bit platforms. Although int_sqrt() is a rough
approximation, the same algorithm is used in int_sqrt64() as good
enough on 32bit platform.

Signed-off-by: Crt Mori <[email protected]>
---
include/linux/kernel.h | 9 +++++++++
lib/int_sqrt.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 0ad4c3044cf9..e4a3dc64e650 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -460,6 +460,15 @@ extern int func_ptr_is_kernel_text(void *ptr);

unsigned long int_sqrt(unsigned long);

+#if BITS_PER_LONG < 64
+u32 int_sqrt64(u64 x);
+#else
+static inline u32 int_sqrt64(u64 x)
+{
+ return (u32)int_sqrt(x);
+}
+#endif
+
extern void bust_spinlocks(int yes);
extern int oops_in_progress; /* If set, an oops, panic(), BUG() or die() is in progress */
extern int panic_timeout;
diff --git a/lib/int_sqrt.c b/lib/int_sqrt.c
index 1ef4cc344977..184da54677ec 100644
--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -7,6 +7,7 @@

#include <linux/kernel.h>
#include <linux/export.h>
+#include <linux/bitops.h>

/**
* int_sqrt - rough approximation to sqrt
@@ -36,3 +37,33 @@ unsigned long int_sqrt(unsigned long x)
return y;
}
EXPORT_SYMBOL(int_sqrt);
+
+#if BITS_PER_LONG < 64
+/**
+ * int_sqrt64 - strongly typed int_sqrt function when minimum 64 bit input
+ * is expected.
+ * @x: 64bit integer of which to calculate the sqrt
+ */
+u32 int_sqrt64(u64 x)
+{
+ u64 b, m, y = 0;
+
+ if (x <= 1)
+ return x;
+
+ m = 1ULL << (fls64(x) & ~1ULL);
+ while (m != 0) {
+ b = y + m;
+ y >>= 1;
+
+ if (x >= b) {
+ x -= b;
+ y += m;
+ }
+ m >>= 2;
+ }
+
+ return y;
+}
+EXPORT_SYMBOL(int_sqrt64);
+#endif
--
2.15.0


2017-12-20 14:39:15

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

From: Crt Mori
> Sent: 20 December 2017 14:20
>
> There is no option to perform 64bit integer sqrt on 32bit platform.
> Added stronger typed int_sqrt64 enables the 64bit calculations to
> be performed on 32bit platforms. Although int_sqrt() is a rough
> approximation, the same algorithm is used in int_sqrt64() as good
> enough on 32bit platform.

The algorithm used gives an exact (rounded down) square root,
not an approximation.

With minor changes it ought to be possible to remove most of the
64bit arithmetic and shifts.

If you care about performance then using 32 bit maths will be much faster.

David

2017-12-20 15:42:33

by Crt Mori

[permalink] [raw]
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

On 20 December 2017 at 15:39, David Laight <[email protected]> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 14:20
>>
>> There is no option to perform 64bit integer sqrt on 32bit platform.
>> Added stronger typed int_sqrt64 enables the 64bit calculations to
>> be performed on 32bit platforms. Although int_sqrt() is a rough
>> approximation, the same algorithm is used in int_sqrt64() as good
>> enough on 32bit platform.
>
> The algorithm used gives an exact (rounded down) square root,
> not an approximation.
>

OK, noted. This is from new changed function indeed - will change in v11.

> With minor changes it ought to be possible to remove most of the
> 64bit arithmetic and shifts.
>
> If you care about performance then using 32 bit maths will be much faster.
>

I need precision not the performance. That is why I would like to
leave int_sqrt as one for good performance while my will be used when
you require a precision.

> David
>

2017-12-20 16:00:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:

> With minor changes it ought to be possible to remove most of the
> 64bit arithmetic and shifts.
>
> If you care about performance then using 32 bit maths will be much faster.

Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
indicated that improvement is possible. At the very least y can be made
a u32 I suppose.

2017-12-20 16:18:20

by Crt Mori

[permalink] [raw]
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

On 20 December 2017 at 17:00, Peter Zijlstra <[email protected]> wrote:
> On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
>
>> With minor changes it ought to be possible to remove most of the
>> 64bit arithmetic and shifts.
>>
>> If you care about performance then using 32 bit maths will be much faster.
>
> Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> indicated that improvement is possible. At the very least y can be made
> a u32 I suppose.

OK, is there any more easy optimizations you see?

2017-12-20 16:46:17

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

From: Crt Mori
> Sent: 20 December 2017 16:17
>
> On 20 December 2017 at 17:00, Peter Zijlstra <[email protected]> wrote:
> > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
> >
> >> With minor changes it ought to be possible to remove most of the
> >> 64bit arithmetic and shifts.
> >>
> >> If you care about performance then using 32 bit maths will be much faster.
> >
> > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> > indicated that improvement is possible. At the very least y can be made
> > a u32 I suppose.
>
> OK, is there any more easy optimizations you see?

I think this version works.
It doesn't have the optimisation for small values.

unsigned int sqrt64(unsigned long long x)
{
unsigned int x_hi = x >> 32;

unsigned int b = 0;
unsigned int y = 0;
unsigned int i;

for (i = 0; i < 32; i++) {
b <<= 2;
b |= x_hi >> 30;
x_hi <<= 2;
if (i == 15)
x_hi = x;
y <<= 1;
if (b > y)
b -= ++y;
}
return y;
}

Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
and compare to that of your version.

David


2017-12-20 17:19:45

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

On Wed, 2017-12-20 at 16:46 +0000, David Laight wrote:
> From: Crt Mori
> > Sent: 20 December 2017 16:17
> >
> > On 20 December 2017 at 17:00, Peter Zijlstra <[email protected]> wrote:
> > > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
> > >
> > > > With minor changes it ought to be possible to remove most of the
> > > > 64bit arithmetic and shifts.
> > > >
> > > > If you care about performance then using 32 bit maths will be much faster.
> > >
> > > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> > > indicated that improvement is possible. At the very least y can be made
> > > a u32 I suppose.
> >
> > OK, is there any more easy optimizations you see?
>
> I think this version works.
> It doesn't have the optimisation for small values.
>
> unsigned int sqrt64(unsigned long long x)
> {
> unsigned int x_hi = x >> 32;
>
> unsigned int b = 0;
> unsigned int y = 0;
> unsigned int i;
>

Perhaps add:

if (x <= UINT_MAX)
return int_sqrt((unsigned long)x);

> for (i = 0; i < 32; i++) {
> b <<= 2;
> b |= x_hi >> 30;
> x_hi <<= 2;
> if (i == 15)
> x_hi = x;
> y <<= 1;
> if (b > y)
> b -= ++y;
> }
> return y;
> }
>
> Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> and compare to that of your version.
>
> David
>

2017-12-20 17:31:09

by Crt Mori

[permalink] [raw]
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

On 20 December 2017 at 17:46, David Laight <[email protected]> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 16:17
>>
>> On 20 December 2017 at 17:00, Peter Zijlstra <[email protected]> wrote:
>> > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
>> >
>> >> With minor changes it ought to be possible to remove most of the
>> >> 64bit arithmetic and shifts.
>> >>
>> >> If you care about performance then using 32 bit maths will be much faster.
>> >
>> > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
>> > indicated that improvement is possible. At the very least y can be made
>> > a u32 I suppose.
>>
>> OK, is there any more easy optimizations you see?
>
> I think this version works.
> It doesn't have the optimisation for small values.
>
> unsigned int sqrt64(unsigned long long x)
> {
> unsigned int x_hi = x >> 32;
>
> unsigned int b = 0;
> unsigned int y = 0;
> unsigned int i;
>
> for (i = 0; i < 32; i++) {
> b <<= 2;
> b |= x_hi >> 30;
> x_hi <<= 2;
> if (i == 15)
> x_hi = x;
> y <<= 1;
> if (b > y)
> b -= ++y;
> }
> return y;
> }
>
> Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> and compare to that of your version.
>
> David
>

Wow, thanks. This seems like a major change.

I did a quick run through unit tests for the sensor and the results
are way off. On the sensor I had to convert double calculations to
integer calculations and target was to get end result within 0.02 degC
(with previous approximate sqrt implementation) in sensor working
range. This now gets into 3 degC delta at least and some are way off.
It might be off because of some scaling on the other hand during the
equation (not exactly comparing sqrt implementations here).

I will need a bit more testing of this to see if it is replaceable.

2017-12-21 10:02:35

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

From: Joe Perches
> Sent: 20 December 2017 17:20
...
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> > unsigned int x_hi = x >> 32;
> >
> > unsigned int b = 0;
> > unsigned int y = 0;
> > unsigned int i;
> >
>
> Perhaps add:
>
> if (x <= UINT_MAX)
> return int_sqrt((unsigned long)x);

Actually something like:
i = 32;
if (!x_hi) {
x_hi = x;
i = 16;
}

if (!(x_hi & 0xffff0000)) {
x_hi <<= 16;
i -= 8;
}
Repeat for 0xff000000, 0xf0000000 and 0xc0000000 and adjust loop to count down.

David

>
> > for (i = 0; i < 32; i++) {
> > b <<= 2;
> > b |= x_hi >> 30;
> > x_hi <<= 2;
> > if (i == 15)
> > x_hi = x;
> > y <<= 1;
> > if (b > y)
> > b -= ++y;
> > }
> > return y;
> > }
> >
> > Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> > and compare to that of your version.
> >
> > David
> >

2017-12-21 10:08:02

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

From: Crt Mori
> Sent: 20 December 2017 17:30
> To: David Laight
...
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> > unsigned int x_hi = x >> 32;
> >
> > unsigned int b = 0;
> > unsigned int y = 0;
> > unsigned int i;
> >
> > for (i = 0; i < 32; i++) {
> > b <<= 2;
> > b |= x_hi >> 30;
> > x_hi <<= 2;
> > if (i == 15)
> > x_hi = x;
> > y <<= 1;
> > if (b > y)
> > b -= ++y;
> > }
> > return y;
> > }
>
> Wow, thanks. This seems like a major change.

Not really, it is just doing it slightly differently.
The algorithm is the 'schoolboy' one that used to be taught at school.
(Although I found it in a boot from the 1930s.)

I compared the object code for amd64 (doesn't need to reload
'x_hi' half way through) against the original loop.
They are much the same size, execution speed will depend the lengths
of the dependency chains.

> I did a quick run through unit tests for the sensor and the results
> are way off. On the sensor I had to convert double calculations to
> integer calculations and target was to get end result within 0.02 degC
> (with previous approximate sqrt implementation) in sensor working
> range. This now gets into 3 degC delta at least and some are way off.
> It might be off because of some scaling on the other hand during the
> equation (not exactly comparing sqrt implementations here).
>
> I will need a bit more testing of this to see if it is replaceable.

You probably need to put values through both (all 3) functions to see
where you are getting errors.
It might be rounding, or there might be a fubar somewhere.

David


2017-12-21 10:59:18

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

From: Crt Mori
> Sent: 20 December 2017 17:30
> >> OK, is there any more easy optimizations you see?
> >
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> > unsigned int x_hi = x >> 32;
> >
> > unsigned int b = 0;
> > unsigned int y = 0;
> > unsigned int i;
> >
> > for (i = 0; i < 32; i++) {
> > b <<= 2;
> > b |= x_hi >> 30;
> > x_hi <<= 2;
> > if (i == 15)
> > x_hi = x;
> > y <<= 1;
> > if (b > y)
> > b -= ++y;
> > }
> > return y;
> > }
..
>
> I did a quick run through unit tests for the sensor and the results
> are way off. On the sensor I had to convert double calculations to
> integer calculations and target was to get end result within 0.02 degC
> (with previous approximate sqrt implementation) in sensor working
> range. This now gets into 3 degC delta at least and some are way off.
> It might be off because of some scaling on the other hand during the
> equation (not exactly comparing sqrt implementations here).

I didn't get it quite right...
The last few lines need to be:
if (b > y) {
b -= ++y;
y++;
}
}
return y >> 1;
}

Although that then fails for inputs larger than 2^62.

David



2017-12-21 11:42:46

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

From: Crt Mori
> Sent: 20 December 2017 17:30
> I did a quick run through unit tests for the sensor and the results
> are way off
> ...

Try this version instead:
unsigned int sqrt64(unsigned long long x_in)
{
unsigned int x = x_in >> 32;

unsigned int b = 0;
unsigned int y = 0;
unsigned int i;

i = 31;
if (!x) {
x = x_in;
i = 15;
}
if (!(x & 0xffff0000)) {
x <<= 16;
i -= 8;
}
if (!(x & 0xff000000)) {
x <<= 8;
i -= 4;
}
if (!(x & 0xf0000000)) {
x <<= 4;
i -= 2;
}

do {
b <<= 2;
b |= x >> 30;
x <<= 2;
if (i == 16)
x = x_in;
y <<= 1;
if (b > y) {
b -= ++y;
y++;
}
} while (--i);

/* 'b' becomes 33 bits if the input is greater than 2^62 */
b <<= 1;
b |= x >> 31;
if (b > y || (b == y && x & (1u << 30)))
y |= 1;

return y;
}

I've tested that one with more values.

David


2017-12-21 13:18:31

by Crt Mori

[permalink] [raw]
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

On 21 December 2017 at 12:43, David Laight <[email protected]> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 17:30
>> I did a quick run through unit tests for the sensor and the results
>> are way off
>> ...
>
> Try this version instead:
> unsigned int sqrt64(unsigned long long x_in)
> {
> unsigned int x = x_in >> 32;
>
> unsigned int b = 0;
> unsigned int y = 0;
> unsigned int i;

i can be u8. And I will still use explicit typing.

>
> i = 31;
> if (!x) {
> x = x_in;
> i = 15;
> }
> if (!(x & 0xffff0000)) {
> x <<= 16;
> i -= 8;
> }
> if (!(x & 0xff000000)) {
> x <<= 8;
> i -= 4;
> }
> if (!(x & 0xf0000000)) {
> x <<= 4;
> i -= 2;
> }
>

This part above looks like FLS

> do {
> b <<= 2;
> b |= x >> 30;
> x <<= 2;
> if (i == 16)
> x = x_in;
> y <<= 1;
> if (b > y) {
> b -= ++y;
> y++;
> }
> } while (--i);
>
> /* 'b' becomes 33 bits if the input is greater than 2^62 */
> b <<= 1;
> b |= x >> 31;
> if (b > y || (b == y && x & (1u << 30)))
> y |= 1;
>
> return y;
> }
>
> I've tested that one with more values.
>
> David
>

This one indeed works. I did some more testing this morning and I am
fine with either.

So question is: Do I make change as per David's suggestion with his
sign-off, or leave the version it was before the change?

2017-12-21 13:56:43

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

From: Crt Mori
> Sent: 21 December 2017 13:18
...
> > unsigned int i;
>
> i can be u8. And I will still use explicit typing.

u8 will add extra code, unsigned int is good.
'x' needs to be u32, 'y' and 'b' could be larger.

I was testing in userspace.

...
> This part above looks like FLS
It also does the rest of the required shifts.

...
> This one indeed works. I did some more testing this morning and I am
> fine with either.
>
> So question is: Do I make change as per David's suggestion with his
> sign-off, or leave the version it was before the change?

If you generate the actual patch I can add an appropriate sign-off
(or whatever is needed).

David


2017-12-21 14:12:15

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

On Thu, Dec 21, 2017 at 01:56:55PM +0000, David Laight wrote:
> > This part above looks like FLS
> It also does the rest of the required shifts.

Still, fls() + shift is way faster on hardware that has an fls
instruction.

Writing out that binary search doesn't make sense.

2017-12-21 14:48:01

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

From: Peter Zijlstra
> Sent: 21 December 2017 14:12
...
> > > This part above looks like FLS
> > It also does the rest of the required shifts.
>
> Still, fls() + shift is way faster on hardware that has an fls
> instruction.
>
> Writing out that binary search doesn't make sense.

If the hardware doesn't have an appropriate fls instruction
the soft fls()will be worse.

If you used fls() you'd still need quite a bit of code
to generate the correct shift and loop count adjustment.
Given the cost of the loop iterations the 3 tests are noise.
The open coded version is obviously correct...

I didn't add the 4th one because the code always does 2 iterations.

If you were really worried about performance there are faster
algorithms (even doing 2 or 4 bits a time is faster).

David

2017-12-22 13:45:24

by Crt Mori

[permalink] [raw]
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

>From simple strong typing of existing int_sqrt we came to something a
bit more complex or better. Can we decide now which we want in, or I
submit v12 and we decide then (although it is not a v12, but whole new
thing)?

On 21 December 2017 at 15:48, David Laight <[email protected]> wrote:
> From: Peter Zijlstra
>> Sent: 21 December 2017 14:12
> ...
>> > > This part above looks like FLS
>> > It also does the rest of the required shifts.
>>
>> Still, fls() + shift is way faster on hardware that has an fls
>> instruction.
>>
>> Writing out that binary search doesn't make sense.
>
> If the hardware doesn't have an appropriate fls instruction
> the soft fls()will be worse.
>
> If you used fls() you'd still need quite a bit of code
> to generate the correct shift and loop count adjustment.
> Given the cost of the loop iterations the 3 tests are noise.
> The open coded version is obviously correct...
>
> I didn't add the 4th one because the code always does 2 iterations.
>
> If you were really worried about performance there are faster
> algorithms (even doing 2 or 4 bits a time is faster).
>
> David
>

2018-01-09 15:18:48

by Crt Mori

[permalink] [raw]
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

It has been some time now since this moved. I have decided not to use
David's implementation because I want to maintain also range above
2^62

Are there any additional objections for this not to go in as it is?

On 22 December 2017 at 14:44, Crt Mori <[email protected]> wrote:
>
> From simple strong typing of existing int_sqrt we came to something a
> bit more complex or better. Can we decide now which we want in, or I
> submit v12 and we decide then (although it is not a v12, but whole new
> thing)?
>
> On 21 December 2017 at 15:48, David Laight <[email protected]> wrote:
> > From: Peter Zijlstra
> >> Sent: 21 December 2017 14:12
> > ...
> >> > > This part above looks like FLS
> >> > It also does the rest of the required shifts.
> >>
> >> Still, fls() + shift is way faster on hardware that has an fls
> >> instruction.
> >>
> >> Writing out that binary search doesn't make sense.
> >
> > If the hardware doesn't have an appropriate fls instruction
> > the soft fls()will be worse.
> >
> > If you used fls() you'd still need quite a bit of code
> > to generate the correct shift and loop count adjustment.
> > Given the cost of the loop iterations the 3 tests are noise.
> > The open coded version is obviously correct...
> >
> > I didn't add the 4th one because the code always does 2 iterations.
> >
> > If you were really worried about performance there are faster
> > algorithms (even doing 2 or 4 bits a time is faster).
> >
> > David
> >

2018-01-12 09:41:11

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

From: Crt Mori [mailto:[email protected]]
> Sent: 09 January 2018 15:18
>
> It has been some time now since this moved. I have decided not to use
> David's implementation because I want to maintain also range above
> 2^62

The last version I did supported the full range.

David

2018-01-15 08:18:25

by Crt Mori

[permalink] [raw]
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt

On 12 January 2018 at 10:41, David Laight <[email protected]> wrote:
> From: Crt Mori [mailto:[email protected]]
>> Sent: 09 January 2018 15:18
>>
>> It has been some time now since this moved. I have decided not to use
>> David's implementation because I want to maintain also range above
>> 2^62
>
> The last version I did supported the full range.

Nothing changed below that comment, so I was assuming it is not
supported (or did I miss a mail?).

Also fls discussion had opposite opinions or it just seems inconclusive to me?

>
> David
>

So you all agree David Laight version is much better and should be
used instead of currently proposed version?