2015-05-13 20:57:13

by Arnd Bergmann

[permalink] [raw]
Subject: [PATCH] lib: fix 842 build on 32-bit architectures

Building the 842 code on 32-bit ARM currently results in this link
error:

ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined!

The reason is that the __do_index function performs a 64-bit
division by a power-of-two number, but it has no insight into
the function arguments.

By marking that function inline, the fsize argument is always
known at the time that do_index is called, and the compiler is
able to replace the extremely expensive 64-bit division with
a cheap constant shift operation.

Aside from fixing that link error, this approach should also improve
both code size and performance on 32-bit architectures significantly.

Signed-off-by: Arnd Bergmann <[email protected]>
---
Found while building arm32 allmodconfig with gcc-5.0

diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
index 6b2b45aecde3..285bf6b6959c 100644
--- a/lib/842/842_decompress.c
+++ b/lib/842/842_decompress.c
@@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n)
return 0;
}

-static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
+static inline int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
{
u64 index, offset, total = round_down(p->out - p->ostart, 8);
int ret;


2015-05-13 23:53:13

by Dan Streetman

[permalink] [raw]
Subject: Re: [PATCH] lib: fix 842 build on 32-bit architectures

> Building the 842 code on 32-bit ARM currently results in this link
> error:
>
> ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined!

Oops! Guess I should build/test on 32 bit more.

>
> The reason is that the __do_index function performs a 64-bit
> division by a power-of-two number, but it has no insight into
> the function arguments.
>
> By marking that function inline, the fsize argument is always
> known at the time that do_index is called, and the compiler is
> able to replace the extremely expensive 64-bit division with
> a cheap constant shift operation.

alternately, we know that fsize will always be less than 64 bits,
at most it's 4<<9 or 8<<8 (both == 1<<11). So we could just change
its type to u16.

diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
index 6b2b45aecde3..285bf6b6959c 100644
--- a/lib/842/842_decompress.c
+++ b/lib/842/842_decompress.c
@@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n)
return 0;
}

-static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
+static int __do_index(struct sw842_param *p, u8 size, u8 bits, u16 fsize)
{
u64 index, offset, total = round_down(p->out - p->ostart, 8);
int ret;

Or, we could inline it and change the type to u16. In any case,

Acked-by: Dan Streetman <[email protected]>

>
> Aside from fixing that link error, this approach should also improve
> both code size and performance on 32-bit architectures significantly.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> Found while building arm32 allmodconfig with gcc-5.0
>
> diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
> index 6b2b45aecde3..285bf6b6959c 100644
> --- a/lib/842/842_decompress.c
> +++ b/lib/842/842_decompress.c
> @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n)
> return 0;
> }
>
> -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
> +static inline int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
> {
> u64 index, offset, total = round_down(p->out - p->ostart, 8);
> int ret;
>

2015-05-14 03:07:04

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] lib: fix 842 build on 32-bit architectures

On Wed, May 13, 2015 at 10:56:39PM +0200, Arnd Bergmann wrote:
> Building the 842 code on 32-bit ARM currently results in this link
> error:
>
> ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined!
>
> The reason is that the __do_index function performs a 64-bit
> division by a power-of-two number, but it has no insight into
> the function arguments.
>
> By marking that function inline, the fsize argument is always
> known at the time that do_index is called, and the compiler is
> able to replace the extremely expensive 64-bit division with
> a cheap constant shift operation.
>
> Aside from fixing that link error, this approach should also improve
> both code size and performance on 32-bit architectures significantly.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> Found while building arm32 allmodconfig with gcc-5.0
>
> diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
> index 6b2b45aecde3..285bf6b6959c 100644
> --- a/lib/842/842_decompress.c
> +++ b/lib/842/842_decompress.c
> @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n)
> return 0;
> }
>
> -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
> +static inline int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)

Ugh, relying on inlining to work is fragile. I'm not against
making this inline but please make it work even when it is out-
of-line.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-05-14 08:54:33

by Dan Streetman

[permalink] [raw]
Subject: Re: [PATCH] lib: fix 842 build on 32-bit architectures

On Wed, May 13, 2015 at 7:52 PM, Dan Streetman <[email protected]> wrote:
>> Building the 842 code on 32-bit ARM currently results in this link
>> error:
>>
>> ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined!
>
> Oops! Guess I should build/test on 32 bit more.
>
>>
>> The reason is that the __do_index function performs a 64-bit
>> division by a power-of-two number, but it has no insight into
>> the function arguments.

wait, do you mean the 64 bit mod, total % fsize? That should already
be fixed in Herbert's tree, I changed it to subtraction instead.

In any case, I looked at the code again and I think the fsize
parameter can be removed, and just simply calculated in the function,
it's just a shift. I'll send a patch.


>>
>> By marking that function inline, the fsize argument is always
>> known at the time that do_index is called, and the compiler is
>> able to replace the extremely expensive 64-bit division with
>> a cheap constant shift operation.
>
> alternately, we know that fsize will always be less than 64 bits,
> at most it's 4<<9 or 8<<8 (both == 1<<11). So we could just change
> its type to u16.
>
> diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
> index 6b2b45aecde3..285bf6b6959c 100644
> --- a/lib/842/842_decompress.c
> +++ b/lib/842/842_decompress.c
> @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n)
> return 0;
> }
>
> -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
> +static int __do_index(struct sw842_param *p, u8 size, u8 bits, u16 fsize)
> {
> u64 index, offset, total = round_down(p->out - p->ostart, 8);
> int ret;
>
> Or, we could inline it and change the type to u16. In any case,
>
> Acked-by: Dan Streetman <[email protected]>
>
>>
>> Aside from fixing that link error, this approach should also improve
>> both code size and performance on 32-bit architectures significantly.
>>
>> Signed-off-by: Arnd Bergmann <[email protected]>
>> ---
>> Found while building arm32 allmodconfig with gcc-5.0
>>
>> diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
>> index 6b2b45aecde3..285bf6b6959c 100644
>> --- a/lib/842/842_decompress.c
>> +++ b/lib/842/842_decompress.c
>> @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n)
>> return 0;
>> }
>>
>> -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
>> +static inline int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
>> {
>> u64 index, offset, total = round_down(p->out - p->ostart, 8);
>> int ret;
>>

2015-05-14 10:03:45

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH] lib: fix 842 build on 32-bit architectures

On Wed, May 13, 2015 at 10:56:39PM +0200, Arnd Bergmann wrote:
> Building the 842 code on 32-bit ARM currently results in this link
> error:
>
> ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined!
>
> The reason is that the __do_index function performs a 64-bit
> division by a power-of-two number, but it has no insight into
> the function arguments.
>
> By marking that function inline, the fsize argument is always
> known at the time that do_index is called, and the compiler is
> able to replace the extremely expensive 64-bit division with
> a cheap constant shift operation.
>
> Aside from fixing that link error, this approach should also improve
> both code size and performance on 32-bit architectures significantly.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> Found while building arm32 allmodconfig with gcc-5.0
>
> diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
> index 6b2b45aecde3..285bf6b6959c 100644
> --- a/lib/842/842_decompress.c
> +++ b/lib/842/842_decompress.c
> @@ -169,7 +169,7 @@ static int do_data(struct sw842_param *p, u8 n)
> return 0;
> }
>
> -static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
> +static inline int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)

This had better get a comment to say why this is done, to stop the
"don't do static inline in a .c" brigade reverting this change.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-05-14 10:49:49

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] lib: fix 842 build on 32-bit architectures

On Thursday 14 May 2015 04:54:07 Dan Streetman wrote:
> On Wed, May 13, 2015 at 7:52 PM, Dan Streetman <[email protected]> wrote:
> >> Building the 842 code on 32-bit ARM currently results in this link
> >> error:
> >>
> >> ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined!
> >
> > Oops! Guess I should build/test on 32 bit more.
> >
> >>
> >> The reason is that the __do_index function performs a 64-bit
> >> division by a power-of-two number, but it has no insight into
> >> the function arguments.
>
> wait, do you mean the 64 bit mod, total % fsize? That should already
> be fixed in Herbert's tree, I changed it to subtraction instead.

Yes, that's the one. I was looking at yesterday's linux-next which still
had the bug, but your fix has made it into today's release, so my patch
is no longer needed.

> In any case, I looked at the code again and I think the fsize
> parameter can be removed, and just simply calculated in the function,
> it's just a shift. I'll send a patch.

Not necessary for this problem any more, but it could still make sense
if you think that improves the code. You can also try to see if marking
the function inline has any effect on code size or performance if either
of them matters.

Arnd

2015-05-14 11:03:34

by Dan Streetman

[permalink] [raw]
Subject: Re: [PATCH] lib: fix 842 build on 32-bit architectures

On Thu, May 14, 2015 at 6:49 AM, Arnd Bergmann <[email protected]> wrote:
> On Thursday 14 May 2015 04:54:07 Dan Streetman wrote:
>> On Wed, May 13, 2015 at 7:52 PM, Dan Streetman <[email protected]> wrote:
>> >> Building the 842 code on 32-bit ARM currently results in this link
>> >> error:
>> >>
>> >> ERROR: "__aeabi_uldivmod" [lib/842/842_decompress.ko] undefined!
>> >
>> > Oops! Guess I should build/test on 32 bit more.
>> >
>> >>
>> >> The reason is that the __do_index function performs a 64-bit
>> >> division by a power-of-two number, but it has no insight into
>> >> the function arguments.
>>
>> wait, do you mean the 64 bit mod, total % fsize? That should already
>> be fixed in Herbert's tree, I changed it to subtraction instead.
>
> Yes, that's the one. I was looking at yesterday's linux-next which still
> had the bug, but your fix has made it into today's release, so my patch
> is no longer needed.
>
>> In any case, I looked at the code again and I think the fsize
>> parameter can be removed, and just simply calculated in the function,
>> it's just a shift. I'll send a patch.
>
> Not necessary for this problem any more, but it could still make sense
> if you think that improves the code. You can also try to see if marking
> the function inline has any effect on code size or performance if either
> of them matters.

Yep, I'll run some performance tests both ways and send a patch if it
does improve things. Thanks!

>
> Arnd