2015-02-18 18:49:08

by Alexander Kuleshov

[permalink] [raw]
Subject: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating

Signed-off-by: Alexander Kuleshov <[email protected]>
---
fs/fat/fat.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 64e295e..1a5080b 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -207,12 +207,12 @@ static inline void fat_save_attrs(struct inode *inode, u8 attrs)

static inline unsigned char fat_checksum(const __u8 *name)
{
+ u8 i;
unsigned char s = name[0];
- s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2];
- s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4];
- s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6];
- s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8];
- s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10];
+
+ for (i = 1; i < 11; i++)
+ s = (s << 7) + (s >> 1) + name[i];
+
return s;
}

--
2.3.0.80.g18d0fec


2015-02-18 19:25:58

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating

Alexander Kuleshov <[email protected]> writes:

> static inline unsigned char fat_checksum(const __u8 *name)
> {
> + u8 i;
> unsigned char s = name[0];
> - s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2];
> - s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4];
> - s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6];
> - s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8];
> - s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10];
> +
> + for (i = 1; i < 11; i++)
> + s = (s << 7) + (s >> 1) + name[i];
> +
> return s;
> }

When I wrote this, IIRC, there was measurable performance
difference. All major gcc versions are enough smart now to optimize this?
--
OGAWA Hirofumi <[email protected]>

2015-02-18 19:46:10

by Alexander Kuleshov

[permalink] [raw]
Subject: Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating

Hello,

I tested two simple program with this (gcc-4.9.1)

Results are following:

time ./test_with_loop

real 0m0.001s
user 0m0.000s
sys 0m0.001s

And

time ./test_direct_calculation:

real 0m0.002s
user 0m0.000s
sys 0m0.001s

As you can see result almsot the same. Also i compared assembly output
from two these programras and there are two notes:

1. Assembly output of the program with loop is half as much than
program with directly rotation, but ofcourse we can't take it for rule
here. Current version prepares stack and than just does:

addq $1, %rax
movzbl (%rax), %eax
addl %edx, %eax
movb %al, -1(%rbp)
movzbl -1(%rbp), %eax
rorb %al
movl %eax, %edx
movq -24(%rbp), %rax

11 times, but version with loop does the same but with cmp/jump.

Thank you.


2015-02-19 1:25 GMT+06:00 OGAWA Hirofumi <[email protected]>:
> Alexander Kuleshov <[email protected]> writes:
>
>> static inline unsigned char fat_checksum(const __u8 *name)
>> {
>> + u8 i;
>> unsigned char s = name[0];
>> - s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2];
>> - s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4];
>> - s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6];
>> - s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8];
>> - s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10];
>> +
>> + for (i = 1; i < 11; i++)
>> + s = (s << 7) + (s >> 1) + name[i];
>> +
>> return s;
>> }
>
> When I wrote this, IIRC, there was measurable performance
> difference. All major gcc versions are enough smart now to optimize this?
> --
> OGAWA Hirofumi <[email protected]>

2015-02-18 20:32:27

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating

Alexander Kuleshov <[email protected]> writes:

> time ./test_with_loop
>
> real 0m0.001s
> user 0m0.000s
> sys 0m0.001s
>
> And
>
> time ./test_direct_calculation:
>
> real 0m0.002s
> user 0m0.000s
> sys 0m0.001s

Hm,

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef unsigned char __u8;
typedef unsigned char u8;

#if 1
static inline unsigned char fat_checksum(const __u8 *name)
{
unsigned char s = name[0];
s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2];
s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4];
s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6];
s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8];
s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10];
return s;
}
#else
static inline unsigned char fat_checksum(const __u8 *name)
{
unsigned char s = name[0];
u8 i;
for (i = 1; i < 11; i++)
s = (s << 7) + (s >> 1) + name[i];
return s;
}
#endif

static __attribute__ ((noinline)) int test(unsigned char *name)
{
long i;
for (i = 0; i < 100000000L; i++)
name[i % 11] = fat_checksum(name);
return name[0];
}

int main(int argc, char *argv[])
{
printf("%u\n", test((unsigned char *)argv[1]));
return 0;
}


$ gcc --version
gcc (Debian 4.9.1-19) 4.9.1
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -O2 -o c.inline c.c
# change #if 1 => #if 0
$ gcc -O2 -o c.loop c.c

$ time ./c.inline aaaaaaaaaaa
14

real 0m0.550s
user 0m0.548s
sys 0m0.000s
$ time ./c.loop aaaaaaaaaaa
14

real 0m0.901s
user 0m0.896s
sys 0m0.004s

This is my environment only? (gcc (Debian 4.9.1-19) 4.9.1)
--
OGAWA Hirofumi <[email protected]>

2015-02-23 20:41:12

by Heinrich Schuchardt

[permalink] [raw]
Subject: Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating

On 18.02.2015 21:32, OGAWA Hirofumi wrote:
> Alexander Kuleshov <[email protected]> writes:
>
>> time ./test_with_loop
>>
>> real 0m0.001s
>> user 0m0.000s
>> sys 0m0.001s
>>
>> And
>>
>> time ./test_direct_calculation:
>>
>> real 0m0.002s
>> user 0m0.000s
>> sys 0m0.001s
>
> Hm,
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> typedef unsigned char __u8;
> typedef unsigned char u8;
>
> #if 1
> static inline unsigned char fat_checksum(const __u8 *name)
> {
> unsigned char s = name[0];
> s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2];
> s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4];
> s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6];
> s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8];
> s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10];
> return s;
> }
> #else
> static inline unsigned char fat_checksum(const __u8 *name)
> {
> unsigned char s = name[0];
> u8 i;
> for (i = 1; i < 11; i++)
> s = (s << 7) + (s >> 1) + name[i];
> return s;
> }
> #endif
>
> static __attribute__ ((noinline)) int test(unsigned char *name)

You have to put
__attribute__((optimize("unroll-loops")))
here to unroll the loop inside the function:


static __attribute__ ((noinline))
__attribute__((optimize("unroll-loops")))
int test(unsigned char *name)

> {
> long i;
> for (i = 0; i < 100000000L; i++)
> name[i % 11] = fat_checksum(name);
> return name[0];
> }
>
> int main(int argc, char *argv[])
> {
> printf("%u\n", test((unsigned char *)argv[1]));
> return 0;
> }
>
>
> $ gcc --version
> gcc (Debian 4.9.1-19) 4.9.1
> Copyright (C) 2014 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> $ gcc -O2 -o c.inline c.c
> # change #if 1 => #if 0
> $ gcc -O2 -o c.loop c.c
>
> $ time ./c.inline aaaaaaaaaaa
> 14
>
> real 0m0.550s
> user 0m0.548s
> sys 0m0.000s
> $ time ./c.loop aaaaaaaaaaa
> 14
>
> real 0m0.901s
> user 0m0.896s
> sys 0m0.004s
>
> This is my environment only? (gcc (Debian 4.9.1-19) 4.9.1)
>

$ time ./c.inline aaaaaaaaaaa
14

real 0m0.743s
user 0m0.740s
sys 0m0.000s

Without __attribute__((optimize("unroll-loops"))) :
$ time ./c.loop aaaaaaaaaaa
14

real 0m1.482s
user 0m1.472s
sys 0m0.004s

With __attribute__((optimize("unroll-loops"))) :

$ time ./c.loop aaaaaaaaaaa
14

real 0m0.742s
user 0m0.740s
sys 0m0.000s

Best regards

Heinrich Schuchardt

2015-02-24 02:42:45

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating

Heinrich Schuchardt <[email protected]> writes:

> You have to put
> __attribute__((optimize("unroll-loops")))
> here to unroll the loop inside the function:
>
>
> static __attribute__ ((noinline))
> __attribute__((optimize("unroll-loops")))
> int test(unsigned char *name)
>
> $ time ./c.inline aaaaaaaaaaa
> 14
>
> real 0m0.743s
> user 0m0.740s
> sys 0m0.000s
>
> Without __attribute__((optimize("unroll-loops"))) :
> $ time ./c.loop aaaaaaaaaaa
> 14
>
> real 0m1.482s
> user 0m1.472s
> sys 0m0.004s
>
> With __attribute__((optimize("unroll-loops"))) :
>
> $ time ./c.loop aaaaaaaaaaa
> 14
>
> real 0m0.742s
> user 0m0.740s
> sys 0m0.000s

This attribute has to be added to the caller, not fat_checksum()? I.e.,
we has to add it to all callers of fat_checksum()?

Well, this is interesting gcc optimize option though, maybe not worth to
introduce this to kernel only for fatfs.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2015-02-24 19:22:36

by Heinrich Schuchardt

[permalink] [raw]
Subject: Re: [PATCH] fs/fat: calculate checksum in a loop instead of directly calculating

On 24.02.2015 03:42, OGAWA Hirofumi wrote:
> Heinrich Schuchardt <[email protected]> writes:
>
>> You have to put
>> __attribute__((optimize("unroll-loops")))
>> here to unroll the loop inside the function:
>>
>>
>> static __attribute__ ((noinline))
>> __attribute__((optimize("unroll-loops")))
>> int test(unsigned char *name)
>>
>> $ time ./c.inline aaaaaaaaaaa
>> 14
>>
>> real 0m0.743s
>> user 0m0.740s
>> sys 0m0.000s
>>
>> Without __attribute__((optimize("unroll-loops"))) :
>> $ time ./c.loop aaaaaaaaaaa
>> 14
>>
>> real 0m1.482s
>> user 0m1.472s
>> sys 0m0.004s
>>
>> With __attribute__((optimize("unroll-loops"))) :
>>
>> $ time ./c.loop aaaaaaaaaaa
>> 14
>>
>> real 0m0.742s
>> user 0m0.740s
>> sys 0m0.000s
>
> This attribute has to be added to the caller, not fat_checksum()? I.e.,
> we has to add it to all callers of fat_checksum()?

The attribute is applied to the declaration of the function in which you
need loop unrolling.
https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/Attribute-Syntax.html

>
> Well, this is interesting gcc optimize option though, maybe not worth to
> introduce this to kernel only for fatfs.

There is an effort to compile the Linux kernel with Clang. We should
avoid new GCC specific items.

Best regards

Heinrich