2003-08-26 08:25:06

by J.A. Magallon

Subject: [PATCH] 2.4: always_inline for gcc3

Hi.

Resending for 2.4.23-pre ;)

--- 25/include/linux/compiler.h~gcc3-inline-fix 2003-03-06 03:02:43.000000000 -0800
+++ 25-akpm/include/linux/compiler.h 2003-03-06 03:11:42.000000000 -0800
@@ -1,6 +1,13 @@
 #ifndef __LINUX_COMPILER_H
 #define __LINUX_COMPILER_H
 
+#if __GNUC__ >= 3
+#define inline __inline__ __attribute__((always_inline))
+#define inline__ __inline__ __attribute__((always_inline))
+#define __inline __inline__ __attribute__((always_inline))
+#define __inline__ __inline__ __attribute__((always_inline))
+#endif
+
 /* Somewhere in the middle of the GCC 2.96 development cycle, we implemented
    a mechanism by which the user can annotate likely branch directions and
    expect the blocks to be reordered appropriately. Define __builtin_expect


--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.2 (Cooker) for i586
Linux 2.4.22-jam1m (gcc 3.3.1 (Mandrake Linux 9.2 3.3.1-1mdk))


2003-08-26 12:20:06

by Marcelo Tosatti

Subject: Re: [PATCH] 2.4: always_inline for gcc3


Can you please explain to me what difference using "__inline__
__attribute__((always_inline))" makes, and why you chose to use it?

On Tue, 26 Aug 2003, J.A. Magallon wrote:

> Hi.
>
> Resending for 2.4.23-pre ;)
>
> --- 25/include/linux/compiler.h~gcc3-inline-fix 2003-03-06 03:02:43.000000000 -0800
> +++ 25-akpm/include/linux/compiler.h 2003-03-06 03:11:42.000000000 -0800
> @@ -1,6 +1,13 @@
> #ifndef __LINUX_COMPILER_H
> #define __LINUX_COMPILER_H
>
> +#if __GNUC__ >= 3
> +#define inline __inline__ __attribute__((always_inline))
> +#define inline__ __inline__ __attribute__((always_inline))
> +#define __inline __inline__ __attribute__((always_inline))
> +#define __inline__ __inline__ __attribute__((always_inline))
> +#endif
> +
> /* Somewhere in the middle of the GCC 2.96 development cycle, we implemented
> a mechanism by which the user can annotate likely branch directions and
> expect the blocks to be reordered appropriately. Define __builtin_expect
>
>
>

2003-08-26 16:24:57

by J.A. Magallon

Subject: Re: [PATCH] 2.4: always_inline for gcc3


On 08.26, Marcelo Tosatti wrote:
>
> Can you please explain me what are the differences when using "__inline__
> __attribute__((always_inline))" and why you chose to use that?
>

gcc3 did not inline big functions, even if they were marked as inline.
Thread:
http://marc.theaimsgroup.com/?t=103632325600005&r=1&w=2
Things like memcpy and copy_to/from_user were affected: they were
not inlined, and you got tons of out-of-line instances in vmlinux.

An initial patch was proposed by Denis Vlasenko, and refined by
akpm, I think.
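
For illustration only (not part of the patch): a minimal sketch of the
difference between a plain inline hint and the always_inline attribute,
assuming a GCC-compatible compiler. The names sketch_force_inline,
sketch_sum_hinted, sketch_sum_forced and sketch_selfcheck are hypothetical;
the patch itself redefines the inline keywords rather than adding a new
macro.

```c
#include <assert.h>

/* Hypothetical macro name; the patch redefines `inline` itself instead. */
#if defined(__GNUC__) && __GNUC__ >= 3
#define sketch_force_inline __inline__ __attribute__((always_inline))
#else
#define sketch_force_inline __inline__
#endif

/* Plain hint: the compiler may still judge the body too big and emit an
 * out-of-line copy in each object file that uses it. */
static __inline__ int sketch_sum_hinted(const int *v, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* With the attribute, gcc is asked to inline regardless of its usual
 * size heuristics. */
static sketch_force_inline int sketch_sum_forced(const int *v, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Both variants compute the same result; only code placement differs. */
static void sketch_selfcheck(void)
{
    static const int v[] = { 1, 2, 3, 4 };

    assert(sketch_sum_hinted(v, 4) == 10);
    assert(sketch_sum_forced(v, 4) == 10);
}
```

Whether the hinted variant really ends up out of line depends on the
compiler version and flags, so the only way to tell is to look at the
symbol table of the resulting object.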

TIA


2003-08-26 17:00:21

by J.A. Magallon

Subject: Re: [PATCH] 2.4: always_inline for gcc3


On 08.26, J.A. Magallon wrote:
>
> On 08.26, Marcelo Tosatti wrote:
> >
> > Can you please explain me what are the differences when using "__inline__
> > __attribute__((always_inline))" and why you chose to use that?
> >
>
> gcc3 did not inline big functions, even if they were marked as inline
> Thread:
> http://marc.theaimsgroup.com/?t=103632325600005&r=1&w=2
> Things like memcpy and copy_to/from_user were affected.
> They were not inlined and you got tons of instances in vmlinux.
>
> An initial patch was proposed by Denis Vlasenko, and refined by
> akpm I think.
>

A comparison. Run

cat System.map | cut -d' ' -f 3 | sort | uniq -c | sort -nr | head -n 10

on my custom kernel and on a standard Mandrake one (not the late...) to
count how many copies of each symbol ended up in the image:

System.map:
16 __constant_c_and_count_memset
8 .text.lock.inode
7 parse_options
6 __constant_memcpy
5 devfs_handle
4 init_once
3 p.0
3 debug
2 want_value
2 want_numeric

System.map-2.4.21-6mdksmp:
150 __constant_c_and_count_memset
81 __constant_memcpy
45 __constant_copy_to_user
45 __constant_copy_from_user
18 level_save.1
18 layer_save.0
14 driver
13 buf.0
9 config_chipset_for_dma
6 config_chipset_for_pio



2003-08-27 14:38:29

by Alan

Subject: Re: [PATCH] 2.4: always_inline for gcc3

On Maw, 2003-08-26 at 17:24, J.A. Magallon wrote:
> gcc3 did not inline big functions, even if they were marked as inline
> Thread:
> http://marc.theaimsgroup.com/?t=103632325600005&r=1&w=2
> Things like memcpy and copy_to/from_user were affected.
> They were not inlined and you got tons of instances in vmlinux.

The more interesting question you want to answer first is: was
gcc right? Repeated code is bad for the cache.

2003-08-27 16:17:34

by J.A. Magallon

Subject: Re: [PATCH] 2.4: always_inline for gcc3


On 08.27, Alan Cox wrote:
> On Maw, 2003-08-26 at 17:24, J.A. Magallon wrote:
> > gcc3 did not inline big functions, even if they were marked as inline
> > Thread:
> > http://marc.theaimsgroup.com/?t=103632325600005&r=1&w=2
> > Things like memcpy and copy_to/from_user were affected.
> > They were not inlined and you got tons of instances in vmlinux.
>
> The more interesting question you want to answer first is - was
> gcc right. Repeated code is bad for cache
>

That would be true for a regular function; too much inlining can be bad.
But __constant_c_and_count_memset is special, you know.
It is written as one big switch:

copy( size )
    switch (size)
        case 1: ...
        case 2: ...
        ...

The author and the users rely on gcc's optimizer:
- gcc inlines it
- the control variable of the switch is constant at compile time
- the switch is killed and just the suitable case stays.
(\label{1})
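
A minimal sketch of that pattern, for illustration only and assuming a
GCC-compatible compiler. The names FORCE_INLINE, sketch_constant_copy and
sketch_load_u32 are hypothetical stand-ins; this is not the kernel's
__constant_memcpy code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro, mirroring what the patch does to `inline`. */
#if defined(__GNUC__) && __GNUC__ >= 3
#define FORCE_INLINE __inline__ __attribute__((always_inline))
#else
#define FORCE_INLINE __inline__
#endif

/* One case per small size, generic fallback for everything else. */
static FORCE_INLINE void *sketch_constant_copy(void *dst, const void *src,
                                               size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    switch (size) {
    case 0:
        return dst;
    case 1:
        d[0] = s[0];
        return dst;
    case 2:
        d[0] = s[0]; d[1] = s[1];
        return dst;
    case 4:
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
        return dst;
    default:
        return memcpy(dst, src, size);
    }
}

/* The size argument is a compile-time constant here, so once the call is
 * inlined the optimizer can drop the switch and keep a single case. */
static unsigned int sketch_load_u32(const unsigned char *bytes)
{
    unsigned int value;

    sketch_constant_copy(&value, bytes, sizeof(value));
    return value;
}
```

If sketch_constant_copy were emitted out of line instead, every caller
would go through the whole switch at run time, which is the case the
System.map counts are meant to expose.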

If gcc does not inline it, there are a ton of users of copy_to/from_user
and memset that execute the full switch() at run time (w/o the patch,
150 out-of-line instances of __constant_c_and_count_memset stay; with the
patch only 16, so gcc failed to inline about 140...).
Depending on which users are not inlined, this can be more or less
harmful in terms of performance, but it is surely not the designed
behaviour.

And of course, I can be totally wrong. Comments welcome.

Ah, and all this assumes that gcc still correctly does what I said
in \ref{1} ;)
