As Segher pointed out, inline asm is better than the volatile casting all over
the place. From the PowerPC patch description:
Also use inline functions instead of macros; this actually
improves code generation (some code becomes a little smaller,
probably because of improved alias information -- just a few
hundred bytes total on a default kernel build, nothing shocking).
My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
text data bss dec hex filename
3434150 249176 176128 3859454 3ae3fe atomic_normal/vmlinux
3435308 249176 176128 3860612 3ae884 atomic_inlineasm/vmlinux
3436201 249176 176128 3861505 3aec01 atomic_inline_volatile/vmlinux
3436203 249176 176128 3861507 3aec03 atomic_volatile/vmlinux
Signed-off-by: Sebastian Siewior <[email protected]>
--- a/include/asm-i386/atomic.h
+++ b/include/asm-i386/atomic.h
@@ -22,19 +22,34 @@ typedef struct { int counter; } atomic_t
/**
* atomic_read - read atomic variable
* @v: pointer of type atomic_t
- *
+ *
* Atomically reads the value of @v.
- */
-#define atomic_read(v) ((v)->counter)
+ */
+static __inline__ int atomic_read(const atomic_t *v)
+{
+ int t;
+
+ __asm__ __volatile__(
+ "movl %1,%0"
+ : "=r"(t)
+ : "m"(v->counter));
+ return t;
+}
/**
* atomic_set - set atomic variable
* @v: pointer of type atomic_t
* @i: required value
- *
+ *
* Atomically sets the value of @v to @i.
- */
-#define atomic_set(v,i) (((v)->counter) = (i))
+ */
+static __inline__ void atomic_set(atomic_t *v, int i)
+{
+ __asm__ __volatile__(
+ "movl %1,%0"
+ : "=m"(v->counter)
+ : "ir"(i));
+}
/**
* atomic_add - add integer to atomic variable
--
> My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
> text data bss dec hex filename
> 3434150 249176 176128 3859454 3ae3fe atomic_normal/vmlinux
> 3435308 249176 176128 3860612 3ae884 atomic_inlineasm/vmlinux
What is the difference between atomic_normal and atomic_inlineasm?
> /**
> * atomic_read - read atomic variable
> * @v: pointer of type atomic_t
> - *
> + *
Please don't change white space in patches
> * Atomically reads the value of @v.
> - */
> -#define atomic_read(v) ((v)->counter)
> + */
> +static __inline__ int atomic_read(const atomic_t *v)
> +{
> + int t;
> +
> + __asm__ __volatile__(
And don't use __*__ in new code
-Andi
* Andi Kleen | 2007-08-15 02:20:35 [+0200]:
>> My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
>> text data bss dec hex filename
>> 3434150 249176 176128 3859454 3ae3fe atomic_normal/vmlinux
>> 3435308 249176 176128 3860612 3ae884 atomic_inlineasm/vmlinux
>
>What is the difference between atomic_normal and atomic_inlineasm?
atomic normal is Linus' tree, commit 28e8351. Inline asm is with this
patch. I wrote this in 0/2 (you want this here as well?).
>
>> /**
>> * atomic_read - read atomic variable
>> * @v: pointer of type atomic_t
>> - *
>> + *
>
>Please don't change white space in patches
I fixed white space errors. SubmitChecklist:24 says I should not
introduce any new ones so fixing existig sounds like the right thing to
do.
No white space fixing in future?
>
>> * Atomically reads the value of @v.
>> - */
>> -#define atomic_read(v) ((v)->counter)
>> + */
>> +static __inline__ int atomic_read(const atomic_t *v)
>> +{
>> + int t;
>> +
>> + __asm__ __volatile__(
>
>And don't use __*__ in new code
Okey.
>
>-Andi
Sebastian
Andi Kleen <[email protected]> wrote:
>> My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
>> text data bss dec hex filename
>> 3434150 249176 176128 3859454 3ae3fe atomic_normal/vmlinux
>> 3435308 249176 176128 3860612 3ae884 atomic_inlineasm/vmlinux
>
> What is the difference between atomic_normal and atomic_inlineasm?
The inline asm stops certain optimisations from occuring.
I'm still unconvinced why we need this because nobody has
brought up any examples of kernel code that legitimately
need this.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Wed, 15 Aug 2007, Herbert Xu wrote:
> Andi Kleen <[email protected]> wrote:
> >> My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
> >> text data bss dec hex filename
> >> 3434150 249176 176128 3859454 3ae3fe atomic_normal/vmlinux
> >> 3435308 249176 176128 3860612 3ae884 atomic_inlineasm/vmlinux
> >
> > What is the difference between atomic_normal and atomic_inlineasm?
>
> The inline asm stops certain optimisations from occuring.
Yup, the "__volatile__" after "__asm__" shouldn't be required. The
previous definitions allowed the compiler to optimize certain atomic
ops away, and considering we haven't seen any bugs due to the past
behaviour anyway, changing it like this sounds unnecessary.
The second behavioral change that comes about here is the force-
constraining of v->counter to memory in the extended asm's constraint
sets. But that's required to give "volatility"-like behaviour, which
I suspect is the goal of this patch.
Sebastian, could you look at the kernel images in detail and see why,
how and what in the text expanded by 1158 bytes with these changes?
> I'm still unconvinced why we need this because nobody has
> brought up any examples of kernel code that legitimately
> need this.
Me too, but I do find these more palatable than the variants using
"volatile", I admit.
Satyam
* Satyam Sharma | 2007-08-15 18:32:10 [+0530]:
>On Wed, 15 Aug 2007, Herbert Xu wrote:
>
>> Andi Kleen <[email protected]> wrote:
>> >> My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
>> >> text data bss dec hex filename
>> >> 3434150 249176 176128 3859454 3ae3fe atomic_normal/vmlinux
>> >> 3435308 249176 176128 3860612 3ae884 atomic_inlineasm/vmlinux
>> >
>> > What is the difference between atomic_normal and atomic_inlineasm?
>>
>> The inline asm stops certain optimisations from occuring.
>
>Yup, the "__volatile__" after "__asm__" shouldn't be required. The
>previous definitions allowed the compiler to optimize certain atomic
>ops away, and considering we haven't seen any bugs due to the past
>behaviour anyway, changing it like this sounds unnecessary.
>
>The second behavioral change that comes about here is the force-
>constraining of v->counter to memory in the extended asm's constraint
>sets. But that's required to give "volatility"-like behaviour, which
>I suspect is the goal of this patch.
exactly.
>Sebastian, could you look at the kernel images in detail and see why,
>how and what in the text expanded by 1158 bytes with these changes?
Wow, easy. "normal" is Linus' git. It has no volatile or anything. I
suspect the compiler to "optimize" some of the reads and sets away
(which remain after my patch is applied).
Anyway, I tried to find out the source of this and it is not that easy.
If the text is getting bigger, all references to the data section are
changed and I can't simply diff. Than I searched for two "equal" .o
files which differ in size. This was not better because gcc reorganized
the whole .o file and the functions that were at the beginning in the
first file, were at a different position in the second one.
For those who have plenty of time in their hands, I posted the four
kernels [1] and my four O= folders [2].
Once I have some time, I try to diff the specific function within the .o
file which used the atomic_{read|set} operation.
>> I'm still unconvinced why we need this because nobody has
>> brought up any examples of kernel code that legitimately
>> need this.
>
>Me too, but I do find these more palatable than the variants using
>"volatile", I admit.
Yes. The inline asm is even smaller than the volatile ones, what sounds
like better optimization by the gcc (and it does exactly what volatile
was used for).
[1] http://download.breakpoint.cc/kernel_builds_vmlinux.tbz2 8.3 MiB
[2] http://download.breakpoint.cc/kernel_builds.tbz2 117 MiB
>
>Satyam
Sebastian
Herbert Xu wrote:
> Andi Kleen <[email protected]> wrote:
>>> My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
>>> text data bss dec hex filename
>>> 3434150 249176 176128 3859454 3ae3fe atomic_normal/vmlinux
>>> 3435308 249176 176128 3860612 3ae884 atomic_inlineasm/vmlinux
>> What is the difference between atomic_normal and atomic_inlineasm?
>
> The inline asm stops certain optimisations from occuring.
>
> I'm still unconvinced why we need this because nobody has
> brought up any examples of kernel code that legitimately
> need this.
There's plenty of kernel code that *wants* this though. If we can
reduce the need for register-clobbering barriers, shrink our binaries,
shrink our code, improve performance, and avoid heisenbugs, I think it's
a win, whether or not we *need* it.
-- Chris
On Wed, Aug 15, 2007 at 01:02:23PM -0400, Chris Snook wrote:
> Herbert Xu wrote:
> >Andi Kleen <[email protected]> wrote:
> >>>My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
> >>> text data bss dec hex filename
> >>>3434150 249176 176128 3859454 3ae3fe atomic_normal/vmlinux
> >>>3435308 249176 176128 3860612 3ae884 atomic_inlineasm/vmlinux
> >>What is the difference between atomic_normal and atomic_inlineasm?
> >
> >The inline asm stops certain optimisations from occuring.
> >
> >I'm still unconvinced why we need this because nobody has
> >brought up any examples of kernel code that legitimately
> >need this.
>
> There's plenty of kernel code that *wants* this though. If we can
You keep saying this yet everytime I ask for an example I
get nothing.
> reduce the need for register-clobbering barriers, shrink our binaries,
> shrink our code, improve performance, and avoid heisenbugs, I think it's
> a win, whether or not we *need* it.
Hmm, you're increasing our binary size and probably killing
performance.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 01:02:23PM -0400, Chris Snook wrote:
>
>>Herbert Xu wrote:
>>>I'm still unconvinced why we need this because nobody has
>>>brought up any examples of kernel code that legitimately
>>>need this.
>>
>>There's plenty of kernel code that *wants* this though. If we can
>
>
> You keep saying this yet everytime I ask for an example I
> get nothing.
Agreed. Simplest thing to do will be to post those patches which
eliminate the register-clobbering barriers, shrink binaries, etc.
for all that code that wants this.
--
SUSE Labs, Novell Inc.
Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 01:02:23PM -0400, Chris Snook wrote:
>> Herbert Xu wrote:
>>> Andi Kleen <[email protected]> wrote:
>>>>> My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
>>>>> text data bss dec hex filename
>>>>> 3434150 249176 176128 3859454 3ae3fe atomic_normal/vmlinux
>>>>> 3435308 249176 176128 3860612 3ae884 atomic_inlineasm/vmlinux
>>>> What is the difference between atomic_normal and atomic_inlineasm?
>>> The inline asm stops certain optimisations from occuring.
>>>
>>> I'm still unconvinced why we need this because nobody has
>>> brought up any examples of kernel code that legitimately
>>> need this.
>> There's plenty of kernel code that *wants* this though. If we can
>
> You keep saying this yet everytime I ask for an example I
> get nothing.
Just look for all the code (and there's an immense amount) that has a
barrier() between two atomic_* operations, or in a loop with such
operations. With these patches merged, we can proceed to *remove* those
barriers.
>> reduce the need for register-clobbering barriers, shrink our binaries,
>> shrink our code, improve performance, and avoid heisenbugs, I think it's
>> a win, whether or not we *need* it.
>
> Hmm, you're increasing our binary size and probably killing
> performance.
The cost of these patches themselves is negligible, since people pretty
much always use barriers in places where it would matter. The long-term
benefit is that with guaranteed volatile behavior, we can go through and
remove a whole bunch of these barriers.
-- Chris
On Thu, Aug 16, 2007 at 03:23:49PM -0400, Chris Snook wrote:
>
> >You keep saying this yet everytime I ask for an example I
> >get nothing.
>
> Just look for all the code (and there's an immense amount) that has a
> barrier() between two atomic_* operations, or in a loop with such
> operations. With these patches merged, we can proceed to *remove* those
> barriers.
When I say example I don't mean some hypothetical case, please
give actual function/file names.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt