From: Behan Webster <[email protected]>
Use asm to make the globally named register work again for gcc and clang.
Much more efficient than copying the stack pointer to a variable and back again.
Signed-off-by: Behan Webster <[email protected]>
---
arch/x86/include/asm/thread_info.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index e1940c0..e27ccc1 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -163,10 +163,10 @@ struct thread_info {
*/
#ifndef __ASSEMBLY__
-#define current_stack_pointer ({ \
- unsigned long sp; \
- asm("mov %%esp,%0" : "=g" (sp)); \
- sp; \
+#define current_stack_pointer ({ \
+ register unsigned long sp asm("esp") __used; \
+ asm("" : "=r" (sp)); \
+ sp; \
})
/* how to get the thread information struct from C */
--
1.8.3.2
This seems like really deep magic when looking at it... at the very
least, this needs to be very carefully commented, including why it works
on the various platforms.
How much does this actually affect the output? I only see three uses of
current_stack_pointer:
/* how to get the thread information struct from C */
static inline struct thread_info *current_thread_info(void)
{
	return (struct thread_info *)
		(current_stack_pointer & ~(THREAD_SIZE - 1));
}
... here we need the mov anyway, because we have to then AND it with a
mask, which we obviously can't do inside the stack pointer.
kernel/irq_32.c: irqctx->tinfo.previous_esp = current_stack_pointer;
(two times)
Here we are moving it into a memory variable anyway, which the "=g"
constraint should allow.
So I see no evidence this is more efficient in any way.
-hpa
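
For reference, the masking step in current_thread_info() can be sketched in
plain user-space C. This is only an illustration of the arithmetic, not the
kernel code: the helper name thread_info_base() is hypothetical, the stack
pointer is passed in as an ordinary integer instead of being read from %esp,
and THREAD_SIZE is assumed to be 8 KiB here.

```c
#include <stdint.h>

/* Assumption for illustration only: 8 KiB kernel stacks. */
#define THREAD_SIZE 8192UL

/*
 * Hypothetical helper: round a stack pointer value down to the start of
 * its THREAD_SIZE-aligned stack area, using the same expression as
 * current_thread_info(). sp_value stands in for current_stack_pointer;
 * nothing is dereferenced.
 */
static inline uintptr_t thread_info_base(uintptr_t sp_value)
{
	return sp_value & ~(uintptr_t)(THREAD_SIZE - 1);
}
```

Under that assumption, any stack pointer value inside the same 8 KiB area
maps to the same base address. The result is a new value, so it has to live
in a register other than the stack pointer itself.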
On 02/20/2014 08:44 PM, [email protected] wrote:
> From: Behan Webster <[email protected]>
>
> Use asm to make the globally named register work again for gcc and clang.
> Much more efficient than copying the stack pointer to a variable and back again.
>
> Signed-off-by: Behan Webster <[email protected]>
> ---
> arch/x86/include/asm/thread_info.h | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index e1940c0..e27ccc1 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -163,10 +163,10 @@ struct thread_info {
> */
> #ifndef __ASSEMBLY__
>
> -#define current_stack_pointer ({ \
> - unsigned long sp; \
> - asm("mov %%esp,%0" : "=g" (sp)); \
> - sp; \
> +#define current_stack_pointer ({ \
> + register unsigned long sp asm("esp") __used; \
> + asm("" : "=r" (sp)); \
> + sp; \
> })
>
> /* how to get the thread information struct from C */
>
On 02/20/2014 08:55 PM, H. Peter Anvin wrote:
> This seems like really deep magic when looking at it... at the very
> least, this needs to be very carefully commented, including why it works
> on the various platforms.
>
> How much does this actually affect the output? I only see three uses of
> current_stack_pointer:
>
> /* how to get the thread information struct from C */
> static inline struct thread_info *current_thread_info(void)
> {
> 	return (struct thread_info *)
> 		(current_stack_pointer & ~(THREAD_SIZE - 1));
> }
>
> ... here we need the mov anyway, because we have to then AND it with a
> mask, which we obviously can't do inside the stack pointer.
No clue what code is actually generated, but the new code could generate:
mov $MASK, %eax;
and %esp, %eax;
Admittedly, I can't see any reason why this would be an improvement.
--Andy
On 02/25/2014 07:00 PM, Andy Lutomirski wrote:
>>
>> How much does this actually affect the output? I only see three uses of
>> current_stack_pointer:
>>
>> /* how to get the thread information struct from C */
>> static inline struct thread_info *current_thread_info(void)
>> {
>> 	return (struct thread_info *)
>> 		(current_stack_pointer & ~(THREAD_SIZE - 1));
>> }
>>
>> ... here we need the mov anyway, because we have to then AND it with a
>> mask, which we obviously can't do inside the stack pointer.
>
> No clue what code is actually generated, but the new code could generate:
>
> mov $MASK, %eax;
> and %esp, %eax;
>
> Admittedly, I can't see any reason why this would be an improvement.
>
You have to generate one of the code sequences:
mov $MASK, %eax
and %esp, %eax
... or ...
mov %esp, %eax
and $MASK, %eax
No real difference either way.
-hpa
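
The equivalence of the two sequences can be modelled in portable C. This is
a rough sketch under stated assumptions: the function names are hypothetical,
a local variable named eax stands in for the %eax register, and sp is an
ordinary argument instead of the real %esp. It says nothing about what gcc or
clang actually emit for either form of the macro.

```c
#include <stdint.h>

/* Models: mov $MASK, %eax ; and %esp, %eax */
static inline uint32_t load_mask_then_and_sp(uint32_t sp, uint32_t mask)
{
	uint32_t eax = mask;	/* mov $MASK, %eax */
	eax &= sp;		/* and %esp, %eax  */
	return eax;
}

/* Models: mov %esp, %eax ; and $MASK, %eax */
static inline uint32_t load_sp_then_and_mask(uint32_t sp, uint32_t mask)
{
	uint32_t eax = sp;	/* mov %esp, %eax  */
	eax &= mask;		/* and $MASK, %eax */
	return eax;
}
```

Both helpers take two steps and return the same value for every input,
because AND is commutative. Either way the stack pointer is only read and
the result ends up in a scratch register.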