2008-11-26 19:17:21

by Cyrill Gorcunov

[permalink] [raw]
Subject: x86 -tip: entry_64.S cleanup series


Hi, here is a trivial cleanups for entry_64.S,
please review. The dangerous one is patch 2: get rid
of back jump -- check it twice.

Alexander, I hope it doesn't interfere with the stuff
you are working on?

- Cyrill -


2008-11-26 19:17:35

by Cyrill Gorcunov

[permalink] [raw]
Subject: [PATCH 1/5] x86: entry_64.S - use ENTRY to define child_rip

From: Cyrill Gorcunov <[email protected]>

child_rip is called not by its name but indirectly
rather so make it global and aligned.

Signed-off-by: Cyrill Gorcunov <[email protected]>
---
arch/x86/kernel/entry_64.S | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 64688db..098ba0b 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1322,7 +1322,7 @@ ENTRY(kernel_thread)
CFI_ENDPROC
END(kernel_thread)

-child_rip:
+ENTRY(child_rip)
pushq $0 # fake return address
CFI_STARTPROC
/*
--
1.6.0.4.603.gbc9c0

2008-11-26 19:17:51

by Cyrill Gorcunov

[permalink] [raw]
Subject: [PATCH 2/5] x86: ret_from_fork - get rid of jump back

From: Cyrill Gorcunov <[email protected]>

Impact: cleanup, microoptimization

Use straightforward jumping if possible, cpu like it.

Signed-off-by: Cyrill Gorcunov <[email protected]>
---
arch/x86/kernel/entry_64.S | 10 ++++------
1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 098ba0b..7b52b0f 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -379,7 +379,10 @@ ENTRY(ret_from_fork)
GET_THREAD_INFO(%rcx)
testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),TI_flags(%rcx)
CFI_REMEMBER_STATE
- jnz rff_trace
+ jz rff_action
+ movq %rsp,%rdi
+ call syscall_trace_leave
+ GET_THREAD_INFO(%rcx)
rff_action:
RESTORE_REST
testl $3,CS-ARGOFFSET(%rsp) # from kernel_thread?
@@ -389,11 +392,6 @@ rff_action:
RESTORE_TOP_OF_STACK %rdi, -ARGOFFSET
jmp ret_from_sys_call
CFI_RESTORE_STATE
-rff_trace:
- movq %rsp,%rdi
- call syscall_trace_leave
- GET_THREAD_INFO(%rcx)
- jmp rff_action
CFI_ENDPROC
END(ret_from_fork)

--
1.6.0.4.603.gbc9c0

2008-11-26 19:18:16

by Cyrill Gorcunov

[permalink] [raw]
Subject: [PATCH 3/5] x86: entry_64.S - use X86_EFLAGS_IF instead of hardcoded number

From: Cyrill Gorcunov <[email protected]>

Impact: cleanup

Signed-off-by: Cyrill Gorcunov <[email protected]>
---
arch/x86/kernel/entry_64.S | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 7b52b0f..c409f73 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -199,7 +199,7 @@ ENTRY(native_usergs_sysret64)
pushq %rax /* rsp */
CFI_ADJUST_CFA_OFFSET 8
CFI_REL_OFFSET rsp,0
- pushq $(1<<9) /* eflags - interrupts on */
+ pushq $X86_EFLAGS_IF /* eflags - interrupts on */
CFI_ADJUST_CFA_OFFSET 8
/*CFI_REL_OFFSET rflags,0*/
pushq $__KERNEL_CS /* cs */
--
1.6.0.4.603.gbc9c0

2008-11-26 19:18:33

by Cyrill Gorcunov

[permalink] [raw]
Subject: [PATCH 5/5] x86: entry_64.S - trivial: space, comments fixup

From: Cyrill Gorcunov <[email protected]>

Impact: cleanup

Signed-off-by: Cyrill Gorcunov <[email protected]>
---
arch/x86/kernel/entry_64.S | 92 ++++++++++++++++++++++---------------------
1 files changed, 47 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index a21be86..5ccf410 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1119,8 +1119,8 @@ paranoidzeroentry machine_check do_machine_check
zeroentry simd_coprocessor_error do_simd_coprocessor_error

/*
- * "Paranoid" exit path from exception stack.
- * Paranoid because this is used by NMIs and cannot take
+ * "Paranoid" exit path from exception stack.
+ * Paranoid because this is used by NMIs and cannot take
* any kernel state for granted.
* We don't do kernel preemption checks here, because only
* NMI should be common and it does not enable IRQs and
@@ -1225,7 +1225,7 @@ error_kernelspace:
cmpq %rcx,RIP+8(%rsp)
je error_swapgs
cmpq $gs_change,RIP+8(%rsp)
- je error_swapgs
+ je error_swapgs
jmp error_sti
KPROBE_END(error_entry)

@@ -1249,36 +1249,36 @@ KPROBE_ENTRY(error_exit)
CFI_ENDPROC
KPROBE_END(error_exit)

- /* Reload gs selector with exception handling */
- /* edi: new selector */
+ /* Reload gs selector with exception handling */
+ /* edi: new selector */
ENTRY(native_load_gs_index)
CFI_STARTPROC
pushf
CFI_ADJUST_CFA_OFFSET 8
DISABLE_INTERRUPTS(CLBR_ANY | ~(CLBR_RDI))
- SWAPGS
+ SWAPGS
gs_change:
- movl %edi,%gs
+ movl %edi,%gs
2: mfence /* workaround */
SWAPGS
- popf
+ popf
CFI_ADJUST_CFA_OFFSET -8
- ret
+ ret
CFI_ENDPROC
END(native_load_gs_index)

- .section __ex_table,"a"
- .align 8
- .quad gs_change,bad_gs
- .previous
- .section .fixup,"ax"
+ .section __ex_table,"a"
+ .align 8
+ .quad gs_change,bad_gs
+ .previous
+ .section .fixup,"ax"
/* running with kernelgs */
bad_gs:
SWAPGS /* switch back to user gs */
xorl %eax,%eax
- movl %eax,%gs
- jmp 2b
- .previous
+ movl %eax,%gs
+ jmp 2b
+ .previous

/*
* Create a kernel thread.
@@ -1313,7 +1313,7 @@ ENTRY(kernel_thread)
* so internally to the x86_64 port you can rely on kernel_thread()
* not to reschedule the child before returning, this avoids the need
* of hacks for example to fork off the per-CPU idle tasks.
- * [Hopefully no generic code relies on the reschedule -AK]
+ * [Hopefully no generic code relies on the reschedule -AK]
*/
RESTORE_ALL
UNFAKE_STACK_FRAME
@@ -1420,7 +1420,7 @@ nmi_schedule:
CFI_ENDPROC
#else
jmp paranoid_exit
- CFI_ENDPROC
+ CFI_ENDPROC
#endif
KPROBE_END(nmi)

@@ -1455,22 +1455,24 @@ KPROBE_END(ignore_sysret)
zeroentry xen_hypervisor_callback xen_do_hypervisor_callback

/*
-# A note on the "critical region" in our callback handler.
-# We want to avoid stacking callback handlers due to events occurring
-# during handling of the last event. To do this, we keep events disabled
-# until we've done all processing. HOWEVER, we must enable events before
-# popping the stack frame (can't be done atomically) and so it would still
-# be possible to get enough handler activations to overflow the stack.
-# Although unlikely, bugs of that kind are hard to track down, so we'd
-# like to avoid the possibility.
-# So, on entry to the handler we detect whether we interrupted an
-# existing activation in its critical region -- if so, we pop the current
-# activation and restart the handler using the previous one.
-*/
+ * A note on the "critical region" in our callback handler.
+ * We want to avoid stacking callback handlers due to events occurring
+ * during handling of the last event. To do this, we keep events disabled
+ * until we've done all processing. HOWEVER, we must enable events before
+ * popping the stack frame (can't be done atomically) and so it would still
+ * be possible to get enough handler activations to overflow the stack.
+ * Although unlikely, bugs of that kind are hard to track down, so we'd
+ * like to avoid the possibility.
+ * So, on entry to the handler we detect whether we interrupted an
+ * existing activation in its critical region -- if so, we pop the current
+ * activation and restart the handler using the previous one.
+ */
ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs)
CFI_STARTPROC
-/* Since we don't modify %rdi, evtchn_do_upall(struct *pt_regs) will
- see the correct pointer to the pt_regs */
+/*
+ * Since we don't modify %rdi, evtchn_do_upall(struct *pt_regs) will
+ * see the correct pointer to the pt_regs
+ */
movq %rdi, %rsp # we don't return, adjust the stack frame
CFI_ENDPROC
DEFAULT_FRAME
@@ -1488,18 +1490,18 @@ ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs)
END(do_hypervisor_callback)

/*
-# Hypervisor uses this for application faults while it executes.
-# We get here for two reasons:
-# 1. Fault while reloading DS, ES, FS or GS
-# 2. Fault while executing IRET
-# Category 1 we do not need to fix up as Xen has already reloaded all segment
-# registers that could be reloaded and zeroed the others.
-# Category 2 we fix up by killing the current process. We cannot use the
-# normal Linux return path in this case because if we use the IRET hypercall
-# to pop the stack frame we end up in an infinite loop of failsafe callbacks.
-# We distinguish between categories by comparing each saved segment register
-# with its current contents: any discrepancy means we in category 1.
-*/
+ * Hypervisor uses this for application faults while it executes.
+ * We get here for two reasons:
+ * 1. Fault while reloading DS, ES, FS or GS
+ * 2. Fault while executing IRET
+ * Category 1 we do not need to fix up as Xen has already reloaded all segment
+ * registers that could be reloaded and zeroed the others.
+ * Category 2 we fix up by killing the current process. We cannot use the
+ * normal Linux return path in this case because if we use the IRET hypercall
+ * to pop the stack frame we end up in an infinite loop of failsafe callbacks.
+ * We distinguish between categories by comparing each saved segment register
+ * with its current contents: any discrepancy means we in category 1.
+ */
ENTRY(xen_failsafe_callback)
INTR_FRAME 1 (6*8)
/*CFI_REL_OFFSET gs,GS*/
--
1.6.0.4.603.gbc9c0

2008-11-26 19:18:50

by Cyrill Gorcunov

[permalink] [raw]
Subject: [PATCH 4/5] x86: ret_from_fork - add CFI proc annotation

From: Cyrill Gorcunov <[email protected]>

ret_from_fork is definitly procedure so mark it as that.

Signed-off-by: Cyrill Gorcunov <[email protected]>
---
arch/x86/kernel/entry_64.S | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index c409f73..a21be86 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -370,6 +370,7 @@ END(save_paranoid)
*/
/* rdi: prev */
ENTRY(ret_from_fork)
+ CFI_STARTPROC
DEFAULT_FRAME
push kernel_eflags(%rip)
CFI_ADJUST_CFA_OFFSET 8
--
1.6.0.4.603.gbc9c0

2008-11-26 19:33:49

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86: ret_from_fork - get rid of jump back

[[email protected] - Wed, Nov 26, 2008 at 10:17:01PM +0300]
| From: Cyrill Gorcunov <[email protected]>
|
| Impact: cleanup, microoptimization
|
| Use straightforward jumping if possible, cpu like it.

Wrong word, not _straight_ forward but forward branch target
in this case.

|
| Signed-off-by: Cyrill Gorcunov <[email protected]>
| ---
| arch/x86/kernel/entry_64.S | 10 ++++------
| 1 files changed, 4 insertions(+), 6 deletions(-)
|
- Cyrill -

2008-11-26 20:04:37

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86: ret_from_fork - get rid of jump back

[email protected] writes:
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -379,7 +379,10 @@ ENTRY(ret_from_fork)
> GET_THREAD_INFO(%rcx)
> testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),TI_flags(%rcx)
> CFI_REMEMBER_STATE
> - jnz rff_trace
> + jz rff_action
> + movq %rsp,%rdi
> + call syscall_trace_leave
> + GET_THREAD_INFO(%rcx)

The uncommon path is supposed to be out of line. I don't think
the CPU will like that.

-Andi

> rff_action:

--
[email protected]

2008-11-26 20:11:13

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86: ret_from_fork - get rid of jump back

[Andi Kleen - Wed, Nov 26, 2008 at 09:04:33PM +0100]
| [email protected] writes:
| > --- a/arch/x86/kernel/entry_64.S
| > +++ b/arch/x86/kernel/entry_64.S
| > @@ -379,7 +379,10 @@ ENTRY(ret_from_fork)
| > GET_THREAD_INFO(%rcx)
| > testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),TI_flags(%rcx)
| > CFI_REMEMBER_STATE
| > - jnz rff_trace
| > + jz rff_action
| > + movq %rsp,%rdi
| > + call syscall_trace_leave
| > + GET_THREAD_INFO(%rcx)
|
| The uncommon path is supposed to be out of line. I don't think
| the CPU will like that.
|
| -Andi
|
| > rff_action:
|
| --
| [email protected]
|

Aha! Thanks Andi for review.

- Cyrill -

2008-11-26 20:15:38

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86: ret_from_fork - get rid of jump back

[Cyrill Gorcunov - Wed, Nov 26, 2008 at 11:10:54PM +0300]
...
| |
| | The uncommon path is supposed to be out of line. I don't think
| | the CPU will like that.
| |
| | -Andi
| |
| | > rff_action:
| |
| | --
| | [email protected]
| |
|
| Aha! Thanks Andi for review.
|
| - Cyrill -

And if I would be tracing all and everywhere would it be
common then? :)

- Cyrill -

2008-11-27 10:15:36

by Alexander van Heukelum

[permalink] [raw]
Subject: Re: [PATCH 4/5] x86: ret_from_fork - add CFI proc annotation


On Wed, 26 Nov 2008 22:17:03 +0300, [email protected] said:
> From: Cyrill Gorcunov <[email protected]>
>
> ret_from_fork is definitly procedure so mark it as that.
>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> ---
> arch/x86/kernel/entry_64.S | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index c409f73..a21be86 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -370,6 +370,7 @@ END(save_paranoid)
> */
> /* rdi: prev */
> ENTRY(ret_from_fork)
> + CFI_STARTPROC
> DEFAULT_FRAME
> push kernel_eflags(%rip)
> CFI_ADJUST_CFA_OFFSET 8

This is not needed/wanted. DEFAULT_FRAME includes "CFI_STARTPROC simple"
already.

Alexander

> 1.6.0.4.603.gbc9c0
>
--
Alexander van Heukelum
[email protected]

--
http://www.fastmail.fm - The professional email service

2008-11-27 10:56:31

by Alexander van Heukelum

[permalink] [raw]
Subject: Re: [PATCH 1/5] x86: entry_64.S - use ENTRY to define child_rip


On Wed, 26 Nov 2008 22:17:00 +0300, [email protected] said:
> From: Cyrill Gorcunov <[email protected]>
>
> child_rip is called not by its name but indirectly
> rather so make it global and aligned.
>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> ---
> arch/x86/kernel/entry_64.S | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 64688db..098ba0b 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1322,7 +1322,7 @@ ENTRY(kernel_thread)
> CFI_ENDPROC
> END(kernel_thread)
>
> -child_rip:
> +ENTRY(child_rip)
> pushq $0 # fake return address
> CFI_STARTPROC
> /*

Acked-by: Alexander van Heukelum <[email protected]>

> 1.6.0.4.603.gbc9c0
>
--
Alexander van Heukelum
[email protected]

--
http://www.fastmail.fm - Accessible with your email software
or over the web

2008-11-27 10:56:51

by Alexander van Heukelum

[permalink] [raw]
Subject: Re: [PATCH 3/5] x86: entry_64.S - use X86_EFLAGS_IF instead of hardcoded number


On Wed, 26 Nov 2008 22:17:02 +0300, [email protected] said:
> From: Cyrill Gorcunov <[email protected]>
>
> Impact: cleanup
>
> Signed-off-by: Cyrill Gorcunov <[email protected]>

Acked-by: Alexander van Heukelum <[email protected]>


> arch/x86/kernel/entry_64.S | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 7b52b0f..c409f73 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -199,7 +199,7 @@ ENTRY(native_usergs_sysret64)
> pushq %rax /* rsp */
> CFI_ADJUST_CFA_OFFSET 8
> CFI_REL_OFFSET rsp,0
> - pushq $(1<<9) /* eflags - interrupts on */
> + pushq $X86_EFLAGS_IF /* eflags - interrupts on */
> CFI_ADJUST_CFA_OFFSET 8
> /*CFI_REL_OFFSET rflags,0*/
> pushq $__KERNEL_CS /* cs */
>
> 1.6.0.4.603.gbc9c0
>
--
Alexander van Heukelum
[email protected]

--
http://www.fastmail.fm - A no graphics, no pop-ups email service

2008-11-27 10:57:16

by Alexander van Heukelum

[permalink] [raw]
Subject: Re: [PATCH 5/5] x86: entry_64.S - trivial: space, comments fixup


On Wed, 26 Nov 2008 22:17:04 +0300, [email protected] said:
> From: Cyrill Gorcunov <[email protected]>
>
> Impact: cleanup
>
> Signed-off-by: Cyrill Gorcunov <[email protected]>

Acked-by: Alexander van Heukelum <[email protected]>


> arch/x86/kernel/entry_64.S | 92
> ++++++++++++++++++++++---------------------
> 1 files changed, 47 insertions(+), 45 deletions(-)
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index a21be86..5ccf410 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1119,8 +1119,8 @@ paranoidzeroentry machine_check do_machine_check
> zeroentry simd_coprocessor_error do_simd_coprocessor_error
>
> /*
> - * "Paranoid" exit path from exception stack.
> - * Paranoid because this is used by NMIs and cannot take
> + * "Paranoid" exit path from exception stack.
> + * Paranoid because this is used by NMIs and cannot take
> * any kernel state for granted.
> * We don't do kernel preemption checks here, because only
> * NMI should be common and it does not enable IRQs and
> @@ -1225,7 +1225,7 @@ error_kernelspace:
> cmpq %rcx,RIP+8(%rsp)
> je error_swapgs
> cmpq $gs_change,RIP+8(%rsp)
> - je error_swapgs
> + je error_swapgs
> jmp error_sti
> KPROBE_END(error_entry)
>
> @@ -1249,36 +1249,36 @@ KPROBE_ENTRY(error_exit)
> CFI_ENDPROC
> KPROBE_END(error_exit)
>
> - /* Reload gs selector with exception handling */
> - /* edi: new selector */
> + /* Reload gs selector with exception handling */
> + /* edi: new selector */
> ENTRY(native_load_gs_index)
> CFI_STARTPROC
> pushf
> CFI_ADJUST_CFA_OFFSET 8
> DISABLE_INTERRUPTS(CLBR_ANY | ~(CLBR_RDI))
> - SWAPGS
> + SWAPGS
> gs_change:
> - movl %edi,%gs
> + movl %edi,%gs
> 2: mfence /* workaround */
> SWAPGS
> - popf
> + popf
> CFI_ADJUST_CFA_OFFSET -8
> - ret
> + ret
> CFI_ENDPROC
> END(native_load_gs_index)
>
> - .section __ex_table,"a"
> - .align 8
> - .quad gs_change,bad_gs
> - .previous
> - .section .fixup,"ax"
> + .section __ex_table,"a"
> + .align 8
> + .quad gs_change,bad_gs
> + .previous
> + .section .fixup,"ax"
> /* running with kernelgs */
> bad_gs:
> SWAPGS /* switch back to user gs */
> xorl %eax,%eax
> - movl %eax,%gs
> - jmp 2b
> - .previous
> + movl %eax,%gs
> + jmp 2b
> + .previous
>
> /*
> * Create a kernel thread.
> @@ -1313,7 +1313,7 @@ ENTRY(kernel_thread)
> * so internally to the x86_64 port you can rely on kernel_thread()
> * not to reschedule the child before returning, this avoids the need
> * of hacks for example to fork off the per-CPU idle tasks.
> - * [Hopefully no generic code relies on the reschedule -AK]
> + * [Hopefully no generic code relies on the reschedule -AK]
> */
> RESTORE_ALL
> UNFAKE_STACK_FRAME
> @@ -1420,7 +1420,7 @@ nmi_schedule:
> CFI_ENDPROC
> #else
> jmp paranoid_exit
> - CFI_ENDPROC
> + CFI_ENDPROC
> #endif
> KPROBE_END(nmi)
>
> @@ -1455,22 +1455,24 @@ KPROBE_END(ignore_sysret)
> zeroentry xen_hypervisor_callback xen_do_hypervisor_callback
>
> /*
> -# A note on the "critical region" in our callback handler.
> -# We want to avoid stacking callback handlers due to events occurring
> -# during handling of the last event. To do this, we keep events disabled
> -# until we've done all processing. HOWEVER, we must enable events before
> -# popping the stack frame (can't be done atomically) and so it would
> still
> -# be possible to get enough handler activations to overflow the stack.
> -# Although unlikely, bugs of that kind are hard to track down, so we'd
> -# like to avoid the possibility.
> -# So, on entry to the handler we detect whether we interrupted an
> -# existing activation in its critical region -- if so, we pop the
> current
> -# activation and restart the handler using the previous one.
> -*/
> + * A note on the "critical region" in our callback handler.
> + * We want to avoid stacking callback handlers due to events occurring
> + * during handling of the last event. To do this, we keep events
> disabled
> + * until we've done all processing. HOWEVER, we must enable events
> before
> + * popping the stack frame (can't be done atomically) and so it would
> still
> + * be possible to get enough handler activations to overflow the stack.
> + * Although unlikely, bugs of that kind are hard to track down, so we'd
> + * like to avoid the possibility.
> + * So, on entry to the handler we detect whether we interrupted an
> + * existing activation in its critical region -- if so, we pop the
> current
> + * activation and restart the handler using the previous one.
> + */
> ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct
> *pt_regs)
> CFI_STARTPROC
> -/* Since we don't modify %rdi, evtchn_do_upall(struct *pt_regs) will
> - see the correct pointer to the pt_regs */
> +/*
> + * Since we don't modify %rdi, evtchn_do_upall(struct *pt_regs) will
> + * see the correct pointer to the pt_regs
> + */
> movq %rdi, %rsp # we don't return, adjust the stack frame
> CFI_ENDPROC
> DEFAULT_FRAME
> @@ -1488,18 +1490,18 @@ ENTRY(xen_do_hypervisor_callback) #
> do_hypervisor_callback(struct *pt_regs)
> END(do_hypervisor_callback)
>
> /*
> -# Hypervisor uses this for application faults while it executes.
> -# We get here for two reasons:
> -# 1. Fault while reloading DS, ES, FS or GS
> -# 2. Fault while executing IRET
> -# Category 1 we do not need to fix up as Xen has already reloaded all
> segment
> -# registers that could be reloaded and zeroed the others.
> -# Category 2 we fix up by killing the current process. We cannot use the
> -# normal Linux return path in this case because if we use the IRET
> hypercall
> -# to pop the stack frame we end up in an infinite loop of failsafe
> callbacks.
> -# We distinguish between categories by comparing each saved segment
> register
> -# with its current contents: any discrepancy means we in category 1.
> -*/
> + * Hypervisor uses this for application faults while it executes.
> + * We get here for two reasons:
> + * 1. Fault while reloading DS, ES, FS or GS
> + * 2. Fault while executing IRET
> + * Category 1 we do not need to fix up as Xen has already reloaded all
> segment
> + * registers that could be reloaded and zeroed the others.
> + * Category 2 we fix up by killing the current process. We cannot use
> the
> + * normal Linux return path in this case because if we use the IRET
> hypercall
> + * to pop the stack frame we end up in an infinite loop of failsafe
> callbacks.
> + * We distinguish between categories by comparing each saved segment
> register
> + * with its current contents: any discrepancy means we in category 1.
> + */
> ENTRY(xen_failsafe_callback)
> INTR_FRAME 1 (6*8)
> /*CFI_REL_OFFSET gs,GS*/
> --
> 1.6.0.4.603.gbc9c0
>
--
Alexander van Heukelum
[email protected]

--
http://www.fastmail.fm - The way an email service should be

2008-11-27 11:27:23

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 4/5] x86: ret_from_fork - add CFI proc annotation

On Thu, Nov 27, 2008 at 1:15 PM, Alexander van Heukelum
<[email protected]> wrote:
>
> On Wed, 26 Nov 2008 22:17:03 +0300, [email protected] said:
>> From: Cyrill Gorcunov <[email protected]>
>>
>> ret_from_fork is definitly procedure so mark it as that.
>>
>> Signed-off-by: Cyrill Gorcunov <[email protected]>
>> ---
>> arch/x86/kernel/entry_64.S | 1 +
>> 1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index c409f73..a21be86 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -370,6 +370,7 @@ END(save_paranoid)
>> */
>> /* rdi: prev */
>> ENTRY(ret_from_fork)
>> + CFI_STARTPROC
>> DEFAULT_FRAME
>> push kernel_eflags(%rip)
>> CFI_ADJUST_CFA_OFFSET 8
>
> This is not needed/wanted. DEFAULT_FRAME includes "CFI_STARTPROC simple"
> already.
>
> Alexander
>
>> 1.6.0.4.603.gbc9c0
>>
> --
> Alexander van Heukelum
> [email protected]
>
> --
> http://www.fastmail.fm - The professional email service
>
>

Indeed, thanks (such an indirect and nested definition sometime confusing)

2008-11-27 12:01:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/5] x86: entry_64.S - use X86_EFLAGS_IF instead of hardcoded number


* Alexander van Heukelum <[email protected]> wrote:

> On Wed, 26 Nov 2008 22:17:02 +0300, [email protected] said:
> > From: Cyrill Gorcunov <[email protected]>
> >
> > Impact: cleanup
> >
> > Signed-off-by: Cyrill Gorcunov <[email protected]>
>
> Acked-by: Alexander van Heukelum <[email protected]>

applied, thanks guys!

Ingo

2008-11-27 12:02:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 5/5] x86: entry_64.S - trivial: space, comments fixup


* Alexander van Heukelum <[email protected]> wrote:

> On Wed, 26 Nov 2008 22:17:04 +0300, [email protected] said:
> > From: Cyrill Gorcunov <[email protected]>
> >
> > Impact: cleanup
> >
> > Signed-off-by: Cyrill Gorcunov <[email protected]>
>
> Acked-by: Alexander van Heukelum <[email protected]>

Cyrill, could you please rework the still relevant bits against latest
tip/master? Alexander's wider patch got in the way.

Thanks,

Ingo

2008-11-27 12:04:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/5] x86: entry_64.S - use ENTRY to define child_rip


* Alexander van Heukelum <[email protected]> wrote:

>
> On Wed, 26 Nov 2008 22:17:00 +0300, [email protected] said:
> > From: Cyrill Gorcunov <[email protected]>
> >
> > child_rip is called not by its name but indirectly
> > rather so make it global and aligned.
> >
> > Signed-off-by: Cyrill Gorcunov <[email protected]>
> > ---
> > arch/x86/kernel/entry_64.S | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index 64688db..098ba0b 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -1322,7 +1322,7 @@ ENTRY(kernel_thread)
> > CFI_ENDPROC
> > END(kernel_thread)
> >
> > -child_rip:
> > +ENTRY(child_rip)
> > pushq $0 # fake return address
> > CFI_STARTPROC
> > /*
>
> Acked-by: Alexander van Heukelum <[email protected]>

applied to tip/x86/irq, thanks!

Ingo

2008-11-27 12:25:26

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 5/5] x86: entry_64.S - trivial: space, comments fixup

On Thu, Nov 27, 2008 at 3:02 PM, Ingo Molnar <[email protected]> wrote:
>
> * Alexander van Heukelum <[email protected]> wrote:
>
>> On Wed, 26 Nov 2008 22:17:04 +0300, [email protected] said:
>> > From: Cyrill Gorcunov <[email protected]>
>> >
>> > Impact: cleanup
>> >
>> > Signed-off-by: Cyrill Gorcunov <[email protected]>
>>
>> Acked-by: Alexander van Heukelum <[email protected]>
>
> Cyrill, could you please rework the still relevant bits against latest
> tip/master? Alexander's wider patch got in the way.
>
> Thanks,
>
> Ingo
>

yep, no problem (wait 'till evening please)

2008-11-27 13:41:41

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86: ret_from_fork - get rid of jump back


* Cyrill Gorcunov <[email protected]> wrote:

> [Andi Kleen - Wed, Nov 26, 2008 at 09:04:33PM +0100]
> | [email protected] writes:
> | > --- a/arch/x86/kernel/entry_64.S
> | > +++ b/arch/x86/kernel/entry_64.S
> | > @@ -379,7 +379,10 @@ ENTRY(ret_from_fork)
> | > GET_THREAD_INFO(%rcx)
> | > testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),TI_flags(%rcx)
> | > CFI_REMEMBER_STATE
> | > - jnz rff_trace
> | > + jz rff_action
> | > + movq %rsp,%rdi
> | > + call syscall_trace_leave
> | > + GET_THREAD_INFO(%rcx)
> |
> | The uncommon path is supposed to be out of line. I don't think
> | the CPU will like that.
> |
> | -Andi
> |
> | > rff_action:
> |
> | --
> | [email protected]
> |
>
> Aha! Thanks Andi for review.

Unfortunately that review got you off the right track ...

The principle you applied was good: entry_64.S is murky, and we want
to simplify the current overcomplicated assembly code flow spaghetti
there.

Note that if you take a closer look at rff_trace/rff_action
ret_from_fork code here, you'll realize that it does all the wrong
things: for example it checks the TIF flag - while later on jumping
back to the ret-from-syscall path - duplicating the check needlessly.

But it gets worse than that: checking for _TIF_SYSCALL_TRACE is
completely unnecessary here because we clear that flag for every
freshly forked task. So the whole "tracing" code here, for which there
is this "out of line jump optimization" is in reality completely dead
code in the ptrace case...

Could you test something like the patch attached below, which cleans
up this code and applies the code reduction and speedup? Warning:
completely untested! Please check that things like strace -f and gdb
attaching to forked tasks still works fine. (it should by all means)

Thanks,

Ingo

---
arch/x86/kernel/entry_64.S | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)

Index: linux/arch/x86/kernel/entry_64.S
===================================================================
--- linux.orig/arch/x86/kernel/entry_64.S
+++ linux/arch/x86/kernel/entry_64.S
@@ -366,34 +366,35 @@ ENTRY(save_paranoid)
END(save_paranoid)

/*
- * A newly forked process directly context switches into this.
+ * A newly forked process directly context switches into this address.
+ *
+ * rdi: prev task we switched from
*/
-/* rdi: prev */
ENTRY(ret_from_fork)
DEFAULT_FRAME
+
push kernel_eflags(%rip)
CFI_ADJUST_CFA_OFFSET 8
- popf # reset kernel eflags
+ popf # reset kernel eflags
CFI_ADJUST_CFA_OFFSET -8
- call schedule_tail
+
+ call schedule_tail # rdi: 'prev' task parameter
+
GET_THREAD_INFO(%rcx)
- testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),TI_flags(%rcx)
+
CFI_REMEMBER_STATE
- jnz rff_trace
-rff_action:
RESTORE_REST
- testl $3,CS-ARGOFFSET(%rsp) # from kernel_thread?
+
+ testl $3, CS-ARGOFFSET(%rsp) # from kernel_thread?
je int_ret_from_sys_call
- testl $_TIF_IA32,TI_flags(%rcx)
+
+ testl $_TIF_IA32, TI_flags(%rcx) # 32-bit compat task needs IRET
jnz int_ret_from_sys_call
+
RESTORE_TOP_OF_STACK %rdi, -ARGOFFSET
- jmp ret_from_sys_call
+ jmp ret_from_sys_call # go to the SYSRET fastpath
+
CFI_RESTORE_STATE
-rff_trace:
- movq %rsp,%rdi
- call syscall_trace_leave
- GET_THREAD_INFO(%rcx)
- jmp rff_action
CFI_ENDPROC
END(ret_from_fork)

2008-11-27 14:06:19

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86: ret_from_fork - get rid of jump back

> But it gets worse than that: checking for _TIF_SYSCALL_TRACE is
> completely unnecessary here because we clear that flag for every

That's true. I found your 2005 changeset which did that, although
it seems to have been written before meaningful changelogs
became en vogue so it's unclear why. But assuming strace/gdb
always reattach it's probably ok.

But _TIF_SYSCALL_AUDIT for which it also checks is not always cleared so
it's not completely dead.

That said in theory the ret_from_sys checks should handle those too,
so it might be unnecessary (I don't remember why I added the additional
check here). Still removing it would change the order
of signal checks versus trace checks and since signal
handling is always hairy one would need to be very careful
when changing it to not break anything.

-Andi

2008-11-27 16:21:14

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86: ret_from_fork - get rid of jump back

[Ingo Molnar - Thu, Nov 27, 2008 at 02:41:21PM +0100]
...
|
| Could you test something like the patch attached below, which cleans
| up this code and applies the code reduction and speedup? Warning:
| completely untested! Please check that things like strace -f and gdb
| attaching to forked tasks still works fine. (it should by all means)
|
...

Thanks for explanation Ingo! Will try to test this but I will require
some time.

- Cyrill -

2008-11-27 18:10:25

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 5/5] x86: entry_64.S - trivial: space, comments fixup

[Ingo Molnar - Thu, Nov 27, 2008 at 01:02:34PM +0100]
| > > From: Cyrill Gorcunov <[email protected]>
| > >
| > > Impact: cleanup
| > >
| > > Signed-off-by: Cyrill Gorcunov <[email protected]>
| >
| > Acked-by: Alexander van Heukelum <[email protected]>
|
| Cyrill, could you please rework the still relevant bits against latest
| tip/master? Alexander's wider patch got in the way.
|
| Thanks,
|
| Ingo
|

Here is an updated version
---
From: Cyrill Gorcunov <[email protected]>
Subject: x86: entry_64.S - trivial: space, comments fixup

Impact: cleanup

Signed-off-by: Cyrill Gorcunov <[email protected]>
---
arch/x86/kernel/entry_64.S | 94 ++++++++++++++++++++++-----------------------
1 file changed, 48 insertions(+), 46 deletions(-)

Index: linux-2.6.git/arch/x86/kernel/entry_64.S
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/entry_64.S
+++ linux-2.6.git/arch/x86/kernel/entry_64.S
@@ -1027,7 +1027,7 @@ END(\sym)

.macro paranoidzeroentry_ist sym do_sym ist
ENTRY(\sym)
- INTR_FRAME
+ INTR_FRAME
PARAVIRT_ADJUST_EXCEPTION_FRAME
pushq $-1 /* ORIG_RAX: no syscall to restart */
CFI_ADJUST_CFA_OFFSET 8
@@ -1095,36 +1095,36 @@ zeroentry coprocessor_error do_coprocess
errorentry alignment_check do_alignment_check
zeroentry simd_coprocessor_error do_simd_coprocessor_error

- /* Reload gs selector with exception handling */
- /* edi: new selector */
+ /* Reload gs selector with exception handling */
+ /* edi: new selector */
ENTRY(native_load_gs_index)
CFI_STARTPROC
pushf
CFI_ADJUST_CFA_OFFSET 8
DISABLE_INTERRUPTS(CLBR_ANY | ~(CLBR_RDI))
- SWAPGS
+ SWAPGS
gs_change:
- movl %edi,%gs
+ movl %edi,%gs
2: mfence /* workaround */
SWAPGS
- popf
+ popf
CFI_ADJUST_CFA_OFFSET -8
- ret
+ ret
CFI_ENDPROC
END(native_load_gs_index)

- .section __ex_table,"a"
- .align 8
- .quad gs_change,bad_gs
- .previous
- .section .fixup,"ax"
+ .section __ex_table,"a"
+ .align 8
+ .quad gs_change,bad_gs
+ .previous
+ .section .fixup,"ax"
/* running with kernelgs */
bad_gs:
SWAPGS /* switch back to user gs */
xorl %eax,%eax
- movl %eax,%gs
- jmp 2b
- .previous
+ movl %eax,%gs
+ jmp 2b
+ .previous

/*
* Create a kernel thread.
@@ -1159,7 +1159,7 @@ ENTRY(kernel_thread)
* so internally to the x86_64 port you can rely on kernel_thread()
* not to reschedule the child before returning, this avoids the need
* of hacks for example to fork off the per-CPU idle tasks.
- * [Hopefully no generic code relies on the reschedule -AK]
+ * [Hopefully no generic code relies on the reschedule -AK]
*/
RESTORE_ALL
UNFAKE_STACK_FRAME
@@ -1239,22 +1239,24 @@ END(call_softirq)
zeroentry xen_hypervisor_callback xen_do_hypervisor_callback

/*
-# A note on the "critical region" in our callback handler.
-# We want to avoid stacking callback handlers due to events occurring
-# during handling of the last event. To do this, we keep events disabled
-# until we've done all processing. HOWEVER, we must enable events before
-# popping the stack frame (can't be done atomically) and so it would still
-# be possible to get enough handler activations to overflow the stack.
-# Although unlikely, bugs of that kind are hard to track down, so we'd
-# like to avoid the possibility.
-# So, on entry to the handler we detect whether we interrupted an
-# existing activation in its critical region -- if so, we pop the current
-# activation and restart the handler using the previous one.
-*/
+ * A note on the "critical region" in our callback handler.
+ * We want to avoid stacking callback handlers due to events occurring
+ * during handling of the last event. To do this, we keep events disabled
+ * until we've done all processing. HOWEVER, we must enable events before
+ * popping the stack frame (can't be done atomically) and so it would still
+ * be possible to get enough handler activations to overflow the stack.
+ * Although unlikely, bugs of that kind are hard to track down, so we'd
+ * like to avoid the possibility.
+ * So, on entry to the handler we detect whether we interrupted an
+ * existing activation in its critical region -- if so, we pop the current
+ * activation and restart the handler using the previous one.
+ */
ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs)
CFI_STARTPROC
-/* Since we don't modify %rdi, evtchn_do_upall(struct *pt_regs) will
- see the correct pointer to the pt_regs */
+/*
+ * Since we don't modify %rdi, evtchn_do_upall(struct *pt_regs) will
+ * see the correct pointer to the pt_regs
+ */
movq %rdi, %rsp # we don't return, adjust the stack frame
CFI_ENDPROC
DEFAULT_FRAME
@@ -1272,18 +1274,18 @@ ENTRY(xen_do_hypervisor_callback) # do
END(do_hypervisor_callback)

/*
-# Hypervisor uses this for application faults while it executes.
-# We get here for two reasons:
-# 1. Fault while reloading DS, ES, FS or GS
-# 2. Fault while executing IRET
-# Category 1 we do not need to fix up as Xen has already reloaded all segment
-# registers that could be reloaded and zeroed the others.
-# Category 2 we fix up by killing the current process. We cannot use the
-# normal Linux return path in this case because if we use the IRET hypercall
-# to pop the stack frame we end up in an infinite loop of failsafe callbacks.
-# We distinguish between categories by comparing each saved segment register
-# with its current contents: any discrepancy means we in category 1.
-*/
+ * Hypervisor uses this for application faults while it executes.
+ * We get here for two reasons:
+ * 1. Fault while reloading DS, ES, FS or GS
+ * 2. Fault while executing IRET
+ * Category 1 we do not need to fix up as Xen has already reloaded all segment
+ * registers that could be reloaded and zeroed the others.
+ * Category 2 we fix up by killing the current process. We cannot use the
+ * normal Linux return path in this case because if we use the IRET hypercall
+ * to pop the stack frame we end up in an infinite loop of failsafe callbacks.
+ * We distinguish between categories by comparing each saved segment register
+ * with its current contents: any discrepancy means we in category 1.
+ */
ENTRY(xen_failsafe_callback)
INTR_FRAME 1 (6*8)
/*CFI_REL_OFFSET gs,GS*/
@@ -1347,8 +1349,8 @@ paranoidzeroentry machine_check do_machi
#endif

/*
- * "Paranoid" exit path from exception stack.
- * Paranoid because this is used by NMIs and cannot take
+ * "Paranoid" exit path from exception stack.
+ * Paranoid because this is used by NMIs and cannot take
* any kernel state for granted.
* We don't do kernel preemption checks here, because only
* NMI should be common and it does not enable IRQs and
@@ -1453,7 +1455,7 @@ error_kernelspace:
cmpq %rcx,RIP+8(%rsp)
je error_swapgs
cmpq $gs_change,RIP+8(%rsp)
- je error_swapgs
+ je error_swapgs
jmp error_sti
END(error_entry)

@@ -1529,7 +1531,7 @@ nmi_schedule:
CFI_ENDPROC
#else
jmp paranoid_exit
- CFI_ENDPROC
+ CFI_ENDPROC
#endif
END(nmi)

2008-11-27 19:12:48

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86: ret_from_fork - get rid of jump back

[Ingo Molnar - Thu, Nov 27, 2008 at 02:41:21PM +0100]
|
| * Cyrill Gorcunov <[email protected]> wrote:
|
...
|
| Could you test something like the patch attached below, which cleans
| up this code and applies the code reduction and speedup? Warning:
| completely untested! Please check that things like strace -f and gdb
| attaching to forked tasks still works fine. (it should by all means)
|
| Thanks,
|
| Ingo

Ingo, I found 2.6.26 test machine where I've patched the entry_64.S
in a manner you pointed. So here is a test program and strace output.
Seems all works fine. Didn't check "audit" code which I simply don't
know how to be done.

- Cyrill -
---

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
int pid;

pid = fork();

if (pid < 0) {
fprintf(stderr, "Fork failed!\n");
exit(-1);
}

else if (pid == 0) {
printf("I am the child, return from fork=%d\n", pid);
} else {
printf("I am the parent, return from fork, child pid=%d\n", pid);
printf("Parent exiting!\n");
exit(0);
}
}
---

5897 execve("./main", ["./main"], [/* 18 vars */]) = 0
5897 brk(0) = 0x2514000
5897 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa0e517f000
5897 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
5897 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa0e517d000
5897 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
5897 open("/etc/ld.so.cache", O_RDONLY) = 3
5897 fstat(3, {st_mode=S_IFREG|0644, st_size=48530, ...}) = 0
5897 mmap(NULL, 48530, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fa0e5171000
5897 close(3) = 0
5897 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
5897 open("/lib/libc.so.6", O_RDONLY) = 3
5897 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\345"..., 832) = 832
5897 fstat(3, {st_mode=S_IFREG|0755, st_size=1502520, ...}) = 0
5897 mmap(NULL, 3609304, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa0e4bf0000
5897 mprotect(0x7fa0e4d59000, 2093056, PROT_NONE) = 0
5897 mmap(0x7fa0e4f58000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x168000) = 0x7fa0e4f58000
5897 mmap(0x7fa0e4f5d000, 17112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fa0e4f5d000
5897 close(3) = 0
5897 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa0e5170000
5897 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa0e516f000
5897 arch_prctl(ARCH_SET_FS, 0x7fa0e516f6e0) = 0
5897 mprotect(0x7fa0e4f58000, 16384, PROT_READ) = 0
5897 mprotect(0x7fa0e5180000, 4096, PROT_READ) = 0
5897 munmap(0x7fa0e5171000, 48530) = 0
5897 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa0e516f770) = 5898
5897 fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(5, 1), ...}) = 0
5898 fstat(1, <unfinished ...>
5897 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS <unfinished ...>
5898 <... fstat resumed> {st_mode=S_IFCHR|0600, st_rdev=makedev(5, 1), ...}) = 0
5897 <... ioctl resumed> , {B38400 opost isig icanon echo ...}) = 0
5897 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
5898 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS <unfinished ...>
5897 <... mmap resumed> ) = 0x7fa0e517c000
5898 <... ioctl resumed> , {B38400 opost isig icanon echo ...}) = 0
5897 write(1, "I am the parent, return from for"..., 50 <unfinished ...>
5898 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
5897 <... write resumed> ) = 50
5898 <... mmap resumed> ) = 0x7fa0e517c000
5897 write(1, "Parent exiting!\n", 16 <unfinished ...>
5898 write(1, "I am the child, return from fork"..., 35 <unfinished ...>
5897 <... write resumed> ) = 16
5898 <... write resumed> ) = 35
5897 exit_group(0) = ?
5898 exit_group(0) = ?
---

2008-11-28 13:48:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86: ret_from_fork - get rid of jump back


* Andi Kleen <[email protected]> wrote:

> > But it gets worse than that: checking for _TIF_SYSCALL_TRACE is
> > completely unnecessary here because we clear that flag for every
>
> That's true. I found your 2005 changeset which did that, [...]

that's irrelevant, because all the necessary TIF_ flag processing is
already done in ret_from_sys_call.

The unnecessary TIF_SYSCALL_TRACE code in the 64-bit ret_from_fork was
apparently added by you in 2002:

+ENTRY(ret_from_fork)
+ movq %rbx, %rdi
+ call schedule_tail
+ GET_THREAD_INFO(%rcx)
+ bt $TIF_SYSCALL_TRACE,threadinfo_flags(%rcx)
+ jc rff_trace

when you added it to x86_64 via the changeset quoted below.

at that time the 32-bit code (from which you copied the x86_64 tree,
creating the whole split x86 trees approach) had this simple and
straightforward ret_from_fork implementation:

ENTRY(ret_from_fork)
#if CONFIG_SMP
pushl %ebx
call SYMBOL_NAME(schedule_tail)
addl $4, %esp
#endif
GET_THREAD_INFO(%ebx)
jmp syscall_exit

so by all means the rff_trace/rff_action hack came from you.

Ingo

------------------>
>From 0457d99a336be658cea1a5bdb689de5adb3b382d Mon Sep 17 00:00:00 2001
From: Andi Kleen <[email protected]>
Date: Tue, 12 Feb 2002 20:17:35 -0800
Subject: [PATCH] [PATCH] x86_64 merge: arch + asm

This adds the x86_64 arch and asm directories and a Documentation/x86_64.

It took a bit longer because I first had to make preemption and thread_info
work and also found some other bugs while doing this. The port has been
tested for a long time on UP.

I'm not sure what I should describe. A lot is based on i386 with
a lot of cleanups. I wrote a paper about it for last year's OLS that describes
most of the changes (ftp://ftp.firstfloor.org/pub/ak/x86_64.ps.gz). It is
a bit outdated now, but should give a good overview.

It currently has a completely cut'n'pasted from others+hacked 32bit
emulation. I hope to clean that up in the future by merging the generic
core of this with other 64bit archs.

Thanks,
-Andi
---
Documentation/x86_64/mm.txt | 148 ++
arch/x86_64/Config.help | 531 +++++
arch/x86_64/Makefile | 130 ++
arch/x86_64/boot/Makefile | 92 +
arch/x86_64/boot/bootsect.S | 416 ++++
arch/x86_64/boot/compressed/Makefile | 43 +
arch/x86_64/boot/compressed/head.S | 142 ++
arch/x86_64/boot/compressed/misc.c | 431 ++++
arch/x86_64/boot/compressed/miscsetup.h | 39 +
arch/x86_64/boot/install.sh | 39 +
arch/x86_64/boot/setup.S | 955 ++++++++
arch/x86_64/boot/tools/build.c | 189 ++
arch/x86_64/boot/video.S | 1934 ++++++++++++++++
arch/x86_64/config.in | 240 ++
arch/x86_64/defconfig | 568 +++++
arch/x86_64/ia32/Makefile | 18 +
arch/x86_64/ia32/ia32_binfmt.c | 165 ++
arch/x86_64/ia32/ia32_ioctl.c | 3843 +++++++++++++++++++++++++++++++
arch/x86_64/ia32/ia32_signal.c | 489 ++++
arch/x86_64/ia32/ia32entry.S | 365 +++
arch/x86_64/ia32/ptrace32.c | 312 +++
arch/x86_64/ia32/socket32.c | 686 ++++++
arch/x86_64/ia32/sys_ia32.c | 2788 ++++++++++++++++++++++
arch/x86_64/kernel/Makefile | 38 +
arch/x86_64/kernel/apic.c | 1160 ++++++++++
arch/x86_64/kernel/bluesmoke.c | 174 ++
arch/x86_64/kernel/cpuid.c | 177 ++
arch/x86_64/kernel/early_printk.c | 77 +
arch/x86_64/kernel/entry.S | 640 +++++
arch/x86_64/kernel/head.S | 342 +++
arch/x86_64/kernel/head64.c | 85 +
arch/x86_64/kernel/i387.c | 439 ++++
arch/x86_64/kernel/i8259.c | 485 ++++
arch/x86_64/kernel/init_task.c | 42 +
arch/x86_64/kernel/io_apic.c | 1617 +++++++++++++
arch/x86_64/kernel/ioport.c | 114 +
arch/x86_64/kernel/irq.c | 1198 ++++++++++
arch/x86_64/kernel/ldt.c | 177 ++
arch/x86_64/kernel/mpparse.c | 670 ++++++
arch/x86_64/kernel/msr.c | 279 +++
arch/x86_64/kernel/mtrr.c | 2310 +++++++++++++++++++
arch/x86_64/kernel/nmi.c | 272 +++
arch/x86_64/kernel/pci-dma.c | 33 +
arch/x86_64/kernel/pci-irq.c | 753 ++++++
arch/x86_64/kernel/pci-pc.c | 438 ++++
arch/x86_64/kernel/pci-x86_64.c | 384 +++
arch/x86_64/kernel/pci-x86_64.h | 72 +
arch/x86_64/kernel/process.c | 756 ++++++
arch/x86_64/kernel/ptrace.c | 435 ++++
arch/x86_64/kernel/semaphore.c | 225 ++
arch/x86_64/kernel/setup.c | 1128 +++++++++
arch/x86_64/kernel/setup64.c | 142 ++
arch/x86_64/kernel/signal.c | 591 +++++
arch/x86_64/kernel/smp.c | 584 +++++
arch/x86_64/kernel/smpboot.c | 1023 ++++++++
arch/x86_64/kernel/sys_x86_64.c | 113 +
arch/x86_64/kernel/syscall.c | 25 +
arch/x86_64/kernel/time.c | 494 ++++
arch/x86_64/kernel/trampoline.S | 72 +
arch/x86_64/kernel/traps.c | 771 +++++++
arch/x86_64/kernel/vsyscall.c | 190 ++
arch/x86_64/kernel/x8664_ksyms.c | 162 ++
arch/x86_64/lib/Makefile | 17 +
arch/x86_64/lib/checksum_copy.S | 142 ++
arch/x86_64/lib/dec_and_lock.c | 40 +
arch/x86_64/lib/delay.c | 45 +
arch/x86_64/lib/generic-checksum.c | 124 +
arch/x86_64/lib/getuser.S | 90 +
arch/x86_64/lib/iodebug.c | 11 +
arch/x86_64/lib/mmx.c | 377 +++
arch/x86_64/lib/old-checksum.c | 33 +
arch/x86_64/lib/putuser.S | 88 +
arch/x86_64/lib/rwsem_thunk.S | 27 +
arch/x86_64/lib/usercopy.c | 147 ++
arch/x86_64/mm/Makefile | 13 +
arch/x86_64/mm/extable.c | 62 +
arch/x86_64/mm/fault.c | 324 +++
arch/x86_64/mm/init.c | 387 ++++
arch/x86_64/mm/ioremap.c | 163 ++
arch/x86_64/tools/Makefile | 31 +
arch/x86_64/tools/offset.c | 70 +
arch/x86_64/tools/offset.sed | 7 +
arch/x86_64/vmlinux.lds | 112 +
include/asm-x86_64/a.out.h | 30 +
include/asm-x86_64/apic.h | 103 +
include/asm-x86_64/apicdef.h | 363 +++
include/asm-x86_64/atomic.h | 206 ++
include/asm-x86_64/bitops.h | 465 ++++
include/asm-x86_64/boot.h | 15 +
include/asm-x86_64/bootsetup.h | 34 +
include/asm-x86_64/bugs.h | 42 +
include/asm-x86_64/byteorder.h | 32 +
include/asm-x86_64/cache.h | 13 +
include/asm-x86_64/calling.h | 100 +
include/asm-x86_64/checksum.h | 158 ++
include/asm-x86_64/cpufeature.h | 73 +
include/asm-x86_64/current.h | 29 +
include/asm-x86_64/debugreg.h | 65 +
include/asm-x86_64/delay.h | 20 +
include/asm-x86_64/desc.h | 187 ++
include/asm-x86_64/div64.h | 14 +
include/asm-x86_64/dma.h | 298 +++
include/asm-x86_64/e820.h | 40 +
include/asm-x86_64/elf.h | 120 +
include/asm-x86_64/errno.h | 132 ++
include/asm-x86_64/fcntl.h | 79 +
include/asm-x86_64/fixmap.h | 105 +
include/asm-x86_64/floppy.h | 286 +++
include/asm-x86_64/hardirq.h | 93 +
include/asm-x86_64/hdreg.h | 12 +
include/asm-x86_64/hw_irq.h | 210 ++
include/asm-x86_64/i387.h | 87 +
include/asm-x86_64/ia32.h | 274 +++
include/asm-x86_64/ia32_unistd.h | 251 ++
include/asm-x86_64/ide.h | 128 +
include/asm-x86_64/init.h | 1 +
include/asm-x86_64/io.h | 270 +++
include/asm-x86_64/io_apic.h | 148 ++
include/asm-x86_64/ioctl.h | 75 +
include/asm-x86_64/ioctls.h | 82 +
include/asm-x86_64/ipc.h | 32 +
include/asm-x86_64/ipcbuf.h | 29 +
include/asm-x86_64/irq.h | 35 +
include/asm-x86_64/kdebug.h | 23 +
include/asm-x86_64/keyboard.h | 71 +
include/asm-x86_64/kmap_types.h | 14 +
include/asm-x86_64/ldt.h | 37 +
include/asm-x86_64/linux_logo.h | 29 +
include/asm-x86_64/locks.h | 135 ++
include/asm-x86_64/mc146818rtc.h | 29 +
include/asm-x86_64/mman.h | 39 +
include/asm-x86_64/mmu.h | 13 +
include/asm-x86_64/mmu_context.h | 95 +
include/asm-x86_64/mmx.h | 14 +
include/asm-x86_64/module.h | 12 +
include/asm-x86_64/mpspec.h | 188 ++
include/asm-x86_64/msgbuf.h | 27 +
include/asm-x86_64/msr.h | 140 ++
include/asm-x86_64/mtrr.h | 127 +
include/asm-x86_64/namei.h | 17 +
include/asm-x86_64/page.h | 118 +
include/asm-x86_64/param.h | 24 +
include/asm-x86_64/parport.h | 18 +
include/asm-x86_64/pci.h | 273 +++
include/asm-x86_64/pda.h | 79 +
include/asm-x86_64/pgalloc.h | 258 +++
include/asm-x86_64/pgtable.h | 399 ++++
include/asm-x86_64/poll.h | 25 +
include/asm-x86_64/posix_types.h | 116 +
include/asm-x86_64/prctl.h | 10 +
include/asm-x86_64/processor.h | 463 ++++
include/asm-x86_64/ptrace.h | 114 +
include/asm-x86_64/resource.h | 47 +
include/asm-x86_64/rwlock.h | 84 +
include/asm-x86_64/rwsem.h | 217 ++
include/asm-x86_64/scatterlist.h | 13 +
include/asm-x86_64/segment.h | 19 +
include/asm-x86_64/semaphore.h | 216 ++
include/asm-x86_64/sembuf.h | 25 +
include/asm-x86_64/serial.h | 133 ++
include/asm-x86_64/setup.h | 10 +
include/asm-x86_64/shmbuf.h | 38 +
include/asm-x86_64/shmparam.h | 6 +
include/asm-x86_64/sigcontext.h | 97 +
include/asm-x86_64/siginfo.h | 232 ++
include/asm-x86_64/signal.h | 205 ++
include/asm-x86_64/smp.h | 98 +
include/asm-x86_64/smplock.h | 95 +
include/asm-x86_64/socket.h | 64 +
include/asm-x86_64/socket32.h | 70 +
include/asm-x86_64/sockios.h | 12 +
include/asm-x86_64/softirq.h | 34 +
include/asm-x86_64/spinlock.h | 181 ++
include/asm-x86_64/stat.h | 27 +
include/asm-x86_64/statfs.h | 25 +
include/asm-x86_64/string.h | 38 +
include/asm-x86_64/system.h | 283 +++
include/asm-x86_64/termbits.h | 172 ++
include/asm-x86_64/termios.h | 106 +
include/asm-x86_64/thread_info.h | 116 +
include/asm-x86_64/timex.h | 51 +
include/asm-x86_64/tlb.h | 1 +
include/asm-x86_64/types.h | 47 +
include/asm-x86_64/uaccess.h | 373 +++
include/asm-x86_64/ucontext.h | 12 +
include/asm-x86_64/unaligned.h | 37 +
include/asm-x86_64/unistd.h | 653 ++++++
include/asm-x86_64/user.h | 113 +
include/asm-x86_64/user32.h | 58 +
include/asm-x86_64/vga.h | 20 +
include/asm-x86_64/vsyscall.h | 48 +
include/asm-x86_64/xor.h | 859 +++++++
192 files changed, 48138 insertions(+), 0 deletions(-)

2008-11-28 13:54:21

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 5/5] x86: entry_64.S - trivial: space, comments fixup


* Cyrill Gorcunov <[email protected]> wrote:

> [Ingo Molnar - Thu, Nov 27, 2008 at 01:02:34PM +0100]
> | > > From: Cyrill Gorcunov <[email protected]>
> | > >
> | > > Impact: cleanup
> | > >
> | > > Signed-off-by: Cyrill Gorcunov <[email protected]>
> | >
> | > Acked-by: Alexander van Heukelum <[email protected]>
> |
> | Cyrill, could you please rework the still relevant bits against latest
> | tip/master? Alexander's wider patch got in the way.
> |
> | Thanks,
> |
> | Ingo
> |
>
> Here is an updated version
> ---
> From: Cyrill Gorcunov <[email protected]>
> Subject: x86: entry_64.S - trivial: space, comments fixup
>
> Impact: cleanup

applied, thanks Cyrill!

Ingo

2008-11-28 18:45:27

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86: ret_from_fork - get rid of jump back

On Fri, Nov 28, 2008 at 02:47:54PM +0100, Ingo Molnar wrote:

Hi Ingo,

>
> * Andi Kleen <[email protected]> wrote:
>
> > > But it gets worse than that: checking for _TIF_SYSCALL_TRACE is
> > > completely unnecessary here because we clear that flag for every
> >
> > That's true. I found your 2005 changeset which did that, [...]
>
> that's irrelevant, because all the necessary TIF_ flag processing is
> already done in ret_from_sys_call.

Yes, that's true (as I wrote in my email), but it's just not dead because
the audit code will take it (or used to handle both audit and fork before
2005), just unnecessary because the later code checks it as well@). Ok it's
only a very subtle difference given.

It would have been nice if you had pruned at least the fork case
when you changed fork.c to always clear it, but that can be also
done now.

>
> The unnecessary TIF_SYSCALL_TRACE code in the 64-bit ret_from_fork was
> apparently added by you in 2002:

Yes, to be honest I don't remember why I did it this way. Most likely
the additional check was needed in some earlier iteration and then not pruned
away when it became unnecessary. I remember I had some ordering problems
with strace very early on it might have been related to that. Or perhaps
I just goofed up. In the current form it's certainly not great code.

Anyways I have no objections to removing it now, just one has to be
careful that the changed processing order (signals vs trace) -- which
is likely user visible -- won't break anything.

-Andi