2015-06-03 13:59:04

by Denys Vlasenko

Subject: [PATCH 1/2] x86/asm/entry/32: Explain stub32_clone logic

The reason for copying %r8 to %rcx is quite non-obvious.
Add a comment which explains why it is done.

Fix indentation and trailing whitespace while at it.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---

This is a resend.

arch/x86/ia32/ia32entry.S | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 2ca052e..8e72256 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -562,9 +562,17 @@ GLOBAL(\label)

ALIGN
GLOBAL(stub32_clone)
- leaq sys_clone(%rip),%rax
+ leaq sys_clone(%rip), %rax
+ /*
+ * 32-bit clone API is clone(..., int tls_val, int *child_tidptr).
+ * 64-bit clone API is clone(..., int *child_tidptr, int tls_val).
+ * Native 64-bit kernel's sys_clone() implements the latter.
+ * We need to swap args here. But since tls_val is in fact ignored
+ * by sys_clone(), we can get away with an assignment
+ * (arg4 = arg5) instead of a full swap:
+ */
mov %r8, %rcx
- jmp ia32_ptregs_common
+ jmp ia32_ptregs_common

ALIGN
ia32_ptregs_common:
--
1.8.1.4
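
For reference, a rough C sketch of the two argument orders the new comment
describes; the prototype names below are illustrative, not copied from the
kernel sources:

    /* 32-bit (ia32) order: tls_val is argument 4, child_tidptr argument 5 */
    long clone32(unsigned long flags, unsigned long newsp,
                 int *parent_tidptr, int tls_val, int *child_tidptr);

    /* native 64-bit order: child_tidptr is argument 4, tls_val argument 5 */
    long clone64(unsigned long flags, unsigned long newsp,
                 int *parent_tidptr, int *child_tidptr, int tls_val);

In the 64-bit C calling convention argument 4 is passed in %rcx and argument 5
in %r8, which is why the stub only has to touch those two registers before
tail-calling sys_clone().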


2015-06-03 13:59:13

by Denys Vlasenko

Subject: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone

Really swap arguments #4 and #5 in stub32_clone instead of "optimizing"
it into a move.

Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a little bit
more expensive than MOV. But a cycle or two on an expensive syscall like
clone() is way below noise floor, and this optimization is simply not worth
the obfuscation of logic.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Josh Triplett <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---

This is a resend.

There was a patch by Josh Triplett
"x86: Opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit"
sent on May 11,
which does the same thing as part of a bigger cleanup.
He was supportive of this patch because of the comment it adds.
He will simply have to drop one hunk from his patch.

arch/x86/ia32/ia32entry.S | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 8e72256..0c302d0 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -567,11 +567,9 @@ GLOBAL(stub32_clone)
* 32-bit clone API is clone(..., int tls_val, int *child_tidptr).
* 64-bit clone API is clone(..., int *child_tidptr, int tls_val).
* Native 64-bit kernel's sys_clone() implements the latter.
- * We need to swap args here. But since tls_val is in fact ignored
- * by sys_clone(), we can get away with an assignment
- * (arg4 = arg5) instead of a full swap:
+ * We need to swap args here:
*/
- mov %r8, %rcx
+ xchg %r8, %rcx
jmp ia32_ptregs_common

ALIGN
--
1.8.1.4
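
A minimal C model of the difference, with arg4 standing for %rcx and arg5 for
%r8 at the point where sys_clone() is called; this is only an illustration,
not kernel code:

    /* On entry to the stub (after the ia32 register shuffle):
     *   arg4 = tls_val, arg5 = child_tidptr   (32-bit order)
     * sys_clone() expects:
     *   arg4 = child_tidptr, arg5 = tls_val   (64-bit order)
     */
    static void stub_mov(long *arg4, long *arg5)
    {
            /* old code: arg4 fixed up, arg5 still holds child_tidptr;
             * acceptable only while sys_clone() ignores tls_val */
            *arg4 = *arg5;
    }

    static void stub_xchg(long *arg4, long *arg5)
    {
            /* new code: a full swap, both arguments end up correct */
            long tmp = *arg4;
            *arg4 = *arg5;
            *arg5 = tmp;
    }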

2015-06-03 16:39:15

by Josh Triplett

Subject: Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone

On Wed, Jun 03, 2015 at 03:58:50PM +0200, Denys Vlasenko wrote:
> Really swap arguments #4 and #5 in stub32_clone instead of "optimizing"
> it into a move.
>
> Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a little bit
> more expensive than MOV. But a cycle or two on an expensive syscall like
> clone() is way below noise floor, and this optimization is simply not worth
> the obfuscation of logic.
[...]
> This is a resend.
>
> There was a patch by Josh Triplett
> "x86: Opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit"
> sent on May 11,
> which does the same thing as part of a bigger cleanup.
> He was supportive of this patch because of comments.
> He will simply have to drop one hunk from his patch.

Strictly speaking, nothing needs this until clone starts paying
attention to its tls argument, which only happens in my cleanup series
that includes this change. So what's the purpose of driving this patch
separately?

- Josh Triplett
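
For context, the series in question switches x86 over to HAVE_COPY_THREAD_TLS,
where the tls value reaches the architecture hook as an explicit argument
instead of being dug out of the saved user registers. Roughly, with simplified
prototypes (my reading of that series, not an exact quote):

    /* legacy hook: no tls parameter, the arch code reads it from pt_regs */
    int copy_thread(unsigned long clone_flags, unsigned long sp,
                    unsigned long arg, struct task_struct *p);

    /* HAVE_COPY_THREAD_TLS hook: tls is passed explicitly, so the compat
     * clone() stub has to deliver the 32-bit caller's tls_val in arg5 */
    int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
                        unsigned long arg, struct task_struct *p,
                        unsigned long tls);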

2015-06-04 10:07:43

by Denys Vlasenko

Subject: Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone

On 06/03/2015 06:38 PM, Josh Triplett wrote:
> On Wed, Jun 03, 2015 at 03:58:50PM +0200, Denys Vlasenko wrote:
>> Really swap arguments #4 and #5 in stub32_clone instead of "optimizing"
>> it into a move.
>>
>> Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a little bit
>> more expensive than MOV. But a cycle or two on an expensive syscall like
>> clone() is way below noise floor, and this optimization is simply not worth
>> the obfuscation of logic.
> [...]
>> This is a resend.
>>
>> There was a patch by Josh Triplett
>> "x86: Opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit"
>> sent on May 11,
>> which does the same thing as part of a bigger cleanup.
>> He was supportive of this patch because of comments.
>> He will simply have to drop one hunk from his patch.
>
> Strictly speaking, nothing needs this until clone starts paying
> attention to its tls argument, which only happens in my cleanup series
> that includes this change. So what's the purpose of driving this patch
> separately?

You wanted the comment in this patch to go in:

On 04/22/2015 07:10 PM, Josh Triplett wrote:
> I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing
> this, but I'd like to see the final version of Denys' comment added on
> top of it (with an update to the type and name of the tls argument to
> match the changes to sys_clone).

If your patch goes in first, I'll send a patch adding only the comment.

Since your patch has not made it in yet, I'm submitting a patch which
adds both the comment and the insn change.

--
vda

2015-06-04 15:58:54

by Josh Triplett

Subject: Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone

On Thu, Jun 04, 2015 at 12:07:31PM +0200, Denys Vlasenko wrote:
> On 06/03/2015 06:38 PM, Josh Triplett wrote:
> > On Wed, Jun 03, 2015 at 03:58:50PM +0200, Denys Vlasenko wrote:
> >> Really swap arguments #4 and #5 in stub32_clone instead of "optimizing"
> >> it into a move.
> >>
> >> Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a little bit
> >> more expensive than MOV. But a cycle or two on an expensive syscall like
> >> clone() is way below noise floor, and this optimization is simply not worth
> >> the obfuscation of logic.
> > [...]
> >> This is a resend.
> >>
> >> There was a patch by Josh Triplett
> >> "x86: Opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit"
> >> sent on May 11,
> >> which does the same thing as part of a bigger cleanup.
> >> He was supportive of this patch because of comments.
> >> He will simply have to drop one hunk from his patch.
> >
> > Strictly speaking, nothing needs this until clone starts paying
> > attention to its tls argument, which only happens in my cleanup series
> > that includes this change. So what's the purpose of driving this patch
> > separately?
>
> You wanted my comments in this patch to go in:
>
> On 04/22/2015 07:10 PM, Josh Triplett wrote:
> > I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing
> > this, but I'd like to see the final version of Denys' comment added on
> > top of it (with an update to the type and name of the tls argument to
> > match the changes to sys_clone).
>
> If your patch will go in first, I'll send a patch adding only the comment.
>
> Since for now your patch did not make it yet, I'm submitting
> a patch which adds both a comment and the insn change.

Ah, I see.

My two-patch series is currently sitting in -mm; would you consider
providing a version of the patch that adds the comment for Andrew to
apply on top of those?

- Josh Triplett

2015-06-05 11:44:09

by Ingo Molnar

Subject: Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone


* Josh Triplett <[email protected]> wrote:

> On Thu, Jun 04, 2015 at 12:07:31PM +0200, Denys Vlasenko wrote:
> > On 06/03/2015 06:38 PM, Josh Triplett wrote:
> > > On Wed, Jun 03, 2015 at 03:58:50PM +0200, Denys Vlasenko wrote:
> > >> Really swap arguments #4 and #5 in stub32_clone instead of "optimizing"
> > >> it into a move.
> > >>
> > >> Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a little bit
> > >> more expensive than MOV. But a cycle or two on an expensive syscall like
> > >> clone() is way below noise floor, and this optimization is simply not worth
> > >> the obfuscation of logic.
> > > [...]
> > >> This is a resend.
> > >>
> > >> There was a patch by Josh Triplett
> > >> "x86: Opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit"
> > >> sent on May 11,
> > >> which does the same thing as part of a bigger cleanup.
> > >> He was supportive of this patch because of comments.
> > >> He will simply have to drop one hunk from his patch.
> > >
> > > Strictly speaking, nothing needs this until clone starts paying
> > > attention to its tls argument, which only happens in my cleanup series
> > > that includes this change. So what's the purpose of driving this patch
> > > separately?
> >
> > You wanted my comments in this patch to go in:
> >
> > On 04/22/2015 07:10 PM, Josh Triplett wrote:
> > > I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing
> > > this, but I'd like to see the final version of Denys' comment added on
> > > top of it (with an update to the type and name of the tls argument to
> > > match the changes to sys_clone).
> >
> > If your patch will go in first, I'll send a patch adding only the comment.
> >
> > Since for now your patch did not make it yet, I'm submitting
> > a patch which adds both a comment and the insn change.
>
> Ah, I see.
>
> My two-patch series is currently sitting in -mm; would you consider providing a
> version of the patch that adds the comment for Andrew to apply on top of those?

So because -mm is based on linux-next, and linux-next includes the x86 tree
(and with it the x86/asm topic branch), the dependency is the other way around.

Unrelated to these changes to the clone stub, there will be conflicts anyway due
to the larger reorganization of the x86 assembly code, so I've picked up Denys's
two patches. Your patch set in -mm will become a bit smaller in the end, because
it can rely on having these changes applied already.

Thanks,

Ingo

Subject: [tip:x86/asm] x86/asm/entry/32: Explain the stub32_clone logic

Commit-ID: 5cdc683b7d8b3341a3d18e0c5498bc1e4f3fb990
Gitweb: http://git.kernel.org/tip/5cdc683b7d8b3341a3d18e0c5498bc1e4f3fb990
Author: Denys Vlasenko <[email protected]>
AuthorDate: Wed, 3 Jun 2015 15:58:49 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 5 Jun 2015 13:41:27 +0200

x86/asm/entry/32: Explain the stub32_clone logic

The reason for copying %r8 to %rcx is quite non-obvious.
Add a comment which explains why it is done.

Signed-off-by: Denys Vlasenko <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Drewry <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/entry/ia32entry.S | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/entry/ia32entry.S b/arch/x86/entry/ia32entry.S
index 4bb9f7b..d0c7b28 100644
--- a/arch/x86/entry/ia32entry.S
+++ b/arch/x86/entry/ia32entry.S
@@ -528,6 +528,14 @@ GLOBAL(\label)
ALIGN
GLOBAL(stub32_clone)
leaq sys_clone(%rip), %rax
+ /*
+ * 32-bit clone API is clone(..., int tls_val, int *child_tidptr).
+ * 64-bit clone API is clone(..., int *child_tidptr, int tls_val).
+ * Native 64-bit kernel's sys_clone() implements the latter.
+ * We need to swap args here. But since tls_val is in fact ignored
+ * by sys_clone(), we can get away with an assignment
+ * (arg4 = arg5) instead of a full swap:
+ */
mov %r8, %rcx
jmp ia32_ptregs_common

Subject: [tip:x86/asm] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone

Commit-ID: 7a5a9824c18f93415944c997dc6bb8eecfddd2e7
Gitweb: http://git.kernel.org/tip/7a5a9824c18f93415944c997dc6bb8eecfddd2e7
Author: Denys Vlasenko <[email protected]>
AuthorDate: Wed, 3 Jun 2015 15:58:50 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 5 Jun 2015 13:41:28 +0200

x86/asm/entry/32: Remove unnecessary optimization in stub32_clone

Really swap arguments #4 and #5 in stub32_clone instead of
"optimizing" it into a move.

Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a
little bit more expensive than MOV. But a cycle or two on an
expensive syscall like clone() is way below noise floor, and
this optimization is simply not worth the obfuscation of logic.

[ There's also ongoing work on the clone() ABI by Josh Triplett
that will depend on this change later on. ]

Signed-off-by: Denys Vlasenko <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Drewry <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/entry/ia32entry.S | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/ia32entry.S b/arch/x86/entry/ia32entry.S
index d0c7b28..9558dac 100644
--- a/arch/x86/entry/ia32entry.S
+++ b/arch/x86/entry/ia32entry.S
@@ -529,14 +529,13 @@ GLOBAL(\label)
GLOBAL(stub32_clone)
leaq sys_clone(%rip), %rax
/*
- * 32-bit clone API is clone(..., int tls_val, int *child_tidptr).
- * 64-bit clone API is clone(..., int *child_tidptr, int tls_val).
- * Native 64-bit kernel's sys_clone() implements the latter.
- * We need to swap args here. But since tls_val is in fact ignored
- * by sys_clone(), we can get away with an assignment
- * (arg4 = arg5) instead of a full swap:
+ * The 32-bit clone ABI is: clone(..., int tls_val, int *child_tidptr).
+ * The 64-bit clone ABI is: clone(..., int *child_tidptr, int tls_val).
+ *
+ * The native 64-bit kernel's sys_clone() implements the latter,
+ * so we need to swap arguments here before calling it:
*/
- mov %r8, %rcx
+ xchg %r8, %rcx
jmp ia32_ptregs_common

ALIGN