Reduce the protection fault count in the copy_user_handle_tail routine by
limiting the clear length to the end of the page, as suggested by Linus.
Linus, should I add a "Signed-off-by: you" line for patches that implement
your ideas? The clear length calculation was changed a bit for correct
handling of page-aligned addresses.
I'm using this systemtap script for sanity testing:
http://people.redhat.com/vmayatsk/copy_user_x8664/
It looks very ugly, but it works.
Signed-off-by: Vitaly Mayatskikh <[email protected]>
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index f4df6e7..42baeca 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -161,23 +161,33 @@ EXPORT_SYMBOL(copy_in_user);
/*
* Try to copy last bytes and clear the rest if needed.
* Since protection fault in copy_from/to_user is not a normal situation,
- * it is not necessary to optimize tail handling.
+ * it is not necessary to do low level optimization of tail handling.
*/
unsigned long
-copy_user_handle_tail(char *to, char *from, unsigned len, unsigned zerorest)
+copy_user_handle_tail(char *to, char *from, unsigned len, unsigned clear_rest)
{
char c;
- unsigned zero_len;
+ unsigned clear_len;
- for (; len; --len) {
- if (__get_user_nocheck(c, from++, sizeof(char)))
- break;
- if (__put_user_nocheck(c, to++, sizeof(char)))
+ while (len) {
+ if (__get_user_nocheck(c, from, 1))
break;
+ from++;
+ if (__put_user_nocheck(c, to, 1))
+ /* Fault in destination, nothing to clear */
+ goto out;
+ to++;
+ len--;
}
- for (c = 0, zero_len = len; zerorest && zero_len; --zero_len)
- if (__put_user_nocheck(c, to++, sizeof(char)))
- break;
+ if (clear_rest) {
+ unsigned long addr = (unsigned long)to;
+ clear_len = len;
+ /* Limit clear_len to the rest of the page */
+ if (((addr + len) & PAGE_MASK) != (addr & PAGE_MASK))
+ clear_len = len - ((addr + len) & ~PAGE_MASK);
+ memset(to, 0, clear_len);
+ }
+out:
return len;
}
diff --git a/include/asm-x86/uaccess_64.h b/include/asm-x86/uaccess_64.h
index 5cfd295..c5c5af0 100644
--- a/include/asm-x86/uaccess_64.h
+++ b/include/asm-x86/uaccess_64.h
@@ -196,6 +196,6 @@ static inline int __copy_from_user_inatomic_nocache(void *dst,
}
unsigned long
-copy_user_handle_tail(char *to, char *from, unsigned len, unsigned zerorest);
+copy_user_handle_tail(char *to, char *from, unsigned len, unsigned clear_rest);
#endif /* ASM_X86__UACCESS_64_H */
--
wbr, Vitaly
On Mon, 28 Jul 2008, Vitaly Mayatskikh wrote:
>
> Reduce the protection fault count in the copy_user_handle_tail routine by
> limiting the clear length to the end of the page, as suggested by Linus.
No, you did it wrong.
The page _clearing_ has to be to the end of the copy. Yes, in practice, I
don't think the kernel actually very often does "copy_from_user()" calls
that are page crossers in the kernel, but they do happen.
It's the _copy_ part that needn't cross pages, because we know that
something faulted, and we _know_ that it wasn't the clearing.
Really: the clearing should be just a "memset()" with no limitations. It
can never fault without it being a serious bug, and then we should not fix
it up, but we should oops!
Linus
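For clarity, a minimal C sketch of the structure Linus describes for the
copy_from_user() case; illustrative only, not the actual patch (the
function name and the page bound on the copy loop are assumptions):

/*
 * Sketch: the copy stops at the first fault and never crosses the
 * source page; the clearing of the kernel-side destination is a plain
 * memset() with no limit, since a fault there would be a kernel bug.
 */
unsigned long
handle_tail_sketch(char *to, const char __user *from, unsigned long len)
{
        unsigned long in_page = PAGE_SIZE - ((unsigned long)from & ~PAGE_MASK);
        unsigned long max_copy = len < in_page ? len : in_page;
        unsigned long copied = 0;
        char c;

        while (copied < max_copy && !__get_user(c, from + copied))
                to[copied++] = c;
        memset(to + copied, 0, len - copied);   /* unconditional */
        return len - copied;                    /* uncopied bytes */
}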
Linus Torvalds <[email protected]> writes:
> On Mon, 28 Jul 2008, Vitaly Mayatskikh wrote:
>>
>> Reduce the protection fault count in the copy_user_handle_tail routine by
>> limiting the clear length to the end of the page, as suggested by Linus.
>
> No, you did it wrong.
>
> The page _clearing_ has to be to the end of the copy. Yes, in practice, I
> don't think the kernel actually very often does "copy_from_user()" calls
> that are page crossers in the kernel, but they do happen.
>
> It's the _copy_ part that needn't cross pages, because we know that
> something faulted, and we _know_ that it wasn't the clearing.
>
> Really: the clearing should be just a "memset()" with no limitations. It
> can never fault without it being a serious bug, and then we should not fix
> it up, but we should oops!
Right. I will take your code discussed previously. The only question is:
what would be a good place for the *_IS_USERSPACE and CLEAR_REMAINDER flag
definitions and the assembly macro ALIGN_DESTINATION (shared by
copy_user_64.S and copy_user_nocache_64.S)? I have a new file,
include/asm-x86/copy_user.h, at the moment, but I'm not sure whether you
think it's ok to put that stuff there:
include/asm-x86/copy_user.h:
#ifndef _COPY_USER_H
#define _COPY_USER_H
/* Flags for copy_user_handle_tail */
#define CLEAR_REMAINDER 1
#define DEST_IS_USERSPACE 2
#define SOURCE_IS_USERSPACE 4
/* Align destination */
#define FIX_ALIGNMENT 1
#define BYTES_LEFT_IN_PAGE(ptr) \
(unsigned long)((PAGE_MASK & ((long)(ptr) + PAGE_SIZE)) - (long)(ptr))
#ifdef __ASSEMBLY__
.macro ALIGN_DESTINATION
...
#endif /* FIX_ALIGNMENT */
.endm
#endif /* __ASSEMBLY__ */
#endif /* _COPY_USER_H */
--
wbr, Vitaly
Linus Torvalds <[email protected]> writes:
> On Mon, 28 Jul 2008, Vitaly Mayatskikh wrote:
>>
>> Reduce the protection fault count in the copy_user_handle_tail routine by
>> limiting the clear length to the end of the page, as suggested by Linus.
>
> No, you did it wrong.
Another try. Added direction and clear-remainder flags to let the
tail handling routine know how to optimize tail copying and clearing.
The BYTES_LEFT_IN_PAGE macro returns PAGE_SIZE, not zero, when the
address is page-aligned.
Signed-off-by: Vitaly Mayatskikh <[email protected]>
diff --git a/include/asm-x86/uaccess_64.h b/include/asm-x86/uaccess_64.h
index 5cfd295..e0ddedf 100644
--- a/include/asm-x86/uaccess_64.h
+++ b/include/asm-x86/uaccess_64.h
@@ -1,6 +1,16 @@
#ifndef ASM_X86__UACCESS_64_H
#define ASM_X86__UACCESS_64_H
+/* Flags for copy_user_handle_tail */
+#define CLEAR_REMAINDER 1
+#define DEST_IS_USERSPACE 2
+#define SOURCE_IS_USERSPACE 4
+
+#define BYTES_LEFT_IN_PAGE(ptr) \
+ (unsigned)((PAGE_MASK & ((long)(ptr) + PAGE_SIZE)) - (long)(ptr))
+
+#ifndef __ASSEMBLY__
+
/*
* User space memory access functions
*/
@@ -179,23 +189,26 @@ __copy_to_user_inatomic(void __user *dst, const void *src, unsigned size)
}
extern long __copy_user_nocache(void *dst, const void __user *src,
- unsigned size, int zerorest);
+ unsigned size, unsigned flags);
static inline int __copy_from_user_nocache(void *dst, const void __user *src,
unsigned size)
{
might_sleep();
- return __copy_user_nocache(dst, src, size, 1);
+ return __copy_user_nocache(dst, src, size, SOURCE_IS_USERSPACE
+ | CLEAR_REMAINDER);
}
static inline int __copy_from_user_inatomic_nocache(void *dst,
const void __user *src,
unsigned size)
{
- return __copy_user_nocache(dst, src, size, 0);
+ return __copy_user_nocache(dst, src, size, SOURCE_IS_USERSPACE);
}
unsigned long
-copy_user_handle_tail(char *to, char *from, unsigned len, unsigned zerorest);
+copy_user_handle_tail(char *dst, char *src, unsigned remainder, unsigned flags);
+
+#endif /* __ASSEMBLY__ */
#endif /* ASM_X86__UACCESS_64_H */
--
wbr, Vitaly
Linus Torvalds <[email protected]> writes:
> On Mon, 28 Jul 2008, Vitaly Mayatskikh wrote:
>>
>> Reduce the protection fault count in the copy_user_handle_tail routine by
>> limiting the clear length to the end of the page, as suggested by Linus.
>
> No, you did it wrong.
Optimize tail handling with regard to the flags passed to the handler
routine.
Signed-off-by: Vitaly Mayatskikh <[email protected]>
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index f4df6e7..d793900 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -161,23 +161,32 @@ EXPORT_SYMBOL(copy_in_user);
/*
* Try to copy last bytes and clear the rest if needed.
* Since protection fault in copy_from/to_user is not a normal situation,
- * it is not necessary to optimize tail handling.
+ * it is not necessary to do low level optimization of tail handling.
*/
unsigned long
-copy_user_handle_tail(char *to, char *from, unsigned len, unsigned zerorest)
+copy_user_handle_tail(char *dst, char *src, unsigned remainder, unsigned flags)
{
char c;
- unsigned zero_len;
+ unsigned max_copy = remainder;
- for (; len; --len) {
- if (__get_user_nocheck(c, from++, sizeof(char)))
+ /* Don't even bother trying to cross a page in user space! */
+ if (flags & DEST_IS_USERSPACE)
+ max_copy = min(max_copy, BYTES_LEFT_IN_PAGE(dst));
+ if (flags & SOURCE_IS_USERSPACE)
+ max_copy = min(max_copy, BYTES_LEFT_IN_PAGE(src));
+
+ while (max_copy--) {
+ if (__get_user(c, src))
break;
- if (__put_user_nocheck(c, to++, sizeof(char)))
+ if (__put_user(c, dst))
break;
+ src++;
+ dst++;
+ remainder--;
}
- for (c = 0, zero_len = len; zerorest && zero_len; --zero_len)
- if (__put_user_nocheck(c, to++, sizeof(char)))
- break;
- return len;
+ if (flags & CLEAR_REMAINDER)
+ memset(dst, 0, remainder);
+
+ return remainder;
}
--
wbr, Vitaly
Linus Torvalds <[email protected]> writes:
> On Mon, 28 Jul 2008, Vitaly Mayatskikh wrote:
>>
>> Reduce the protection fault count in the copy_user_handle_tail routine by
>> limiting the clear length to the end of the page, as suggested by Linus.
>
> No, you did it wrong.
Pass direction and clear-remainder flags to the tail handling routine.
Fixed typos where register r8d was used instead of ecx. The testl
instruction is used for zero checks instead of andl, as it generates
more efficient microcode.
Signed-off-by: Vitaly Mayatskikh <[email protected]>
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index dfdf428..4fbd9d5 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -9,12 +9,14 @@
#include <linux/linkage.h>
#include <asm/dwarf2.h>
+/* Align destination */
#define FIX_ALIGNMENT 1
#include <asm/current.h>
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
#include <asm/cpufeature.h>
+#include <asm/uaccess_64.h>
.macro ALTERNATIVE_JUMP feature,orig,alt
0:
@@ -44,15 +46,16 @@
subl $8,%ecx
negl %ecx
subl %ecx,%edx
-100: movb (%rsi),%al
-101: movb %al,(%rdi)
+100: movb (%rsi),%ah /* al is flags */
+101: movb %ah,(%rdi)
incq %rsi
incq %rdi
decl %ecx
jnz 100b
102:
.section .fixup,"ax"
-103: addl %r8d,%edx /* ecx is zerorest also */
+103: addl %ecx,%edx
+ movl %eax,%ecx
jmp copy_user_handle_tail
.previous
@@ -61,7 +64,7 @@
.quad 100b,103b
.quad 101b,103b
.previous
-#endif
+#endif /* FIX_ALIGNMENT */
.endm
/* Standard copy_to_user with segment limit checking */
@@ -73,6 +76,7 @@ ENTRY(copy_to_user)
jc bad_to_user
cmpq TI_addr_limit(%rax),%rcx
jae bad_to_user
+ movl $(DEST_IS_USERSPACE),%ecx
ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
CFI_ENDPROC
@@ -85,18 +89,21 @@ ENTRY(copy_from_user)
jc bad_from_user
cmpq TI_addr_limit(%rax),%rcx
jae bad_from_user
+ movl $(SOURCE_IS_USERSPACE | CLEAR_REMAINDER),%ecx
ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
CFI_ENDPROC
ENDPROC(copy_from_user)
ENTRY(copy_user_generic)
CFI_STARTPROC
+ movl $(SOURCE_IS_USERSPACE | DEST_IS_USERSPACE),%ecx
ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
CFI_ENDPROC
ENDPROC(copy_user_generic)
ENTRY(__copy_from_user_inatomic)
CFI_STARTPROC
+ movl $(SOURCE_IS_USERSPACE),%ecx
ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
CFI_ENDPROC
ENDPROC(__copy_from_user_inatomic)
@@ -125,13 +132,15 @@ ENDPROC(bad_from_user)
* Input:
* rdi destination
* rsi source
- * rdx count
+ * edx count
+ * ecx flags
*
* Output:
* eax uncopied bytes or 0 if successfull.
*/
ENTRY(copy_user_generic_unrolled)
CFI_STARTPROC
+ movl %ecx,%eax /* save flags in eax */
cmpl $8,%edx
jb 20f /* less then 8 bytes, go to byte copy loop */
ALIGN_DESTINATION
@@ -169,11 +178,11 @@ ENTRY(copy_user_generic_unrolled)
leaq 8(%rdi),%rdi
decl %ecx
jnz 18b
-20: andl %edx,%edx
+20: testl %edx,%edx
jz 23f
movl %edx,%ecx
-21: movb (%rsi),%al
-22: movb %al,(%rdi)
+21: movb (%rsi),%ah /* al is flags */
+22: movb %ah,(%rdi)
incq %rsi
incq %rdi
decl %ecx
@@ -188,7 +197,8 @@ ENTRY(copy_user_generic_unrolled)
40: lea (%rdx,%rcx,8),%rdx
jmp 60f
50: movl %ecx,%edx
-60: jmp copy_user_handle_tail /* ecx is zerorest also */
+60: movl %eax,%ecx /* get flags back to ecx*/
+ jmp copy_user_handle_tail
.previous
.section __ex_table,"a"
@@ -230,15 +240,15 @@ ENDPROC(copy_user_generic_unrolled)
* Input:
* rdi destination
* rsi source
- * rdx count
+ * edx count
+ * ecx flags
*
* Output:
* eax uncopied bytes or 0 if successful.
*/
ENTRY(copy_user_generic_string)
CFI_STARTPROC
- andl %edx,%edx
- jz 4f
+ movl %ecx,%eax /* save flags */
cmpl $8,%edx
jb 2f /* less than 8 bytes, go to byte copy loop */
ALIGN_DESTINATION
@@ -250,12 +260,13 @@ ENTRY(copy_user_generic_string)
2: movl %edx,%ecx
3: rep
movsb
-4: xorl %eax,%eax
+ xorl %eax,%eax
ret
.section .fixup,"ax"
11: lea (%rdx,%rcx,8),%rcx
-12: movl %ecx,%edx /* ecx is zerorest also */
+12: movl %ecx,%edx
+ movl %eax,%ecx /* get flags back to ecx */
jmp copy_user_handle_tail
.previous
diff --git a/arch/x86/lib/copy_user_nocache_64.S b/arch/x86/lib/copy_user_nocache_64.S
index 40e0e30..8703ccd 100644
--- a/arch/x86/lib/copy_user_nocache_64.S
+++ b/arch/x86/lib/copy_user_nocache_64.S
@@ -9,11 +9,13 @@
#include <linux/linkage.h>
#include <asm/dwarf2.h>
+/* Align destination */
#define FIX_ALIGNMENT 1
#include <asm/current.h>
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
+#include <asm/uaccess_64.h>
.macro ALIGN_DESTINATION
#ifdef FIX_ALIGNMENT
@@ -24,15 +26,16 @@
subl $8,%ecx
negl %ecx
subl %ecx,%edx
-100: movb (%rsi),%al
-101: movb %al,(%rdi)
+100: movb (%rsi),%ah /* al is flags */
+101: movb %ah,(%rdi)
incq %rsi
incq %rdi
decl %ecx
jnz 100b
102:
.section .fixup,"ax"
-103: addl %r8d,%edx /* ecx is zerorest also */
+103: addl %ecx,%edx
+ movl %eax,%ecx
jmp copy_user_handle_tail
.previous
@@ -41,7 +44,7 @@
.quad 100b,103b
.quad 101b,103b
.previous
-#endif
+#endif /* FIX_ALIGNMENT */
.endm
/*
@@ -50,6 +53,7 @@
*/
ENTRY(__copy_user_nocache)
CFI_STARTPROC
+ movl %ecx,%eax /* save flags in eax */
cmpl $8,%edx
jb 20f /* less then 8 bytes, go to byte copy loop */
ALIGN_DESTINATION
@@ -87,11 +91,11 @@ ENTRY(__copy_user_nocache)
leaq 8(%rdi),%rdi
decl %ecx
jnz 18b
-20: andl %edx,%edx
+20: testl %edx,%edx
jz 23f
movl %edx,%ecx
-21: movb (%rsi),%al
-22: movb %al,(%rdi)
+21: movb (%rsi),%ah /* al is flags */
+22: movb %ah,(%rdi)
incq %rsi
incq %rdi
decl %ecx
@@ -108,7 +112,7 @@ ENTRY(__copy_user_nocache)
jmp 60f
50: movl %ecx,%edx
60: sfence
- movl %r8d,%ecx
+ movl %eax,%ecx /* get flags back to ecx*/
jmp copy_user_handle_tail
.previous
--
wbr, Vitaly
On Wed, 30 Jul 2008, Vitaly Mayatskikh wrote:
>
> Another try.
Ok, this is starting to look more reasonable. But you cannot split things
up like this per-file, because the end result doesn't _work_ with the
changes separated.
> The BYTES_LEFT_IN_PAGE macro returns PAGE_SIZE, not zero, when the
> address is page-aligned.
Hmm. Why? If the address is aligned, then we shouldn't even try to copy
any more, should we? We know we got a fault - and regardless of whether it
was because of some offset off the base pointer or not, if the base
pointer was at offset zero, it's going to be in the same page. So why try
to do an operation we know will fault again?
Also, that's a rather inefficient way to do it, isn't it? Maybe the
compiler can figure it out, but the efficient code would be just
PAGE_SIZE - ((PAGE_SIZE-1) & (unsigned long)ptr)
no? That said, exactly because I think we shouldn't even bother to try to
fix up faults that happened at the beginning of a page, I think the right
one is the one I think I posted originally, ie the one that does just
#define BYTES_LEFT_IN_PAGE(ptr) \
(unsigned int)((PAGE_SIZE-1) & -(long)(ptr))
which is a bit simpler (well, it requires some thought to know why it
works, but it generates good code).
In case you wonder why it works, the operation we _want_ to do is
(PAGE_SIZE - offset-in-page) mod PAGE_SIZE
but subtraction is "stable" in modulus calculus (*), so you can write that
as
(PAGE_SIZE mod PAGE_SIZE - offset-in-page) mod PAGE_SIZE
which is just
(0 - (ptr mod PAGE_SIZE)) mod PAGE_SIZE
but again, subtraction is stable in modulus, so you can write that as
(0 - ptr) mod PAGE_SIZE
and so the result is literally just those single 'neg' and 'and'
instructions (in the macro, you then need all the casting and the
parenthesis, which is why it gets ugly again)
And yes, maybe the compiler figures it all out, but judging by past
experience, things often don't work that well.
Linus
(*) Yeah, in math, it's stable in general, in 2's complement arithmetic
it's only stable in mod 2^n, I guess.
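As a sanity check, this small userspace program (assuming 4096-byte
pages and treating plain integers as addresses) verifies that the macro
really computes (PAGE_SIZE - offset-in-page) mod PAGE_SIZE, i.e. zero
for page-aligned pointers:

#include <assert.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL
#define BYTES_LEFT_IN_PAGE(ptr) \
        (unsigned int)((PAGE_SIZE-1) & -(long)(ptr))

int main(void)
{
        unsigned long addr;

        for (addr = 0; addr < 4 * PAGE_SIZE; addr++) {
                unsigned long offset = addr & (PAGE_SIZE - 1);

                /* the identity derived above */
                assert(BYTES_LEFT_IN_PAGE(addr) ==
                       (PAGE_SIZE - offset) % PAGE_SIZE);
        }
        /* a page-aligned address gives 0, not PAGE_SIZE, with this version */
        printf("%u %u\n", BYTES_LEFT_IN_PAGE(2 * PAGE_SIZE),
               BYTES_LEFT_IN_PAGE(2 * PAGE_SIZE + 100));  /* 0 3996 */
        return 0;
}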
Linus Torvalds <[email protected]> writes:
> Ok, this is starting to look more reasonable. But you cannot split things
> up like this per-file, because the end result doesn't _work_ with the
> changes separated.
>
>> The BYTES_LEFT_IN_PAGE macro returns PAGE_SIZE, not zero, when the
>> address is page-aligned.
>
> Hmm. Why? If the address is aligned, then we shouldn't even try to copy
> any more, should we? We know we got a fault - and regardless of whether it
> was because of some offset off the base pointer or not, if the base
> pointer was at offset zero, it's going to be in the same page. So why try
> to do an operation we know will fault again?
Yes, I see. I got confused, sorry. Your BYTES_LEFT_IN_PAGE works very
well, tested.
Signed-off-by: Vitaly Mayatskikh <[email protected]>
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index dfdf428..4fbd9d5 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -9,12 +9,14 @@
#include <linux/linkage.h>
#include <asm/dwarf2.h>
+/* Align destination */
#define FIX_ALIGNMENT 1
#include <asm/current.h>
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
#include <asm/cpufeature.h>
+#include <asm/uaccess_64.h>
.macro ALTERNATIVE_JUMP feature,orig,alt
0:
@@ -44,15 +46,16 @@
subl $8,%ecx
negl %ecx
subl %ecx,%edx
-100: movb (%rsi),%al
-101: movb %al,(%rdi)
+100: movb (%rsi),%ah /* al is flags */
+101: movb %ah,(%rdi)
incq %rsi
incq %rdi
decl %ecx
jnz 100b
102:
.section .fixup,"ax"
-103: addl %r8d,%edx /* ecx is zerorest also */
+103: addl %ecx,%edx
+ movl %eax,%ecx
jmp copy_user_handle_tail
.previous
@@ -61,7 +64,7 @@
.quad 100b,103b
.quad 101b,103b
.previous
-#endif
+#endif /* FIX_ALIGNMENT */
.endm
/* Standard copy_to_user with segment limit checking */
@@ -73,6 +76,7 @@ ENTRY(copy_to_user)
jc bad_to_user
cmpq TI_addr_limit(%rax),%rcx
jae bad_to_user
+ movl $(DEST_IS_USERSPACE),%ecx
ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
CFI_ENDPROC
@@ -85,18 +89,21 @@ ENTRY(copy_from_user)
jc bad_from_user
cmpq TI_addr_limit(%rax),%rcx
jae bad_from_user
+ movl $(SOURCE_IS_USERSPACE | CLEAR_REMAINDER),%ecx
ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
CFI_ENDPROC
ENDPROC(copy_from_user)
ENTRY(copy_user_generic)
CFI_STARTPROC
+ movl $(SOURCE_IS_USERSPACE | DEST_IS_USERSPACE),%ecx
ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
CFI_ENDPROC
ENDPROC(copy_user_generic)
ENTRY(__copy_from_user_inatomic)
CFI_STARTPROC
+ movl $(SOURCE_IS_USERSPACE),%ecx
ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
CFI_ENDPROC
ENDPROC(__copy_from_user_inatomic)
@@ -125,13 +132,15 @@ ENDPROC(bad_from_user)
* Input:
* rdi destination
* rsi source
- * rdx count
+ * edx count
+ * ecx flags
*
* Output:
* eax uncopied bytes or 0 if successfull.
*/
ENTRY(copy_user_generic_unrolled)
CFI_STARTPROC
+ movl %ecx,%eax /* save flags in eax */
cmpl $8,%edx
jb 20f /* less then 8 bytes, go to byte copy loop */
ALIGN_DESTINATION
@@ -169,11 +178,11 @@ ENTRY(copy_user_generic_unrolled)
leaq 8(%rdi),%rdi
decl %ecx
jnz 18b
-20: andl %edx,%edx
+20: testl %edx,%edx
jz 23f
movl %edx,%ecx
-21: movb (%rsi),%al
-22: movb %al,(%rdi)
+21: movb (%rsi),%ah /* al is flags */
+22: movb %ah,(%rdi)
incq %rsi
incq %rdi
decl %ecx
@@ -188,7 +197,8 @@ ENTRY(copy_user_generic_unrolled)
40: lea (%rdx,%rcx,8),%rdx
jmp 60f
50: movl %ecx,%edx
-60: jmp copy_user_handle_tail /* ecx is zerorest also */
+60: movl %eax,%ecx /* get flags back to ecx*/
+ jmp copy_user_handle_tail
.previous
.section __ex_table,"a"
@@ -230,15 +240,15 @@ ENDPROC(copy_user_generic_unrolled)
* Input:
* rdi destination
* rsi source
- * rdx count
+ * edx count
+ * ecx flags
*
* Output:
* eax uncopied bytes or 0 if successful.
*/
ENTRY(copy_user_generic_string)
CFI_STARTPROC
- andl %edx,%edx
- jz 4f
+ movl %ecx,%eax /* save flags */
cmpl $8,%edx
jb 2f /* less than 8 bytes, go to byte copy loop */
ALIGN_DESTINATION
@@ -250,12 +260,13 @@ ENTRY(copy_user_generic_string)
2: movl %edx,%ecx
3: rep
movsb
-4: xorl %eax,%eax
+ xorl %eax,%eax
ret
.section .fixup,"ax"
11: lea (%rdx,%rcx,8),%rcx
-12: movl %ecx,%edx /* ecx is zerorest also */
+12: movl %ecx,%edx
+ movl %eax,%ecx /* get flags back to ecx */
jmp copy_user_handle_tail
.previous
diff --git a/arch/x86/lib/copy_user_nocache_64.S b/arch/x86/lib/copy_user_nocache_64.S
index 40e0e30..8703ccd 100644
--- a/arch/x86/lib/copy_user_nocache_64.S
+++ b/arch/x86/lib/copy_user_nocache_64.S
@@ -9,11 +9,13 @@
#include <linux/linkage.h>
#include <asm/dwarf2.h>
+/* Align destination */
#define FIX_ALIGNMENT 1
#include <asm/current.h>
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
+#include <asm/uaccess_64.h>
.macro ALIGN_DESTINATION
#ifdef FIX_ALIGNMENT
@@ -24,15 +26,16 @@
subl $8,%ecx
negl %ecx
subl %ecx,%edx
-100: movb (%rsi),%al
-101: movb %al,(%rdi)
+100: movb (%rsi),%ah /* al is flags */
+101: movb %ah,(%rdi)
incq %rsi
incq %rdi
decl %ecx
jnz 100b
102:
.section .fixup,"ax"
-103: addl %r8d,%edx /* ecx is zerorest also */
+103: addl %ecx,%edx
+ movl %eax,%ecx
jmp copy_user_handle_tail
.previous
@@ -41,7 +44,7 @@
.quad 100b,103b
.quad 101b,103b
.previous
-#endif
+#endif /* FIX_ALIGNMENT */
.endm
/*
@@ -50,6 +53,7 @@
*/
ENTRY(__copy_user_nocache)
CFI_STARTPROC
+ movl %ecx,%eax /* save flags in eax */
cmpl $8,%edx
jb 20f /* less then 8 bytes, go to byte copy loop */
ALIGN_DESTINATION
@@ -87,11 +91,11 @@ ENTRY(__copy_user_nocache)
leaq 8(%rdi),%rdi
decl %ecx
jnz 18b
-20: andl %edx,%edx
+20: testl %edx,%edx
jz 23f
movl %edx,%ecx
-21: movb (%rsi),%al
-22: movb %al,(%rdi)
+21: movb (%rsi),%ah /* al is flags */
+22: movb %ah,(%rdi)
incq %rsi
incq %rdi
decl %ecx
@@ -108,7 +112,7 @@ ENTRY(__copy_user_nocache)
jmp 60f
50: movl %ecx,%edx
60: sfence
- movl %r8d,%ecx
+ movl %eax,%ecx /* get flags back to ecx*/
jmp copy_user_handle_tail
.previous
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index f4df6e7..d793900 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -161,23 +161,32 @@ EXPORT_SYMBOL(copy_in_user);
/*
* Try to copy last bytes and clear the rest if needed.
* Since protection fault in copy_from/to_user is not a normal situation,
- * it is not necessary to optimize tail handling.
+ * it is not necessary to do low level optimization of tail handling.
*/
unsigned long
-copy_user_handle_tail(char *to, char *from, unsigned len, unsigned zerorest)
+copy_user_handle_tail(char *dst, char *src, unsigned remainder, unsigned flags)
{
char c;
- unsigned zero_len;
+ unsigned max_copy = remainder;
- for (; len; --len) {
- if (__get_user_nocheck(c, from++, sizeof(char)))
+ /* Don't even bother trying to cross a page in user space! */
+ if (flags & DEST_IS_USERSPACE)
+ max_copy = min(max_copy, BYTES_LEFT_IN_PAGE(dst));
+ if (flags & SOURCE_IS_USERSPACE)
+ max_copy = min(max_copy, BYTES_LEFT_IN_PAGE(src));
+
+ while (max_copy--) {
+ if (__get_user(c, src))
break;
- if (__put_user_nocheck(c, to++, sizeof(char)))
+ if (__put_user(c, dst))
break;
+ src++;
+ dst++;
+ remainder--;
}
- for (c = 0, zero_len = len; zerorest && zero_len; --zero_len)
- if (__put_user_nocheck(c, to++, sizeof(char)))
- break;
- return len;
+ if (flags & CLEAR_REMAINDER)
+ memset(dst, 0, remainder);
+
+ return remainder;
}
diff --git a/include/asm-x86/uaccess_64.h b/include/asm-x86/uaccess_64.h
index 5cfd295..ea042f8 100644
--- a/include/asm-x86/uaccess_64.h
+++ b/include/asm-x86/uaccess_64.h
@@ -1,6 +1,16 @@
#ifndef ASM_X86__UACCESS_64_H
#define ASM_X86__UACCESS_64_H
+/* Flags for copy_user_handle_tail */
+#define CLEAR_REMAINDER 1
+#define DEST_IS_USERSPACE 2
+#define SOURCE_IS_USERSPACE 4
+
+#define BYTES_LEFT_IN_PAGE(ptr) \
+ (unsigned int)((PAGE_SIZE-1) & -(long)(ptr))
+
+#ifndef __ASSEMBLY__
+
/*
* User space memory access functions
*/
@@ -179,23 +189,26 @@ __copy_to_user_inatomic(void __user *dst, const void *src, unsigned size)
}
extern long __copy_user_nocache(void *dst, const void __user *src,
- unsigned size, int zerorest);
+ unsigned size, unsigned flags);
static inline int __copy_from_user_nocache(void *dst, const void __user *src,
unsigned size)
{
might_sleep();
- return __copy_user_nocache(dst, src, size, 1);
+ return __copy_user_nocache(dst, src, size, SOURCE_IS_USERSPACE
+ | CLEAR_REMAINDER);
}
static inline int __copy_from_user_inatomic_nocache(void *dst,
const void __user *src,
unsigned size)
{
- return __copy_user_nocache(dst, src, size, 0);
+ return __copy_user_nocache(dst, src, size, SOURCE_IS_USERSPACE);
}
unsigned long
-copy_user_handle_tail(char *to, char *from, unsigned len, unsigned zerorest);
+copy_user_handle_tail(char *dst, char *src, unsigned remainder, unsigned flags);
+
+#endif /* __ASSEMBLY__ */
#endif /* ASM_X86__UACCESS_64_H */
--
wbr, Vitaly
On Thu, 31 Jul 2008, Vitaly Mayatskikh wrote:
>
> Yes, I see. I got confused, sorry. Your BYTES_LEFT_IN_PAGE works very
> well, tested.
>
> Signed-off-by: Vitaly Mayatskikh <[email protected]>
Ok, this latest version _looks_ ok. How well tested is it? I'm a bit
nervous by now. Those earlier versions that got committed had all those
wrong registers etc.
Anyway, Ack as far as I'm concerned, but..
Linus
Linus Torvalds <[email protected]> writes:
>> Yes, I see. I got confused, sorry. Your BYTES_LEFT_IN_PAGE works very
>> well, tested.
>>
>> Signed-off-by: Vitaly Mayatskikh <[email protected]>
>
> Ok, this latest version _looks_ ok. How well tested is it? I'm a bit
> nervous by now. Those earlier versions that got committed had all those
> wrong registers etc.
>
> Anyway, Ack as far as I'm concerned, but..
Well, I have a sanity test for it (at http://people.redhat.com/vmayatsk/copy_user_x8664/),
but unfortunately it shows what I want to see, not the truth. I'm going
to do larger regression testing next week.
--
wbr, Vitaly
On Sat, 2 Aug 2008, Vitaly Mayatskikh wrote:
>
> Well, I have a sanity test for it (at http://people.redhat.com/vmayatsk/copy_user_x8664/),
> but unfortunately it shows what I want to see, not the truth. I'm going
> to do larger regression testing next week.
I was actually thinking more along the lines of actually doing a
test-suite with the few most relevant system calls (to catch the few cases
that matter), and judicious use of mmap/munmap to make sure to trigger it
in-kernel.
It probably doesn't really need all that many system calls to trigger all
the relevant paths. A "read()" should trigger the "to_user()" case, and a
write() to a file should trigger the "from_user_nocache()" case. And while
the "from_user()" case with zeroing might be harder to see (because all
_normal_ users should also look at the error case and return EFAULT), I
think there are a few cases where we just depend on the zeroing.
Doing a
git grep ' copy_from_user('
(that's a tab in that grep thing) shows at least the termios code doing
it, for example. But fewer cases than I expected.
Linus
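A rough userspace sketch of the read() half of such a test; the file
name is arbitrary and the exact result depends on how the mid-copy
fault is reported (a short count or -EFAULT):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        long page = sysconf(_SC_PAGESIZE);
        char *buf = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        int fd = open("/etc/passwd", O_RDONLY);
        ssize_t n;

        if (buf == MAP_FAILED || fd < 0)
                return 1;

        /* punch a hole: the second page is no longer mapped */
        munmap(buf + page, page);

        /*
         * Ask read() for more bytes than fit before the hole, so the
         * kernel's copy_to_user() faults part-way through and the
         * tail handling path runs.
         */
        n = read(fd, buf + page - 16, 64);
        printf("read returned %zd\n", n);

        close(fd);
        return 0;
}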
Linus Torvalds <[email protected]> writes:
> I was actually thinking more along the lines of actually doing a
> test-suite with the few most relevant system calls (to catch the few cases
> that matter), and judicious use of mmap/munmap to make sure to trigger it
> in-kernel.
Well, my test is a kernel module (with the help of systemtap). It tests
copy_from_user, copy_to_user, copy_user_generic_unrolled,
copy_user_generic_string and __copy_user_nocache directly. The test
allocates a page in kernel space and uses a page in user space (the
latter for the range checks in copy_from/to_user) and tries to trigger
a fault at every possible place in these routines. I don't know how to
get a userspace page from a kernel module easily, so I did a dirty hack
with available pages and sys_mprotect. However, it works.
> It probably doesn't really need all that many system calls to trigger all
> the relevant paths. A "read()" should trigger the "to_user()" case, and a
> write() to a file should trigger the "from_user_nocache()" case. And while
> the "from_user()" case with zeroing might be harder to see (because all
> _normal_ users should also look at the error case and return EFAULT), I
> think there are a few cases where we just depend on the zeroing.
I found it hard to do good testing of copy_user from a userspace
program, so I used systemtap.
> Doing a
>
> git grep ' copy_from_user('
>
> (that's a tab in that grep thing) shows at least the termios code doing
> it, for example. But fewer cases than I expected.
>
> Linus
It might be a good idea to find such potentially faulty places.
--
wbr, Vitaly