Andrew Morton wrote ...
>It seems that movsl works acceptably with all alignments on AMD
>hardware, although this needs to be checked with more recent machines.
>movsl is a (bad) loss on PII and PIII for all alignments except 8&8.
>Don't know about P4 - I can test that in a day or two.
>I expect that a minimal, 90% solution would be just:
>fancy_copy_to_user(dst, src, count)
>{
>	if (arch_has_sane_movsl || ((dst|src) & 7) == 0)
>		movsl_copy_to_user(dst, src, count);
>	else
>		movl_copy_to_user(dst, src, count);
>}
>#define fancy_copy_to_user copy_to_user
>and we really only need fancy_copy_to_user in a handful of
>places - the bulk copies in networking and filemap.c. For all
>the other call sites it's probably more important to keep the
>code footprint down than it is to squeeze the last few drops out
>of the copy speed.
>Mala Anand has done some work on this. See
><searches> Yes, I have a copy of Mala's patch here which works
>against 2.5.current. Mala's patch will cause quite an expansion
>of kernel size; we would need an implementation which did not
>use inlining. This work was discussed at OLS2002. See
I will move the code out of uaccess.h (where it is inline) into
usercopy.c (as an out-of-line routine) and post it soon. It is on my
list of things to do.
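
For reference, the alignment dispatch quoted above could be sketched in
user space roughly as follows. This is only an illustration, not the patch
itself: movsl_copy(), movl_copy(), fancy_copy() and last_path are
hypothetical stand-ins (both copy routines simply call memcpy), and
arch_has_sane_movsl is assumed to be a flag set once per CPU type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: nonzero when "rep movsl" is fast at any alignment. */
static int arch_has_sane_movsl = 0;

/* Illustration only: records which path ran (1 = movsl, 2 = movl). */
static int last_path = 0;

/* Hypothetical stand-in for the "rep movsl" copy loop. */
static size_t movsl_copy(void *dst, const void *src, size_t count)
{
    last_path = 1;
    memcpy(dst, src, count);
    return 0; /* bytes left uncopied, in the style of copy_to_user */
}

/* Hypothetical stand-in for the unrolled movl copy loop. */
static size_t movl_copy(void *dst, const void *src, size_t count)
{
    last_path = 2;
    memcpy(dst, src, count);
    return 0;
}

/* Use the string instruction only when it is known to be cheap:
 * either the CPU handles every alignment well, or both pointers
 * are 8-byte aligned. Otherwise fall back to the movl loop. */
static size_t fancy_copy(void *dst, const void *src, size_t count)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (arch_has_sane_movsl || ((d | s) & 7) == 0)
        return movsl_copy(dst, src, count);
    return movl_copy(dst, src, count);
}
```

Because fancy_copy() is an ordinary function here rather than an inline,
each call site costs only a call instruction, which is the footprint
concern raised above.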
IBM Linux Technology Center - Kernel Performance