by Oliver Swede

[permalink] [raw]

Subject: Re: [PATCH v4 00/14] arm64: Optimise and update memcpy, user copy and string routines

Hi Will and Catalin,

Thank you both for having a look at this, the coupling between the
usercopy routines and the fixup routines was my main concern as well and
I understand the desire to take an approach that avoids this dependency
when it comes to importing new copy routines.

On 9/11/20 12:29 PM, Catalin Marinas wrote:
>
> I was a bit more optimistic about this series until I saw the
> copy_user_fixup.S changes (patches 12 to 14). I have a suspicion it's
> only Oli (and maybe Robin) who understands it, so from a maintainer's
> perspective it doesn't really scale. It's also very fragile with any
> minor tweak in the actual copy routine potentially breaking the fixup
> code.
>

I was wondering if it would be possible to check that an alternative
expression of the logic involved (e.g. diverted to a C function - if
technically feasible - that is easier to understand and modify) would
not really be desirable either?

On 9/11/20 12:29 PM, Catalin Marinas wrote:
> On Mon, Sep 07, 2020 at 11:10:03AM +0100, Will Deacon wrote:
>>
>> I'm inclined to say that cortex-strings is probably not a good basis for
>> our uaccess routines. The code needs to be adapted in a
non-straightforward
>> way so that we lose pretty much all of the benefits we'd usually get
from
>> adopted an existing implementation; we can't pull in fixes or
improvements
>> without a lot of manual effort, we can't reuse existing testing
infrastructure
>> (see below) and we end up being a "second-class" user of the routines
>> because of the discrepancies in implementation.
>
> I was a bit more optimistic about this series until I saw the
> copy_user_fixup.S changes (patches 12 to 14). I have a suspicion it's
> only Oli (and maybe Robin) who understands it, so from a maintainer's
> perspective it doesn't really scale. It's also very fragile with any
> minor tweak in the actual copy routine potentially breaking the fixup
> code.
>

An alternative approach to the fixup routines came up recently in
discussions; it could potentially enable the copy template to be
preserved in its current form and allow the uaccess routines to re-use a
shared copy algorithm (including cortex-strings and/or future optimized
out-of-order algorithms), and is based on one of Robin's suggestions
from a few weeks ago (I found some time to prototype this locally the
other day). The algorithm-specific backtracking would be dropped in
favour of first- and second-stage fixups, with each of these being
relatively short & straightforward, and (together) likely compatible
with future copy routines.
More specifically, within the fixup path invoked after the initial
fault, it could jump back N bytes and then begin an in-order byte-wise
copy from within the first fixup, and then intentionally fault for the
second time at either the same or another fault address. As the
intermediate scan-copy should continue right up to the 'true' fault
address, and ignore any out-of-order techniques used by the optimized
algorithm, the fixup for the second fault could then contain an
accurate, conclusive calculation of the return value.

A technique for this which would need to be carried forward from patch
11 in v4 would be to redirect the PC to a different 'intermediate' fixup
(though only two choices this time) depending on whether the faulting
instruction is a load or store, so it can establish if the minimum
address to jump back to should be src or buf when comparing to
(faddr-N). Additionally, for this the parameters to the usercopy
functions would still need to be stored on the stack (patch 10 in v4).
Otherwise, it would involve a slight rewrite of v4 that could provide a
more maintainable approach; the same fixups would be shared for all copy
sizes & fault addresses (versus in v4) and the only variable subject to
re-evaluation with new imports might be N (currently for copy sizes over
64B it is guaranteed anything below the previous 64B will have been
copied). It could also remove the need to special-case UAO systems when
compared to v4.

If it's possible for pages to be swapped out in the middle of a usercopy
function then this method might not provide the extra implicit knowledge
used by the fixups in v4 about the stage of the algorithm that has been
reached (deduced from the fixup that has been routed to, implied by the
fault-instruction-to-fixup mapping) to compensate for this, so it could
under-report the bytes copied if it faults at an earlier address during
the in-order (secondary) copy. This could be outweighed by the potential
advantages mentioned above, and I mainly include this as v4 should
account for this scenario (but perhaps unnecessarily if the scenario
isn't possible or too unlikely :) ), but as you rightly point out it
would be tiresome to re-write these routines.

On Mon, Sep 07, 2020 at 11:10:03AM +0100, Will Deacon wrote:
>
> So why don't we use cortex-strings as a basis for the in-kernel routines
> only, preferably in a form where the code can be used directly and
updated
> with a script (e.g. similar to how we pull in arch/arm64/crypto routines
> from OpenSSL). We can then roll our own uaccess routines, using a
slightly
> more straight-forward implementation which is more amenable to handling
> user faults and doesn't do things like over copying.

I was wondering if the proposal described above is something that you
would want to look into, or if the use of separate copy routines
(script-based or otherwise) for in-kernel/uaccess is the preferred approach?

On Mon, Sep 07, 2020 at 11:10:03AM +0100, Will Deacon wrote:
>
> Can we put this test module into the kernel source tree, please, maybe as
> part of lkdtm? Given the control flow of these optimised functions, I
think
> we absolutely need targeted testing to make sure we're getting complete
> coverage.

I will post the standalone module in its current state later today/early
next week, the tests are exhaustive but if it would be useful to include
a mechanism for specifying certain ranges of faults/copy sizes or
improve the error reporting, I'd be happy to add these features (and/or
integrate it into lkdtm).

Many thanks,
Oli

On 9/11/20 12:29 PM, Catalin Marinas wrote:
> On Mon, Sep 07, 2020 at 11:10:03AM +0100, Will Deacon wrote:
>> On Wed, Jul 01, 2020 at 09:12:49AM +0100, Oli Swede wrote:
>>>> Version 3 addressed this but I later found some issues with the fixup
>>>> correctness after further testing, and have partially re-written them
>>>> here, and addressed some other behaviours of the copy algorithm.
>>
>> [...]
>>
>>> I am waiting on access to the relevant machine before posting the benchmark
>>> results for this optimized memcpy, but Sam reported the following with the
>>> similar (but now slightly older) cortex-strings version:
>>> * copy_from_user: 13.17%
>>> * copy_to_user: 4.8%
>>> * memcpy: 27.88%
>>> * copy_in_user: Didn't appear in the test results.
>>> This machine will also be used to check the fixups are accurate on a system
>>> with UAO - they appear to be exact on a non-UAO system with PAN that I've
>>> been working on locally.
>>
>> I'm inclined to say that cortex-strings is probably not a good basis for
>> our uaccess routines. The code needs to be adapted in a non-straightforward
>> way so that we lose pretty much all of the benefits we'd usually get from
>> adopted an existing implementation; we can't pull in fixes or improvements
>> without a lot of manual effort, we can't reuse existing testing infrastructure
>> (see below) and we end up being a "second-class" user of the routines
>> because of the discrepancies in implementation.
>
> I was a bit more optimistic about this series until I saw the
> copy_user_fixup.S changes (patches 12 to 14). I have a suspicion it's
> only Oli (and maybe Robin) who understands it, so from a maintainer's
> perspective it doesn't really scale. It's also very fragile with any
> minor tweak in the actual copy routine potentially breaking the fixup
> code.
>
>> So why don't we use cortex-strings as a basis for the in-kernel routines
>> only, preferably in a form where the code can be used directly and updated
>> with a script (e.g. similar to how we pull in arch/arm64/crypto routines
>> from OpenSSL). We can then roll our own uaccess routines, using a slightly
>> more straight-forward implementation which is more amenable to handling
>> user faults and doesn't do things like over copying.
>
> I think that's probably the best option. I wouldn't mind replacing the
> in-kernel memcpy/strcpy etc. with these patches since the work was done
> already but definitely not for the uaccess and fixup routines (we still
> have the original implementation in the git log).
>
> A script would work even better. Do we have any issue with licensing
> though? Cortex Strings is BSD (3-clause IIRC) and copyright owned by
> Linaro. I got them to officially agree with relicensing (dual license)
> the latest copy under GPLv2 so that we can contribute it to the kernel.
> But since the project license is still BSD, any future updates in there
> are BSD-only.
>
> Maybe someone who understands this stuff can confirm that it's ok to
> regularly grab the Cortex Strings files into a GPLv2 codebase without
> asking for Linaro's permission.
>
> BTW, you could pick the kprobes patch in here, that explicit fixup call
> is not necessary.
>