> On May 14, 2021, at 12:46 AM, Andy Lutomirski <[email protected]> wrote:
>
>
>
> On Wed, May 12, 2021, at 11:33 AM, Dave Hansen wrote:
>> On 5/12/21 12:41 AM, Peter Zijlstra wrote:
>> > On Tue, May 11, 2021 at 01:05:02PM -0400, Jon Kohler wrote:
>> >> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
>> >> index 8d33ad80704f..5bc4df3a4c27 100644
>> >> --- a/arch/x86/include/asm/fpu/internal.h
>> >> +++ b/arch/x86/include/asm/fpu/internal.h
>> >> @@ -583,7 +583,13 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
>> >> if (pk)
>> >> pkru_val = pk->pkru;
>> >> }
>> >> - __write_pkru(pkru_val);
>> >> +
>> >> + /*
>> >> + * WRPKRU is relatively expensive compared to RDPKRU.
>> >> + * Avoid WRPKRU when it would not change the value.
>> >> + */
>> >> + if (pkru_val != rdpkru())
>> >> + wrpkru(pkru_val);
>> > Just wondering; why aren't we having that in a per-cpu variable? The
>> > usual per-cpu MSR shadow approach avoids issuing any 'special' ops
>> > entirely.
>>
>> It could be a per-cpu variable. When I wrote this originally I figured
>> that a rdpkru would be cheaper than a load from memory (even per-cpu
>> memory).
>>
>> But, now that I think about it, assuming that 'prku_val' is in %rdi, doing:
>>
>> cmp %gs:0x1234, %rdi
>>
>> might end up being cheaper than clobbering a *pair* of GPRs with rdpkru:
>>
>> xor %ecx,%ecx
>> rdpkru
>> cmp %rax, %rdi
>>
>> I'm too lazy to go figure out what would be faster in practice, though.
>> Does anyone care?
Strictly from a profiling perspective, my observation is that the rdpkru
is pretty quick, its the wrpkru that seems heavier under the covers, so
any speedup in rdpkru would likely go unnoticed by comparison. Now
that said if this per cpu variable would somehow get rid of the underlying
instruction and just emulate the whole thing, that might be interesting.
From an incremental change perspective though, this patch puts
us in a better spot, happy to take a look at future work if y’all have
some tips on top of this.
>
> RDPKRU gets bonus points for being impossible to get out of sync.