2023-10-20 00:44:23

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event


On 20/10/23 11:13, Sean Christopherson wrote:
> On Fri, Oct 20, 2023, Alexey Kardashevskiy wrote:
>>
>> On 20/10/23 01:57, Sean Christopherson wrote:
>>> On Thu, Oct 19, 2023, Alexey Kardashevskiy wrote:
>>>>> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
>>>>> return 0;
>>>>> }
>>>>
>>>> This should work the KVM stored certs nicely but not for the global certs.
>>>> Although I am not all convinced that global certs is all that valuable but I
>>>> do not know the history of that, happened before I joined so I let others to
>>>> comment on that. Thanks,
>>>
>>> Aren't the global certs provided by userspace too though? If all certs are
>>> ultimately controlled by userspace, I don't see any reason to make the kernel a
>>> middle-man.
>>
>> The max blob size is 32KB or so and for 200 VMs it is:
>
> Not according to include/linux/psp-sev.h:
>
> #define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
>
> Ugh, and I see in another patch:
>
> Also increase the SEV_FW_BLOB_MAX_SIZE another 4K page to allow space
> for an extra certificate.
>
> -#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
> +#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */
>
> That's gross and just asking for ABI problems, because then there's this:
>
> +::
> +
> + struct kvm_sev_snp_set_certs {
> + __u64 certs_uaddr;
> + __u64 certs_len
> + };
> +
> +The certs_len field may not exceed SEV_FW_BLOB_MAX_SIZE.
>
>> - 6.5MB, all in the userspace so swappable vs
>> - 32KB but in the kernel so not swappable.
>> Sure, a box capable of running 200 VMs must have plenty of RAM but still :)
>
> That's making quite a few assumptions.
>
> 1) That the global cert will be 32KiB (which clearly isn't the case today).
> 2) That every VM will want the global cert.
> 3) That userspace can't figure out a way to share the global cert.
>
> Even in that absolutely worst case scenario, I am not remotely convinced that it
> justifies taking on the necessary complexity to manage certs in-kernel.
>
>> Plus, GHCB now has to go via the userspace before talking to the PSP which
>> was not the case so far (though I cannot think of immediate implication
>> right now).
>
> Any argument along the lines of "because that's how we've always done it" is going
> to fall on deaf ears. If there's a real performance bottleneck with kicking out
> to userspace, then I'll happily work to figure out a solution. If.

No, not performance, I was trying to imagine what can go wrong if
multiple vcpus are making this call, all exiting to QEMU, in a loop,
racing, something like this.



--
Alexey