2023-10-02 16:05:25

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] misc: Add Nitro Secure Module driver

Hey Greg,

On 30.09.23 08:20, Greg Kroah-Hartman wrote:
> On Fri, Sep 29, 2023 at 09:26:16PM +0200, Alexander Graf wrote:
>> Hi Arnd!
>>
>> On 29.09.23 19:28, Arnd Bergmann wrote:
>>> On Fri, Sep 29, 2023, at 09:33, Alexander Graf wrote:
>>>> When running Linux inside a Nitro Enclave, the hypervisor provides a
>>>> special virtio device called "NSM". This device has 2 main functions:
>>>>
>>>> 1) Provide attestation reports
>>>> 2) Modify PCR state
>>>> 3) Provide entropy
>>>>
>>>> This patch adds the core NSM driver that exposes a /dev/nsm device node
>>>> which user space can use to request attestation documents and influence
>>>> PCR states. A follow up patch will add a hwrng driver to feed its entropy
>>>> into the kernel.
>>>>
>>>> Originally-by: Petre Eftime <[email protected]>
>>>> Signed-off-by: Alexander Graf <[email protected]>
>>> Hi Alex,
>>>
>>> I've taken a first look at this driver and have some minor comments.
>>
>> Thanks a bunch!
>>
>>
>>> The main point here is that I think we need to look at possible
>>> alternatives for the user space interface, and (if possible) change
>>> to a set of higher-level ioctl commands from the simple passthrough.
>>
>> I'm slightly torn on that bit. I think in hindsight the NSM device probably
>> should have been a reserved vsock CID and the hwrng one should have just
>> been virtio-rng.
>>
>> The problem is that Nitro Enclaves were launched in 2020 and since an
>> ecosystem developed in multiple languages to support building code inside:
>>
>> https://github.com/aws/aws-nitro-enclaves-nsm-api/blob/main/src/driver/mod.rs#L66
>> https://github.com/donkersgoed/aws-nsm-interface/blob/main/aws_nsm_interface/__init__.py#L264-L274
>> https://github.com/hf/nsm/blob/main/nsm.go#L99-L129
>>
>>
>> All of these use the (downstream) ioctl that this patch also implements. We
>> could change it, but instead of making it easier for user space to adapt the
>> device node, it would probably hurt more.
>>
>> I agree that this is not a great place to be in. This driver absolutely
>> should have been upstreamed 3 years ago. But I can't turn back time (yet)
>> :).
> As you know, this is no excuse to put an api in the kernel that isn't
> correct or good for the long-term. Just because people do foolish
> things outside of the kernel tree never means we have to accept them in
> our tree. Instead we can ask them to fix them properly as part of us
> taking the code.
>
> So please, work on doing this right.


Sorry if my message above came over as a push to put an "incorrect api"
into the kernel.

In situations like this where you can either give user space full access
to the device's command space through a generic API or you can create
command awareness in the kernel and make it the kernel's task to learn
about each command, IMHO it's never a clear cut on which one is better.
Especially in virtual environments where the set of commands can change
quickly over time.

So what I was trying to say above is that *if* we consider both paths
equally viable, I'd err on the one that enables the existing ecosystem.
However if there are good reasons to not do command pass-through, I'm
all for abstracting it away :)

Looking at prior art, the most similar implementations to this are TPMs
and virtio-vsock. With virtio-vsock, kernel space has no idea what it
talks to on the other hand and makes it 100% user space's problem. With
TPMs, you typically use /dev/tpm0 to gain raw command access to the
target device. So while we could engineer something smarter here, I'm
not convinced yet it's a net win.


Alex




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879