2022-05-02 23:17:28

by Alexander Graf

Subject: Re: [PATCH 2/2] random: add fork_event sysctl for polling VM forks


On 02.05.22 20:04, Jason A. Donenfeld wrote:
> Hey Lennart,
>
> On Mon, May 02, 2022 at 06:51:19PM +0200, Lennart Poettering wrote:
>> On Mon, 02.05.22 18:12, Jason A. Donenfeld ([email protected]) wrote:
>>
>>>>> In order to inform userspace of virtual machine forks, this commit adds
>>>>> a "fork_event" sysctl, which does not return any data, but allows
>>>>> userspace processes to poll() on it for notification of VM forks.
>>>>>
>>>>> It avoids exposing the actual vmgenid from the hypervisor to userspace,
>>>>> in case there is any randomness value in keeping it secret. Rather,
>>>>> userspace is expected to simply use getrandom() if it wants a fresh
>>>>> value.
>>>> Wouldn't it make sense to expose a monotonic 64bit counter of detected
>>>> VM forks since boot through read()? It might be interesting to know
>>>> for userspace how many forks it missed the fork events for. Moreover it
>>>> might be interesting to userspace to know if any fork happened so far
>>>> *at* *all*, by checking if the counter is non-zero.
>>> "Might be interesting" is different from "definitely useful". I'm not
>>> going to add this without a clear use case. This feature is pretty
>>> narrowly scoped in its objectives right now, and I intend to keep it
>>> that way if possible.
>> Sure, whatever. I mean, if you think it's preferable to have 3 API
>> abstractions for the same concept each for its special use case, then
>> that's certainly one way to do things. I personally would try to
>> figure out a modicum of generalization for things like this. But maybe
>> that's just me…
>>
>> I can just tell you, that in systemd we'd have a usecase for consuming
>> such a generation counter: we try to provide stable MAC addresses for
>> synthetic network interfaces managed by networkd, so we hash them from
>> /etc/machine-id, but otoh people also want them to change when they
>> clone their VMs. We could very nicely solve this if we had a
>> generation counter easily accessible from userspace, that starts at 0
>> initially. Because then we can hash as we always did when the counter
>> is zero, but otherwise use something else, possibly hashed from the
>> generation counter.
> This doesn't work, because you could have memory-A split into memory-A.1
> and memory-A.2, and both A.2 and A.1 would ++counter, and wind up with
> the same new value "2". The solution is to instead have the hypervisor
> pass a unique value and a counter. We currently have a 16 byte unique
> value from the hypervisor, which I'm keeping as a kernel space secret
> for the RNG; we're waiting on a word-sized monotonic counter interface
> from hypervisors in the future. When we have the latter, then we can
> start talking about mmapable things. Your use case would probably be
> served by exposing that 16-byte unique value (hashed with some constant
> for safety I suppose), but I'm hesitant to start going down that route
> all at once, especially if we're to have a more useful counter in the
> future.
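
For illustration, the collision Jason describes can be modelled in a few lines of C. This is only a sketch: struct guest_state and clone_guest() are hypothetical names made up for this example, and the model assumes each clone starts from a copy of the parent's memory and increments its own counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest state: only the locally maintained fork counter. */
struct guest_state {
    uint64_t fork_counter;
};

/*
 * Model a clone: the child starts from a copy of the parent's memory
 * and then bumps its own copy of the counter.
 */
static struct guest_state clone_guest(const struct guest_state *parent)
{
    assert(parent != NULL);

    struct guest_state child = *parent;
    child.fork_counter++;
    return child;
}
```

Cloning the same snapshot twice gives both children the same counter value, so the counter alone cannot tell A.1 from A.2.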


Michael, since we already changed the CID in the spec, can we add a
property to the device that indicates the first 4 bytes of the UUID will
always be different between parent and child?

That should give us the ability to mmap the vmgenid directly to user
space and act based on a simple u32 compare for clone notification, no?
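
Roughly what I have in mind on the userspace side, as a sketch only. It assumes the 16-byte vmgenid is somehow mapped read-only into the process (no such interface exists today) and that the device guarantees the first 4 bytes differ between parent and child. vmgenid_prefix() and vmgenid_forked() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VMGENID_LEN 16 /* size of the unique value from the hypervisor */

/*
 * Hypothetical helper: read the first 4 bytes of the mapped vmgenid.
 * The bytes are assembled in a fixed (little-endian) order so the
 * result does not depend on the host's endianness.
 */
static uint32_t vmgenid_prefix(const volatile uint8_t *id)
{
    uint32_t v = 0;

    assert(id != NULL);
    for (size_t i = 0; i < sizeof(v); i++)
        v |= (uint32_t)id[i] << (8 * i);
    return v;
}

/*
 * Hypothetical helper: returns true if the prefix differs from the
 * value cached by the caller, and updates the cache in that case.
 */
static bool vmgenid_forked(const volatile uint8_t *id, uint32_t *cached)
{
    uint32_t now = vmgenid_prefix(id);

    if (now == *cached)
        return false;
    *cached = now;
    return true;
}
```

A library would record the prefix once at init time and call vmgenid_forked() before using any state that must not be shared across clones.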


Thanks,

Alex
