Date: Mon, 19 Oct 2020 13:15:02 -0400 (EDT)
From: Mathieu Desnoyers
To: Andy Lutomirski
Cc: Jann Horn, "Catangiu, Adrian Costin", Jason Donenfeld, Theodore Tso,
    Willy Tarreau, Eric Biggers, "open list, DOCUMENTATION", linux-kernel,
    virtualization@lists.linux-foundation.org, "Graf (AWS), Alexander",
    "MacCarthaigh, Colm", "Woodhouse, David", bonzini@gnu.org,
    "Singh, Balbir", "Weiss, Radu", oridgar@gmail.com, ghammer@redhat.com,
    Jonathan Corbet, Greg Kroah-Hartman, mst@redhat.com,
    qemu-devel@nongnu.org, KVM list, Michal Hocko, "Rafael J. Wysocki",
    Pavel Machek, linux-api
Message-ID: <476895871.28084.1603127702969.JavaMail.zimbra@efficios.com>
References: <788878CE-2578-4991-A5A6-669DCABAC2F2@amazon.com>
Subject: Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Oct 17, 2020, at 2:10 PM, Andy Lutomirski luto@kernel.org wrote:

> On Fri, Oct 16, 2020 at 6:40 PM Jann Horn wrote:
>>
>> [adding some more people who are interested in RNG stuff: Andy, Jason,
>> Theodore, Willy Tarreau, Eric Biggers.
>> also linux-api@, because this
>> concerns some pretty fundamental API stuff related to RNG usage]
>>
>> On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin wrote:
>> > - Background
>> >
>> > The VM Generation ID is a feature defined by Microsoft (paper:
>> > http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
>> > multiple hypervisor vendors.
>> >
>> > The feature is required in virtualized environments by apps that work
>> > with local copies/caches of world-unique data such as random values,
>> > UUIDs, monotonically increasing counters, etc.
>> > Such apps can be negatively affected by VM snapshotting when the VM
>> > is either cloned or returned to an earlier point in time.
>> >
>> > The VM Generation ID is a simple concept meant to alleviate the issue
>> > by providing a unique ID that changes each time the VM is restored
>> > from a snapshot. The hw-provided UUID value can be used to
>> > differentiate between VMs or different generations of the same VM.
>> >
>> > - Problem
>> >
>> > The VM Generation ID is exposed through an ACPI device by multiple
>> > hypervisor vendors, but neither the vendors nor upstream Linux have a
>> > default driver for it, leaving users to fend for themselves.
>> >
>> > Furthermore, simply finding out about a VM generation change is only
>> > the starting point of a process to renew internal states of possibly
>> > multiple applications across the system. This process could benefit
>> > from a driver that provides an interface through which orchestration
>> > can be easily done.
>> >
>> > - Solution
>> >
>> > This patch is a driver which exposes the Virtual Machine Generation ID
>> > via a char-dev FS interface that provides ID update sync and async
>> > notification, retrieval and confirmation mechanisms:
>> >
>> > When the device is 'open()'ed a copy of the current vm UUID is
>> > associated with the file handle.
>> > 'read()' operations block until the
>> > associated UUID is no longer up to date - until HW vm gen id changes -
>> > at which point the new UUID is provided/returned. Nonblocking 'read()'
>> > uses EWOULDBLOCK to signal that there is no _new_ UUID available.
>> >
>> > 'poll()' is implemented to allow polling for UUID updates. Such
>> > updates result in 'EPOLLIN' events.
>> >
>> > Subsequent read()s following a UUID update no longer block, but return
>> > the updated UUID. The application needs to acknowledge the UUID update
>> > by confirming it through a 'write()'.
>> > Only when the right/latest UUID is written back to the driver will the
>> > driver mark this "watcher" as up to date and remove EPOLLIN status.
>> >
>> > 'mmap()' support allows mapping a single read-only shared page which
>> > will always contain the latest UUID value at offset 0.
>>
>> It would be nicer if that page just contained an incrementing counter,
>> instead of a UUID. It's not like the application cares *what* the UUID
>> changed to, just that it *did* change and all RNG state now needs to
>> be reseeded from the kernel, right? And an application can't reliably
>> read the entire UUID from the memory mapping anyway, because the VM
>> might be forked in the middle.
>>
>> So I think your kernel driver should detect UUID changes and then turn
>> those into a monotonically incrementing counter. (Probably 64 bits
>> wide?) (That's probably also a little bit faster than comparing an
>> entire UUID.)
>>
>> An option might be to put that counter into the vDSO, instead of a
>> separate VMA; but I don't know how the other folks feel about that.
>> Andy, do you have opinions on this? That way, normal userspace code
>> that uses this infrastructure wouldn't have to mess around with a
>> special device at all. And it'd be usable in seccomp sandboxes and so
>> on without needing special plumbing. And libraries wouldn't have to
>> call open() and mess with file descriptor numbers.
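
For illustration only, the counter scheme suggested above might be
consumed from userspace roughly as follows. This is a minimal sketch
under stated assumptions, not anything from the patch: vm_gen_counter
is a process-local stand-in for the 64-bit counter the kernel would
expose in a read-only page, and fresh_seed() and simulate_vm_restore()
are hypothetical placeholders so that the sketch runs without a driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel-exported generation counter.
 * A real consumer would read it from a mapped read-only page. */
static _Atomic uint64_t vm_gen_counter;

struct rng_state {
	uint64_t seen_gen;	/* generation under which the seed was drawn */
	uint64_t seed;
	unsigned int reseeds;
};

/* Placeholder for fetching fresh entropy from the kernel; the value
 * returned here is NOT random and is for illustration only. */
uint64_t fresh_seed(uint64_t gen)
{
	return 0x9e3779b97f4a7c15ULL ^ gen;
}

/* Compare a single 64-bit word instead of a whole UUID and reseed
 * when it has moved. Returns true if a reseed happened. */
bool rng_check_generation(struct rng_state *rng)
{
	uint64_t now;

	assert(rng != NULL);
	now = atomic_load_explicit(&vm_gen_counter, memory_order_acquire);
	if (now == rng->seen_gen)
		return false;
	rng->seed = fresh_seed(now);
	rng->seen_gen = now;
	rng->reseeds++;
	return true;
}

/* Simulates the driver noticing a UUID change and bumping the counter. */
void simulate_vm_restore(void)
{
	atomic_fetch_add_explicit(&vm_gen_counter, 1, memory_order_release);
}
```

The single-word load avoids the torn-read problem of a 16-byte UUID,
but the check is still only a snapshot: nothing here prevents the
counter from changing right after rng_check_generation() returns.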
>
> The vDSO might be annoyingly slow for this. Something like the rseq
> page might make sense. It could be a generic indication of "system
> went through some form of suspend".

This might indeed fit nicely as an extension of my KTLS prototype
(extensible rseq):

https://lore.kernel.org/lkml/20200925181518.4141-1-mathieu.desnoyers@efficios.com/

There are a few ways we could wire things up. One might be to add the
UUID field into the extended KTLS structure, so that after the UUID
changes, the field is always updated on the next return to user-space.
For this I assume that the Linux scheduler within the guest VM always
preempts all threads before a VM is suspended (is that indeed true?).

This leads to one important question though: how is the UUID check vs
commit operation made atomic with respect to suspend? Unless we use
rseq critical sections in assembly, where the kernel will abort the
rseq critical section on preemption, I don't see how we can ensure that
the UUID value does not change right after it has been checked, before
the "commit" side-effect.

And what is the expected "commit" side-effect? Is it a store to a
variable in user-space memory, or is it issuing a system call which
sends a packet over the network?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
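
To make the check-vs-commit window concrete, here is a minimal sketch
under stated assumptions. gen_id is a hypothetical stand-in for a
generation field in the KTLS area, and the "between" hook simulates a
VM restore landing between the check and the commit. None of these
names come from the KTLS prototype or the vmgenid patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel-updated generation field. */
static _Atomic uint64_t gen_id;

typedef void (*hook_fn)(void);

enum commit_result {
	COMMIT_OK,		/* generation unchanged across the commit */
	COMMIT_STALE_BEFORE,	/* change seen before the commit: nothing stored */
	COMMIT_STALE_AFTER,	/* change seen after the commit: store already done */
};

/* Simulates a VM restore changing the generation. */
void bump_generation(void)
{
	atomic_fetch_add(&gen_id, 1);
}

/* Plain C check-then-commit. The optional hook runs inside the window
 * that an rseq critical section would close by aborting before the store. */
enum commit_result check_then_commit(uint64_t expected_gen, uint64_t *dst,
				     uint64_t value, hook_fn between)
{
	assert(dst != NULL);
	if (atomic_load(&gen_id) != expected_gen)
		return COMMIT_STALE_BEFORE;
	if (between)
		between();		/* generation may change here */
	*dst = value;			/* the "commit" side-effect */
	if (atomic_load(&gen_id) != expected_gen)
		return COMMIT_STALE_AFTER;	/* detected, but too late */
	return COMMIT_OK;
}
```

Re-checking after the store can only detect the change, not prevent
it. That may be tolerable when the commit is a store the caller can
redo, but not when the side-effect is a system call that has already
sent a packet.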