Date: Thu, 24 Feb 2022 10:43:26 +0000
From: Daniel P. Berrangé
To: Alexander Graf
Cc: "Jason A. Donenfeld", linux-kernel@vger.kernel.org,
 linux-crypto@vger.kernel.org, qemu-devel@nongnu.org, kvm@vger.kernel.org,
 linux-s390@vger.kernel.org, adrian@parity.io, dwmw@amazon.co.uk,
 acatan@amazon.com, colmmacc@amazon.com, sblbir@amazon.com,
 raduweis@amazon.com, jannh@google.com, gregkh@linuxfoundation.org,
 tytso@mit.edu
Subject: Re: [PATCH RFC v1 0/2] VM fork detection for RNG
References: <20220223131231.403386-1-Jason@zx2c4.com>
 <234d7952-0379-e3d9-5e02-5eba171024a0@amazon.com>
In-Reply-To: <234d7952-0379-e3d9-5e02-5eba171024a0@amazon.com>
X-Mailing-List: linux-crypto@vger.kernel.org

On Thu, Feb 24, 2022 at 09:53:59AM +0100, Alexander Graf wrote:
> Hey Jason,
>
> On 23.02.22 14:12, Jason A. Donenfeld wrote:
> > This small series picks up work from Amazon that seems to have stalled
> > out later year around this time: listening for the vmgenid ACPI
> > notification, and using it to "do something." Last year, that something
> > involved a complicated userspace mmap chardev, which seems frought with
> > difficulty. This year, I have something much simpler in mind: simply
> > using those ACPI notifications to tell the RNG to reinitialize safely,
> > so we don't repeat random numbers in cloned, forked, or rolled-back VM
> > instances.
> >
> > This series consists of two patches. The first is a rather
> > straightforward addition to random.c, which I feel fine about. The
> > second patch is the reason this is just an RFC: it's a cleanup of the
> > ACPI driver from last year, and I don't really have much experience
> > writing, testing, debugging, or maintaining these types of drivers.
> > Ideally this thread would yield somebody saying, "I see the intent of
> > this; I'm happy to take over ownership of this part." That way, I can
> > focus on the RNG part, and whoever steps up for the paravirt ACPI part
> > can focus on that.
> >
> > As a final note, this series intentionally does _not_ focus on
> > notification of these events to userspace or to other kernel consumers.
> > Since these VM fork detection events first need to hit the RNG, we can
> > later talk about what sorts of notifications or mmap'd counters the RNG
> > should be making accessible to elsewhere. But that's a different sort of
> > project and ties into a lot of more complicated concerns beyond this
> > more basic patchset. So hopefully we can keep the discussion rather
> > focused here to this ACPI business.
>
>
> The main problem with VMGenID is that it is inherently racy. There will
> always be a (short) amount of time where the ACPI notification is not
> processed, but the VM could use its RNG to for example establish TLS
> connections.
>
> Hence we as the next step proposed a multi-stage quiesce/resume mechanism
> where the system is aware that it is going into suspend - can block network
> connections for example - and only returns to a fully functional state after
> an unquiesce phase:
>
>   https://github.com/systemd/systemd/issues/20222

The downside of course is precisely that the guest now needs to be aware
and involved every single time a snapshot is taken.

Currently with virt the act of taking a snapshot can often remain
invisible to the VM with no functional effect on the guest OS or its
workload, and the host OS knows it can complete a snapshot in a specific
timeframe. That said, this transparency to the VM is precisely the cause
of the race condition described.

With guest involvement to quiesce the bulk of activity for a time period,
there is more likely to be a negative impact on the guest workload. The
guest admin likely needs to be more explicit about exactly when in time
it is reasonable to take a snapshot to mitigate the impact.

The host OS snapshot operations are also now dependent on co-operation
of a guest OS that has to be considered to be potentially malicious, or
at least crashed/non-responsive. The guest OS also needs a way to receive
the triggers for snapshot capture and restore, most likely via an
extension to something like the QEMU guest agent or an equivalent for
other hypervisors.

Despite the above, I'm not against the idea of co-operative involvement
of the guest OS in the acts of taking & restoring snapshots. I can't
see any other proposals so far that can reliably eliminate the races
in the general case, from the kernel right up to user applications.
So I think it is necessary to have guest co-operative snapshotting.

> What exact use case do you have in mind for the RNG/VMGenID update? Can you
> think of situations where the race is not an actual concern?

Let's assume we do take the approach described in that systemd bug and
have a co-operative snapshot process.
If the hypervisor does the right thing and guest owners install the
right things, they'll have a race-free solution that works well in
normal operation. That's good.

Realistically though, it is never going to be universally and reliably
put into practice. So what is our attitude to cases where the preferred
solution isn't available and/or operative?

There are going to be users who continue to build their guest disk images
without the QEMU guest agent (or equivalent for whatever hypervisor they
run on) installed because they don't know any better. Or where the guest
agent is mis-configured or fails to start, or some other scenario
prevents the quiesce working as desired. The host mgmt could refuse to
take a snapshot in these cases. More likely is that they are just
going to go ahead and do a snapshot anyway, because lack of guest agent
is a very common scenario today and users want their snapshots.

There are going to be virt management apps / hypervisors that don't
support talking to any guest agent across their snapshot operation in
the first place, so systemd gets no way to trigger the required quiesce
dance on snapshot, but they likely have VMGenID support implemented
already.

IOW, I could view VMGenID triggered fork detection integrated with the
kernel RNG as providing a backup line of defence that is going to
"just work", albeit with the known race. It isn't as good as the guest
co-operative snapshot approach, because it only tries to solve the
one specific targeted problem of updating the kernel RNG.

Is it still better than doing nothing at all though, for the scenario
where guest co-operative snapshot is unavailable?

If it is better than nothing, is it then compelling enough to justify
the maintenance cost of the code added to the kernel?

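
To make the "backup line of defence, albeit with the known race" point
concrete, here is a toy sketch in Python. It is not kernel code and
makes no claim about how random.c or the vmgenid driver work: the names
ToyGuestRNG, on_vmgenid_notify() and clone_and_run() are hypothetical,
invented for this illustration, and the SHA-256 ratchet is only a
stand-in for a real RNG. It models two clones restored from the same
snapshot, each reading the RNG a couple of times before the generation
ID notification is handled.

```python
import copy
import hashlib


class ToyGuestRNG:
    """Toy deterministic RNG standing in for a guest's RNG (illustrative only)."""

    def __init__(self, seed: bytes) -> None:
        self.state = hashlib.sha256(seed).digest()

    def read(self) -> str:
        # Emit output derived from the current state, then ratchet the state.
        out = hashlib.sha256(b"out" + self.state).hexdigest()
        self.state = hashlib.sha256(b"next" + self.state).digest()
        return out

    def on_vmgenid_notify(self, new_gen_id: bytes) -> None:
        # Hypothetical handler: mix the new generation ID into the state,
        # so clones with different IDs diverge from this point on.
        self.state = hashlib.sha256(self.state + new_gen_id).digest()


def clone_and_run(snapshot: ToyGuestRNG, gen_id: bytes, reads_in_window: int) -> list:
    """Restore a clone, read during the race window, then handle the notify."""
    clone = copy.deepcopy(snapshot)
    # Reads that happen before the notification has been processed.
    outputs = [clone.read() for _ in range(reads_in_window)]
    clone.on_vmgenid_notify(gen_id)
    # First read after the RNG has been told about the fork.
    outputs.append(clone.read())
    return outputs


snapshot = ToyGuestRNG(b"boot-time entropy")
clone_a = clone_and_run(snapshot, b"generation-A", reads_in_window=2)
clone_b = clone_and_run(snapshot, b"generation-B", reads_in_window=2)

print("outputs repeated inside the race window:", clone_a[:2] == clone_b[:2])
print("outputs differ after the notification:  ", clone_a[2] != clone_b[2])
```

In this model the two reads made inside the window are identical across
the clones, and the outputs only diverge once the notification has been
handled. The notification narrows the exposure but cannot remove it,
which is the gap the guest co-operative quiesce is meant to close.
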
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|