Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp5119192iob; Mon, 9 May 2022 09:00:45 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyHbGaSu4OGKR2T6ivfjo6xc/HBUHPmIikpW0hrQPSiZkmKa7k2Yw6fUdSeRcMjUkCjIxYl X-Received: by 2002:ac8:59ce:0:b0:2f3:441b:d323 with SMTP id f14-20020ac859ce000000b002f3441bd323mr15400715qtf.457.1652112044926; Mon, 09 May 2022 09:00:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652112044; cv=none; d=google.com; s=arc-20160816; b=vcOjW8MVsna5oCSQjSdYdfp3wbK19yo7kypRF/PCx9p1qioaH1c4UffsHoFD6cnGd7 C7aUOVNr9aEhd4lNvWxpUMhSUI//GH1Os0dwikB50QtZW78P1/BxgFiEAdbqiBYsf8MY +nLX3cmiCfO0R8VhjCx2AH+3GLuh7FOPHnh4Te2NbkJq/7KxC9g4wJcOlEZwmUQyymIu R6sY2IMySKOloJpe2jaRKlFh1yDu7azfW2jdlmqvOCb12WxZBxCG4TGy4+7dSN6Tfv+s 138Ge6LK7vLmthCv12XpxILfGYAuTCNDNtJ3BB53CZ7pV9RczbveKgK/Mc8QCHA0JDqh ezBw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version; bh=m4TLCMYI42941hu3ardgVNqgp3gOumPVU61RkHYS5Zo=; b=dUX5cMB3irJpFKp5c/Q2iMDL48zvp8ZI3BEjnBFgyxT4pdWYYS1nKsedfNA6ydAEoU SRDFoKPqbGBCvJ2+shBG+p7asPz3cUqpntBV8t+/QnrR56sgZewT/7bqrPKEqM0QjMRC PPPHBY/I7MUuj6YXwha7umsxkw56G2nCR2Izb/UQ+c0dq3b4knWJXp+xNKX4Tj0rIneT 10IBTYywgfPEHnoOMSgsfgYtslAQ6kMQh8HMQMhw8MdMOmFY4ab9RMivk08jvsu/j7BH 6TtJDFR30WPJSUL74vjJjTpjv3m5nZo9+U4tnCzzOQArg5Tyi9FQXKKgGtz0bkmKhpWC WE0g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cs.nyu.edu Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id bj30-20020a05620a191e00b0067e4be23909si8579711qkb.334.2022.05.09.09.00.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 May 2022 09:00:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cs.nyu.edu Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B32211EE0B4; Mon, 9 May 2022 08:56:10 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238692AbiEIQAC (ORCPT + 99 others); Mon, 9 May 2022 12:00:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32998 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238685AbiEIQAB (ORCPT ); Mon, 9 May 2022 12:00:01 -0400 Received: from mail-yw1-f177.google.com (mail-yw1-f177.google.com [209.85.128.177]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 40E4F1B0932; Mon, 9 May 2022 08:56:06 -0700 (PDT) Received: by mail-yw1-f177.google.com with SMTP id 00721157ae682-2f83983782fso149857317b3.6; Mon, 09 May 2022 08:56:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=m4TLCMYI42941hu3ardgVNqgp3gOumPVU61RkHYS5Zo=; b=sn3Mhl0wMW621CEwMwN8w8qSd4SHwc4oGozgpTKr2Ek902IBSZnoI9oJvPKW5py9bd hsY4Kknw6e48KQJ25vFu+DC2k+x4Q4fd80IlqOh5P1nqi+JuSsxD0vNh+6OCMuMCaeOG ToAgVN9cUTKr4K57AQXhYYFkh/XPZ76vHRk2ibCzhim2zphWIWtmyx1VL2wjdEoC+f9Q e6Gx4/driHtWCwgMpIObECLkCaHwdNIUL1g1/K17wAfLC5fXwrmLDJhEiDiqyzuRmTnN reTsI1qN31yxigaNwFptvTebEJOoymflLbRwm5bGcKEDmM2rMgV5JGwESvEXLja7cvCa xf4Q== X-Gm-Message-State: AOAM531iZl/i/MW795G7HEtdfbhxUq5VfKJfY0UCBT9Fqvq+4vzKHvVQ AEMNm6X5+ei13zsJ6vGPRGyVB+Eygn90mTTu+Jk= X-Received: by 2002:a0d:e2ca:0:b0:2f6:6586:df36 with SMTP id l193-20020a0de2ca000000b002f66586df36mr15212846ywe.433.1652111765263; Mon, 09 May 2022 08:56:05 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Yevgeniy Dodis Date: Mon, 9 May 2022 11:55:53 -0400 Message-ID: Subject: Re: is "premature next" a real world rng concern, or just an academic exercise? To: "Jason A. Donenfeld" Cc: tytso , Nadia Heninger , Noah Stephens-Dawidowitz , Stefano Tessaro , torvalds@linux-foundation.org, "D. J. Bernstein" , jeanphilippe.aumasson@gmail.com, jann@thejh.net, keescook@chromium.org, gregkh@linuxfoundation.org, Peter Schwabe , linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RDNS_NONE, SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org resending in plain text... (hope got it right) On Mon, May 9, 2022 at 11:15 AM Yevgeniy Dodis wrote: > > Hi Jason and all. > > Thank you for starting this fascinating discussion. I generally agree with everything Jason said. In particular, I am not > 100% convinced that the extra cost of the premature next defense is justified.(Although Windows and MacOS are adamant it is > worth it :).) > > But let me give some meta points to at least convince you this is not as obvious as Jason makes it sound. > > 1) Attacking RNGs in any model is really hard. Heck, everybody knew for years that /dev/random is a mess > (and we published it formally in 2013, although this was folklore knowledge), but in all these years nobody > (even Nadya's group :)) managed to find a practical attack. So just because the attack seems far-fetched, I do not think we should > lower our standards and do ugly stuff. Otherwise, just leave /dev/random the way it was before Jason started his awesome work. > > 2) As Jason says, there are two distinct attack vectors needed to make the premature next attack. > A) compromising the state > B) (nearly) continuously observing RNG outputs > > I agree with Jason's point that finding places where > -- A)+B) is possible, but > --- A)+A) is not possible, > is tricky. Although Nadya kind of indicated a place like that. VM1 and VM2 start with the same RNG state (for whatever > reason). VM1 is insecure, so can leak the state via A). VM2 is more secure, but obviously allows for B) through system > interface. This does not seem so hypothetical for me, especially in light of my mega-point 1) above -- almost any real-world > RNG attack is hard. > > But I want to look at it from a different angle here. Let's ask if RNGs should be secure against A) or B) individually. > > I think everybody agrees protection from B) is a must. This is the most basic definition of RNG! So let's just take itas > an axiom. > > Protection against A) is trickier. But my read of Jason's email is that all his criticism comes exactly from this point. > If your system allows for state compromise, you have bigger problems than the premature next, etc. But let's ask ourselves > the question. Are we ready to design RNGs without recovery from state compromise? I believe nobody on this list would > be comfortable saying "yes". Because this would mean we don;t need to accumulate entropy beyond system start-up. > Once we reach the point of good initial state, and state compromise is not an issue, just use straight ChaCha or whatever other > stream cipher. > > The point is, despite all arguments Jason puts, we all would feel extremely uncomfortable/uneasy to let continuous > entropy accumulation go, right? > > This means we all hopefully agree that we need protection against A) and B) individually. > > 3) Now comes the question. If we want to design a sound RNG using tools of modern cryptography, and we allow > the attacker an individual capability to enforce A) or B) individually, are we comfortable with the design where we: > * offer protection against A) > * offer protection against B) > * do NOT offer protection against A)+B), because we think it's too expensive given A)+B) is so rare? > > I do not have a convincing answer to this question, but it is at least not obvious to me. On a good note, one worry > we might have is how to even have a definition protecting A), protecting B), but not protecting A)+B). > Fortunately, our papers resolve this question (although there are still theoretical annoyances which I do not > want to get into in this email). So, at least from this perspective, we are good. We have a definition with > exactly these (suboptimal) properties. > > Anyway, these are my 2c. > Thoughts? > > Yevgeniy > > On Sun, May 1, 2022 at 7:17 AM Jason A. Donenfeld wrote: >> >> Hi Ted, >> >> That's a useful analysis; thanks for that. >> >> On Sat, Apr 30, 2022 at 05:49:55PM -0700, tytso wrote: >> > On Wed, Apr 27, 2022 at 03:58:51PM +0200, Jason A. Donenfeld wrote: >> > > >> > > 3) More broadly speaking, what kernel infoleak is actually acceptable to >> > > the degree that anybody would feel okay in the first place about the >> > > system continuing to run after it's been compromised? >> > >> > A one-time kernel infoleak where this might seem most likely is one >> > where memory is read while the system is suspended/hibernated, or if >> > you have a VM which is frozen and then replicated. A related version >> > is one where a VM is getting migrated from one host to another, and >> > the attacker is able to grab the system memory from the source "host" >> > after the VM is migrated to the destination "host". >> >> You've identified ~two places where compromises happen, but it's not an >> attack that can just be repeated simply by re-running `./sploit > state`. >> >> 1) Virtual machines: >> >> It seems like after a VM state compromise during migration, or during >> snapshotting, the name of the game is getting entropy into the RNG in a >> usable way _as soon as possible_, and not delaying that. This is >> Nadia's point. There's some inherent tension between waiting some amount >> of time to use all available entropy -- the premature next requirement >> -- and using everything you can as fast as you can because your output >> stream is compromised/duplicated and that's very bad and should be >> mitigated ASAP at any expense. >> >> [I'm also CC'ing Tom Risenpart, who's been following this thread, as he >> did some work regarding VM snapshots and compromise, and what RNG >> recovery in that context looks like, and arrived at pretty similar >> points.] >> >> You mentioned virtio-rng as a mitigation for this. That works, but only >> if the data read from it are actually used rather quickly. So probably >> /waiting/ to use that is suboptimal. >> >> One of the things added for 5.18 is this new "vmgenid" driver, which >> responds to fork/snapshot notifications from hypervisors, so that VMs >> can do something _immediately_ upon resumption/migration/etc. That's >> probably the best general solution to that problem. >> >> Though vmgenid is supported by QEMU, VMware, Hyper-V, and hopefully soon >> Firecracker, there'll still be people that don't have it for one reason >> or another (and it has to be enabled manually in QEMU with `-device >> vmgenid,guid=auto`; perhaps I should send a patch adding that to some >> default machine types). Maybe that's their problem, but I take as your >> point that we can still try to be less bad than otherwise by using more >> entropy more often, and not delaying as the premature next model >> requirements would have us do. >> >> 2) Suspend / hibernation: >> >> This is kind of the same situation as virtual machines, but the >> particulars are a little bit different: >> >> - There's no hypervisor giving us new seed material on resumption like >> we have with VM snapshots and vmgenid; but >> >> - We also always know when it happens, because it's not transparent to >> the OS, so at least we can attempt to do something immediately like >> we do with the vmgenid driver. >> >> Fortunately, most systems that are doing suspend or hibernation these >> days also have a RDRAND-like thing. It seems like it'd be a good idea >> for me to add a PM notifier, mix into the pool both >> ktime_get_boottime_ns() and ktime_get(), in addition to whatever type >> info I get from the notifier block (suspend vs hibernate vs whatever >> else) to account for the amount of time in the sleeping state, and then >> immediately reseed the crng, which will pull in a bunch of >> RDSEED/RDRAND/RDTSC values. This way on resumption, the system is always >> in a good place. >> >> I did this years ago in WireGuard -- clearing key material before >> suspend -- and there are some details around autosuspend (see >> wg_pm_notification() in drivers/net/wireguard/device.c), but it's not >> that hard to get right, so I'll give it a stab and send a patch. >> >> > But if the attacker can actually obtain internal state from one >> > reconstituted VM, and use that to attack another reconstituted VM, and >> > the attacker also knows what the nonce or time seed that was used so >> > that different reconstituted VMs will have unique CRNG streams, this >> > might be a place where the "premature next" attack might come into >> > play. >> >> This is the place where it matters, I guess. It's also where the >> tradeoff's from Nadia's argument come into play. System state gets >> compromised during VM migration / hibernation. It comes back online and >> starts doling out compromised random numbers. Worst case scenario is >> there's no RDRAND or vmgenid or virtio-rng, and we've just got the good >> old interrupt handler mangling cycle counters. Choices: A) recover from >> the compromise /slowly/ in order to mitigate premature next, or B) >> recover from the compromise /quickly/ in order to prevent things like >> nonce reuse. >> >> What is more likely? That an attacker who compromised this state at one >> point in time doesn't have the means to do it again elsewhere in the >> pipeline, will use a high bandwidth /dev/urandom output stream to mount >> a premature next attack, and is going after a high value target that >> inexplicably doesn't have RDRAND/vmgenid/virtio-rng enabled? Or that >> Nadia's group (or that large building in Utah) will get an Internet tap >> and simply start looking for repeated nonces to break? >> >> Jason