Subject: Re: [PATCH 1/5] random: fix crng_ready() test
To: "Theodore Y. Ts'o", Stephan Mueller, linux-crypto@vger.kernel.org, Linux Kernel Developers List
From: "Srivatsa S. Bhat"
Date: Wed, 16 May 2018 17:07:08 -0700
Message-ID: <6bb4d4cb-aafa-7440-0dc3-40faf647ec89@csail.mit.edu>
In-Reply-To: <20180413170037.GA28721@thunk.org>
References: <20180413013046.404-1-tytso@mit.edu> <1699469.KmO53oa8XU@tauon.chronox.de> <20180413125313.GA2633@thunk.org> <4393662.RPWnPK42dp@tauon.chronox.de> <20180413170037.GA28721@thunk.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/13/18 10:00 AM, Theodore Y. Ts'o wrote:
> On Fri, Apr 13, 2018 at 03:05:01PM +0200, Stephan Mueller wrote:
>>
>> What I would like to point out that more and more folks change to
>> getrandom(2). As this call will now unblock much later in the boot cycle,
>> these systems see a significant departure from the current system behavior.
>>
>> E.g. an sshd using getrandom(2) would be ready shortly after the boot finishes
>> as of now. Now it can be a matter minutes before it responds. Thus, is such
>> change in the kernel behavior something for stable?

[...]

> I was a little worried that on VM's this could end up causing things
> to block for a long time, but an experiment on a GCE VM shows that
> isn't a problem:
>
> [    0.000000] Linux version 4.16.0-rc3-ext4-00009-gf6b302ebca85 (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-15)) #16 SMP Thu Apr 12 16:57:17 EDT 2018
> [    1.282220] random: fast init done
> [    3.987092] random: crng init done
> [    4.376787] EXT4-fs (sda1): re-mounted. Opts: (null)
>
> There are some desktops where the "crng_init done" report doesn't
> happen until 45-90 seconds into the boot. I don't think I've seen
> reports where it takes _minutes_ however. Can you give me some
> examples of such cases?

On a Photon OS VM running on VMware ESXi, this patch causes a boot speed
regression of 5 minutes :-( [ The VM doesn't have haveged or rng-tools
(rngd) installed. ]

[    1.420246] EXT4-fs (sda2): re-mounted. Opts: barrier,noacl,data=ordered
[    1.469722] tsc: Refined TSC clocksource calibration: 1900.002 MHz
[    1.470707] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x36c65c1a9e1, max_idle_ns: 881590695311 ns
[    1.474249] clocksource: Switched to clocksource tsc
[    1.584427] systemd-journald[216]: Received request to flush runtime journal from PID 1
[  346.620718] random: crng init done

Interestingly, the boot delay is exacerbated on VMs with large amounts
of RAM. For example, the delay is not so noticeable (< 30 seconds) on a
VM with 2GB memory, but goes up to 5 minutes on an 8GB VM.

Also, cloud-init-local.service seems to be the one blocking for entropy
here. systemd-analyze critical-chain shows:

The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

multi-user.target @6min 1.283s
└─vmtoolsd.service @6min 1.282s
  └─cloud-final.service @6min 366ms +914ms
    └─cloud-config.service @5min 59.174s +1.190s
      └─cloud-config.target @5min 59.172s
        └─cloud-init.service @5min 47.423s +11.744s
          └─systemd-networkd-wait-online.service @5min 45.999s +1.420s
            └─systemd-networkd.service @5min 45.975s +21ms
              └─network-pre.target @5min 45.973s
                └─cloud-init-local.service @241ms +5min 45.687s
                  └─systemd-remount-fs.service @222ms +13ms
                    └─systemd-fsck-root.service @193ms +26ms
                      └─systemd-journald.socket @188ms
                        └─-.mount @151ms
                          └─system.slice @161ms
                            └─-.slice @151ms

It would be great if this CVE could be fixed somehow without causing boot
speed to spike from ~20 seconds to 5 minutes, as that makes the system
pretty much unusable. I can work around this by installing haveged, but
ideally an in-kernel fix would be better. If you need any other info
about my setup or if you have a patch that I can test, please let me
know!

Thank you very much!

Regards,
Srivatsa
VMware Photon OS
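
[ For anyone who wants to check from userspace whether getrandom(2)
would still block at a given point in the boot, here is a minimal
sketch. It assumes Python 3.6+ on Linux, where os.getrandom() and
os.GRND_NONBLOCK are available. The helper names crng_ready() and
wait_for_crng() are made up for illustration and are unrelated to the
kernel's crng_ready() test in the patch subject. ]

```python
import os
import time


def crng_ready() -> bool:
    """Return True if getrandom(2) would return without blocking.

    With GRND_NONBLOCK, os.getrandom() raises BlockingIOError instead of
    waiting when the kernel's entropy pool is not yet initialized.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


def wait_for_crng(poll_interval: float = 0.5, timeout: float = 600.0) -> float:
    """Poll until getrandom(2) stops blocking; return the seconds waited.

    Raises TimeoutError if the pool is still uninitialized after `timeout`.
    Both default values are arbitrary choices for this sketch.
    """
    start = time.monotonic()
    while not crng_ready():
        if time.monotonic() - start > timeout:
            raise TimeoutError("CRNG still not initialized")
        time.sleep(poll_interval)
    return time.monotonic() - start


if __name__ == "__main__":
    waited = wait_for_crng()
    print(f"getrandom(2) is non-blocking after waiting {waited:.1f}s")
```

Running something like this early in boot (for example from a unit
ordered before cloud-init-local.service) should show roughly how long
a getrandom(2) caller would have been stuck waiting.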