From: Thomas Gleixner
To: "Jason A. Donenfeld"
Cc: LKML, x86@kernel.org, Filipe Manana, Vadim Galitsin
Subject: Re: [patch 0/3] x86/fpu: Prevent FPU state corruption
References: <20220501192740.203963477@linutronix.de>
Date: Wed, 18 May 2022 15:09:54 +0200
Message-ID: <87fsl7j8bh.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 18 2022 at 03:02, Jason A. Donenfeld wrote:
> On Wed, May 04, 2022 at 05:40:26PM +0200, Jason A. Donenfeld wrote:
>> On Sun, May 01, 2022 at 09:31:42PM +0200, Thomas Gleixner wrote:
>> > The recent changes in the random code unearthed a long standing FPU state
>> > corruption due to a buggy condition for granting in-kernel FPU usage.
>>
>> Thanks for working that out. I've been banging my head over [1] for a
>> few days now trying to see if it's a mis-bisect or a real thing. I'll
>> ask Larry to retry with this patchset.
>
> So, Larry's debugging was inconsistent and didn't result in anything I
> could piece together into basic cause and effect. But luckily Vadim, who
> maintains the VirtualBox drivers for Oracle, was able to reproduce the
> issue and was able to conduct some real debugging. I've CC'd him here.
> From talking with Vadim, here are some findings thus far:
>
> - Certain Linux guest processes crash under high load.
> - Windows kernel guest panics.
>
> Observation: the Windows kernel uses SSSE3 in their kernel all over the
> place, generated by the compiler.
>
> - Moving the mouse around helps induce the crash.
>
> Observation: add_input_randomness() -> .. -> kernel_fpu_begin() -> blake2s_compress().
>
> - The problem exhibits itself in rc7, so this patchset does not fix
>   the issue.
> - Applying https://xn--4db.cc/ttEUSvdC fixes the issue.
>
> Observation: the problem is definitely related to using the FPU in a
> hard IRQ.
>
> I went reading KVM to get some idea of why KVM does *not* have this
> problem, and it looks like there's some careful code there about doing
> xsave and such around IRQs. So my current theory is that VirtualBox's
> VMM just forgot to do this, and until now this bug went unnoticed.

That's a very valid assumption. I audited all places which fiddle with
FPU in Linus' tree and with the fix applied they're all safe.

> Since VirtualBox is out of tree (and extremely messy of a codebase), and
> this appears to be an out of tree module problem rather than a kernel
> problem, I'm inclined to think that there's not much for us to do, at
> least until we receive information to the contrary of this presumption.

Agreed on all points.

> But in case you do want to do something proactively, I don't have any
> objections to just disabling the FPU in hard IRQ for 5.18. And in 5.19,
> add_input_randomness() isn't going to hit that path anyway. But also,
> doing nothing and letting the VirtualBox people figure out their bug
> would be fine with me too.
> Either way, just wanted to give you a heads
> up.

That virtualborx bug has to be fixed in any case as this problem has
existed forever and there have been drivers using FPU in hard interrupt
context in the past sporadically, so it's sheer luck that this didn't
explode before. AFAICT all of this has been moved to softirq context
over the years, so the random code is probably the sole hard interrupt
user in mainline today.

In the interest of users we should probably bite the bullet and just
disable hard interrupt FPU usage upstream and Cc stable. The stable
kernel updates probably reach users faster.

Thanks,

        tglx
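
For illustration only, the proposal above (never granting in-kernel FPU
usage in hard interrupt context) can be modelled in user space. This is a
sketch under stated assumptions, not the kernel patch: struct ctx_model,
fpu_usable_model(), fpu_begin_model() and fpu_end_model() are hypothetical
names, and the boolean flags stand in for the kernel's real context
tracking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the execution context; not kernel API. */
struct ctx_model {
	bool in_nmi;		/* running in NMI context */
	bool in_hardirq;	/* running in hard interrupt context */
	bool in_kernel_fpu;	/* an in-kernel FPU section is already active */
};

/*
 * Modelled policy: the FPU is never granted in NMI or hard interrupt
 * context, and in-kernel FPU sections do not nest.
 */
static bool fpu_usable_model(const struct ctx_model *c)
{
	if (c->in_nmi || c->in_hardirq)
		return false;
	return !c->in_kernel_fpu;
}

/* Entering an FPU section is only valid when the policy grants it. */
static void fpu_begin_model(struct ctx_model *c)
{
	assert(fpu_usable_model(c));
	c->in_kernel_fpu = true;
}

/* Leaving the section makes the FPU available again. */
static void fpu_end_model(struct ctx_model *c)
{
	c->in_kernel_fpu = false;
}
```

Under this model a caller in hard interrupt context would be expected to
check the predicate first and take a non-SIMD fallback path when it
returns false, rather than calling the begin function.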