Date: Sat, 10 Nov 2018 00:25:21 +0100
From: Sebastian Andrzej Siewior
To: Borislav Petkov
Cc: Ingo Molnar, linux-kernel@vger.kernel.org, x86@kernel.org,
 Andy Lutomirski, Paolo Bonzini, Radim Krčmář, kvm@vger.kernel.org, "Jason A.
 Donenfeld", Rik van Riel, Dave Hansen
Subject: Re: [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
Message-ID: <20181109232521.l2ll2n3coxygkxv4@linutronix.de>
References: <20181107194858.9380-1-bigeasy@linutronix.de>
 <20181107194858.9380-3-bigeasy@linutronix.de>
 <20181108145721.GC7543@zn.tnic>
 <20181109173521.2m6iijp5wkncgi77@linutronix.de>
 <20181109185202.GF21243@zn.tnic>
In-Reply-To: <20181109185202.GF21243@zn.tnic>

On 2018-11-09 19:52:02 [+0100], Borislav Petkov wrote:
> On Fri, Nov 09, 2018 at 06:35:21PM +0100, Sebastian Andrzej Siewior wrote:
> > fpu__drop() sets ->initialized to 0. As a result the context switch
>
> "... the context switch path landing in switch_fpu_prepare()... " is what you
> mean, right?

I mean both: switch_fpu_prepare() while the task is leaving and then
switch_fpu_finish() while the task is coming back. But yes.

> > will not save current FPU registers and so _not_ write to fpu->state.
> > This also means that CPU's FPU register will be random (inherited from
> > the last context)
>
> You mean, the FPU regs will have random values, yes.

Correct. Same as for kernel threads.

> > after the context switch. This is also true for usage
> > in softirq via kernel_fpu_begin().
>
> So far so good.
>
> Except maybe because I'm dense about FPU, I still am missing something.
>
> You have this path:
>
> __fpu__restore_sig
> |-> fpu__clear
>     |-> fpu__drop
>
> and that happens on the sigreturn() path.
>
> Now, the context switch happens ... when exactly?
>
> After the sigreturn is done?

Is fpu__clear() correct here?
If so, a context switch after ->initialized has been set to 1 wouldn't
matter because in the end the register state is restored from
init_fpstate and not from the task's FPU struct.

>
> It must be because then you'd get that ->state corruption after
> ->initialized has been cleared.
>
> Right?

I might have got your question wrong. If you quote the code and try
again, I will do so, too :)

> > So. The fix would be:
> >
> > @@ -344,10 +344,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
> >                          sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
> >                  }
> >
> > +                local_bh_disable();
> >                  fpu->initialized = 1;
> > -                preempt_disable();
> >                  fpu__restore(fpu);
> > -                preempt_enable();
> > +                local_bh_enable();
> >
> >                  return err;
> >          } else {
> >
> > local_bh_disable() due to possible kernel_fpu_begin() usage in softirq.
> > How much do we care here about a theoretical race on 32bit anyway? I
> > don't think anyone complained :) I would have to rebase my queue…
> > otherwise…
>
> Funny, you should mention that.
>
> But this very much rings a bell about a very elusive bug we had on
> 32-bit at the time. See attached mbox (yeah, the web archives were crap
> and couldn't find the links so I'm sending you the whole thread).
>
> And at the time Ingo said that there's something still missing about
> *why* it would happen.
>
> And I think it is this context switch happening right after the
> sigreturn - *AFAICT* - which would cause this.
>
> I could very well be off but this smells very similar to your thing.

So, checking out v4.5-rc3-15-g58122bf1d856a, __fpu__restore_sig() is
something like this:

|	fpu__drop(fpu);
…
|	fpu->fpstate_active = 1;
X
|	if (use_eager_fpu()) {
|		preempt_disable();
|		fpu__restore(fpu);
|		preempt_enable();
|	}

fpu__drop() sets fpstate_active & fpregs_active to 0[¹]. A context switch
at X would _not_ save the current FPU registers, and so would not
overwrite what was prepared, because fpregs_active should still be zero.
Now on the switch back to the task, fpstate_active was set, which means
fpu.preload might be true. If so, it would load the FPU registers and set
fpregs_active to 1. Later fpu__restore() would try the same and
fpregs_activate() would trigger the warning because fpregs_active was
already set to 1.

> Hmmm.

So I just came up with a possible hard-to-trigger case and a robot
triggered it already a while back. Well, CONFIG_PREEMPT=y is also there, so
it matches this part of the story. But you connected the dots.

[¹] side note: in my early research it took a while to notice that
    fpstate_active and fpregs_active were two different things. My brain
    used fp.*_active for matching. It also helped my confusion that those
    were renamed and removed…

Sebastian
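
For illustration, the v4.5-era sequence discussed above can be modelled in
user space. This is only a simplified sketch, not kernel code: struct
fpu_model, the model_*() helpers and the restore_sig_*() functions are
hypothetical names, and the context switch is reduced to the two flag
checks described in the mail (save only if fpregs_active, preload only if
fpstate_active). The second variant assumes that setting the flag inside
the protected region means no switch can happen between the flag update
and fpu__restore().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the two flags; NOT kernel code. */
struct fpu_model {
	bool fpstate_active;	/* task has an FPU state in memory */
	bool fpregs_active;	/* that state is live in the CPU registers */
	int warnings;		/* times the double-activation check fired */
};

/* Models fpregs_activate(): complains if the registers are already active. */
static void model_fpregs_activate(struct fpu_model *fpu)
{
	if (fpu->fpregs_active)
		fpu->warnings++;
	fpu->fpregs_active = true;
}

/* Models fpu__drop(): both flags are cleared. */
static void model_fpu_drop(struct fpu_model *fpu)
{
	fpu->fpstate_active = false;
	fpu->fpregs_active = false;
}

/* Models fpu__restore(): loads the registers and activates them. */
static void model_fpu_restore(struct fpu_model *fpu)
{
	model_fpregs_activate(fpu);
}

/* Models the task being switched out and back in again (assumed behaviour). */
static void model_context_switch(struct fpu_model *fpu)
{
	/* Leaving: registers are saved only if they are active. */
	if (fpu->fpregs_active)
		fpu->fpregs_active = false;
	/* Coming back: preload (and activate) only if a state exists. */
	if (fpu->fpstate_active)
		model_fpregs_activate(fpu);
}

/* Flag is set before the protected region; a switch may hit point X. */
int restore_sig_racy(bool switch_at_x)
{
	struct fpu_model fpu = { .fpstate_active = true, .fpregs_active = true };

	model_fpu_drop(&fpu);
	fpu.fpstate_active = true;
	if (switch_at_x)
		model_context_switch(&fpu);	/* point X in the mail */
	model_fpu_restore(&fpu);
	return fpu.warnings;
}

/* Flag is set inside the protected region; a switch can only come earlier. */
int restore_sig_flag_in_region(void)
{
	struct fpu_model fpu = { .fpstate_active = true, .fpregs_active = true };

	model_fpu_drop(&fpu);
	model_context_switch(&fpu);	/* sees both flags at 0, does nothing */
	fpu.fpstate_active = true;
	model_fpu_restore(&fpu);
	return fpu.warnings;
}

/* Usage: the expected outcome of each variant in this model. */
void model_selftest(void)
{
	assert(restore_sig_racy(false) == 0);		/* no switch, no warning */
	assert(restore_sig_racy(true) == 1);		/* switch at X, one warning */
	assert(restore_sig_flag_in_region() == 0);	/* window closed */
}
```

In this model the warning only fires when the switch lands between the
flag update and the restore, which matches the window marked X above.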