Date: Thu, 15 Mar 2018 11:54:21 -0300
From: Henrique de Moraes Holschuh
To: Borislav Petkov
Cc: X86 ML, Emanuel Czirai, Ashok Raj, Tom Lendacky, LKML,
	Arjan Van De Ven
Subject: Re: [PATCH 2/2] x86/microcode: Fix CPU synchronization routine
Message-ID: <20180315145421.7n6rky4b5vaqnrxt@khazad-dum.debian.net>

On Thu, 15 Mar 2018, Borislav Petkov wrote:
> it is injecting faults and attempting to manipulate some size field -
> I'm guessing the encrypted data size. And I'm also guessing that if you
> manipulate that size, it would simply take a lot longer to attempt to
> decrypt and verify that it is broken microcode and reject it. So it is
> not actually a real update - it is just taking a lot longer to reject
> it.

That paper measures the successful updates too (see below).  That the
fault injection tests took fewer cycles than the successful updates did
is not stated explicitly in the paper, but IMO it is implied by "fig 7"
and by the text near and immediately after "fig 8" (looking at the HTML
version of the paper).

> Now, I'm talking about genuine microcode updates. And that paper also
> claims that they take thousands of cycles.

"Observation #4" in the paper is the successful, non-fault-injection
measurement, i.e. a regular microcode update.  Here are the paper's
numbers for a regular, successful update on a Core i5 M460 (the
microcode revisions were not disclosed in the paper):

  Average:   488953 cycles
  Std. dev.:  12270 cycles

What I observed on my Xeon X5550 (signature 0x106a5, microcode update
~10KiB in size) matches what the paper describes; note that the Xeon
X5550 takes microcode updates a little bigger than the ones for the
Core i5 M460 (signature 0x20655, microcode update ~3KiB in size).

This is not a potshot at Intel.  Their microcode update loader does a
lot of work, so it would make sense for it to take a lot of cycles:
AFAIK it is doing RSA-2048 decryption in microcode, updating several
internal subsystems as well as the MCRAM, etc.

> Now let's look at your previous, hm, "statement":
>
> > Intel takes anything from twenty thousand cycles to several *million*
> > cycles per core, proportional to microcode update size.
>
> So you took a fault injection measurement out of context to claim that
> *regular* microcode updates take millions of cycles.

I tested a *few* iterations of successful, regular microcode updates on
a Xeon X5550, and they matched the magnitude of the cycle counts given
in the paper for a successful, regular microcode update.

I claimed an Intel microcode update would take from 20000 to millions
of cycles depending on update size, based on:

1. the paper's description and measurements of how the loader does the
   signature validation;

2. the fact that a 10KiB update took ~800000 cycles on a Xeon X5550
   core (the BSP, data I measured myself), and that a 3KiB update took
   ~480000 cycles on average on a Core i5 M460 (data from the
   inertiawar paper).
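To make the scaling argument explicit, here is the rough
back-of-envelope behind that claim.  This is my arithmetic, not the
paper's, and the "tens of KiB" update size for newer processors is an
assumption on my part:

  Xeon X5550:    ~800000 cycles / ~10 KiB  ~=  80000 cycles per KiB
  Core i5 M460:  ~480000 cycles /  ~3 KiB  ~= 160000 cycles per KiB

At roughly 80000 to 160000 cycles per KiB, an update in the tens-of-KiB
range would end up in the millions of cycles, which is where the upper
end of the "twenty thousand to several million" claim comes from.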
Now, this might be comparing two different subspecies of oranges, so I
did try to run my measurement patch on a Core i5-2500 I had access to,
so as to compare *regular*, successful early microcode updates using
the same kernel on two different processors.  I recall the results on
the i5-2500 were coherent with the paper's findings and with the
results for the Xeon X5550.  Unfortunately, I have misplaced the
results for the i5-2500, or I would have already provided them.
According to the notes I found, those tests were done circa August
2014, and at the time I was not exactly considering them to be
important stuff that needed to be documented and archived properly.

> So you had to say something - doesn't matter if it is apples and oranges
> - as long as it is dramatic. Fuck the truth.

I do believe [Intel microcode updates on recent processors take
hundreds of thousands of cycles, up to millions of cycles in the worst
case] to be the truth, from both my reading of that paper and my own
simple attempts to verify the time a successful, regular microcode
update takes.  If I am wrong about this, it will be because I measured
things incorrectly, and as far as I can tell that would also be true
for the inertiawar paper.  I am certainly not doing anything out of
malice, nor trying to be dramatic.

Please consider that you presently believe an Intel microcode update
[on recent processors] takes on the order of thousands of cycles per
core, while I presently believe it takes at least a hundred times more
cycles than that.  Wouldn't that difference in beliefs, regardless of
which one is correct, account for my comments about microcode update
cycle cost appearing overly dramatic to you?  It was not my intention
to annoy or upset you, and I had no idea our expectations for Intel
microcode update cycle costs were two to three orders of magnitude
apart, so I never took that into account in any of my comments.

> > When I measured my Xeon X5550 workstation doing an early update, the
> > Xeon took about 1M cycles for the BSP, and 800k cycles for the APs (see
> > below).
> >
> > To measure that, as far as I recall I just did a rdtsc right before the
>
> RDTSC gets executed speculatively, so you need barriers around it. I
> hope you added them.

I searched for, and found, some old notes I took at the time.  FWIW:

I did not have speculation barriers before the WRMSR that does the
microcode update.  There was a compiler barrier() only, as in
"barrier(); time1 = get_cycles(); WRMSR".  But that WRMSR is supposed
to be a serializing instruction, and it does seem to behave like one.

I did have a compiler barrier() right after the second rdtsc, and a
sync_core() fully serializing barrier a bit later in the code flow, but
before I used any of the rdtsc results.  I did look at the resulting
object code at the time: the rdtsc calls were in the proper places
before and after the wrmsr instruction, as expected given the use of
barrier() to keep the compiler in line, and the CPUID implementing
sync_core() was between the second rdtsc and the code that used the
result.

From my notes, I did not use local_irq_disable(), since I was
instrumenting the early loader.  That might have skewed the results for
the BSP if interrupts were already being serviced at that point.  The
fact that I did not have a sync_core() right after the second rdtsc
might have added some systematic error, but I would not expect it to be
large enough to matter, since the noise was already above 10000 cycles.
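To make that concrete, this is roughly what the instrumentation looked
like.  It is a reconstruction from the notes above, for illustration
only (not the exact patch I ran in 2014), and the function and argument
names are made up:

#include <linux/compiler.h>
#include <asm/msr.h>
#include <asm/processor.h>
#include <asm/tsc.h>

/* Illustration only: time the microcode-update WRMSR with rdtsc. */
static u64 wrmsr_ucode_timed(u64 ucode_data_pa)
{
	u64 t1, t2;

	barrier();		/* compiler barrier only, no speculation fence */
	t1 = get_cycles();	/* rdtsc */

	/* MSR 0x79 triggers the update; WRMSR is a serializing instruction */
	native_wrmsrl(MSR_IA32_UCODE_WRITE, ucode_data_pa);

	t2 = get_cycles();	/* rdtsc */
	barrier();		/* compiler barrier right after the second rdtsc */

	sync_core();		/* CPUID, before the measured delta is used */

	return t2 - t1;
}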
Sibling hyperthreads were properly skipped by the early loader as being
already up-to-date.  The CPU numbering on that box also ensured the
other hyperthread of a core was still "offline" (in whatever state it
was left in by the BIOS -- no UEFI on that box) while the first one was
doing the microcode update: CPUs 0-3 were the first hyperthreads of
cores 0 to 3, and CPUs 4-7 were the second hyperthreads of cores 0 to 3.

--
  Henrique Holschuh