Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
From: Christian Borntraeger
To: Christophe de Dinechin
Cc: Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost,
 KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli,
 Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
 Dave Hansen, Greg Kroah-Hartman, "H. Peter Anvin", Ingo Molnar,
 Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
 Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář,
 Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
 the arch/x86 maintainers, "Dr. David Alan Gilbert"
Date: Tue, 30 Jan 2018 15:52:09 +0100
Message-Id: <6a2713b1-74e7-53db-527d-d77cc4394f61@de.ibm.com>
References: <1516476182-5153-6-git-send-email-karahmed@amazon.de>
 <20180129201404.GA1588@localhost.localdomain>
 <1517257022.18619.30.camel@infradead.org>
 <20180129204256.GV25150@localhost.localdomain>
 <31415b7f-9c76-c102-86cd-6bf4e23e3aee@linux.intel.com>
 <1517259759.18619.38.camel@infradead.org>
 <56a33b36-5568-5d6e-a858-3b22ea335bcb@de.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/30/2018 03:46 PM, Christophe de Dinechin wrote:
>
>> On 30 Jan 2018, at 13:11, Christian Borntraeger wrote:
>>
>> On 01/30/2018 01:23 AM, Linus Torvalds wrote:
>> [...]
>>>
>>> So I actually have a _different_ question to the virtualization
>>> people. This includes the vmware people, but it also obviously
>>> includes the Amazon AWS kind of usage.
>>>
>>> When you're a hypervisor (whether vmware or Amazon), why do you even
>>> end up caring about these things so much?
>>> You're protected from meltdown thanks to the virtual environment
>>> already having separate page tables. And the "big hammer" approach
>>> to spectre would seem to be to just make sure the BTB and RSB are
>>> flushed at vmexit time - and even then you might decide that you
>>> really want to just move it to vmenter time, and only do it if the
>>> VM has changed since last time (per CPU).
>>>
>>> Why do you even _care_ about the guest, and how it acts wrt Skylake?
>>> What you should care about is not so much the guests (which do their
>>> own thing) but protect guests from each other, no?
>>>
>>> So I'm a bit mystified by some of this discussion within the context
>>> of virtual machines. I think that is separate from any measures that
>>> the guest machine may then decide to partake in.
>>>
>>> If you are ever going to migrate to Skylake, I think you should just
>>> always tell the guests that you're running on Skylake. That way the
>>> guests will always assume the worst case situation wrt Spectre.
>>>
>>> Maybe that mystification comes from me missing something.
>>
>> I can only speak for KVM, but I think the hypervisor issues come from
>> the fact that for migration purposes the hypervisor "lies" to the guest
>> in regard to what kind of CPU is running. (It has to lie, see below.)
>>
>> This is to avoid random guest crashes by not announcing features. For
>> example, if you want to migrate back and forth between a system that
>> has AVX512 and another one that has not, you must tell the guest that
>> AVX512 is not available - even if it runs on the capable system.
>>
>> To protect against new features the hypervisor only announces features
>> that it understands.
>> So you essentially start a VM in QEMU of a given CPU type that is
>> constructed of a base CPU type plus extra features. Before migration,
>> it is checked if the target system can run a guest of the given type -
>> otherwise migration is rejected.
>>
>> The management stack also knows things like baselining - basically
>> creating the best possible guest CPU given a set of hosts.
>>
>> The problem now is: if you have, let's say, Broadwell and Skylake hosts,
>> what kind of CPU type are you telling your guest? If you claim
>> Broadwell but run on Skylake then you prevent the guest from
>> protecting itself, because the guest does not know that it should do
>> something special. If you say Skylake the guest might start using
>> features that Broadwell does not understand.
>
> I believe that Linus’ question was whether it makes sense to defer
> the entirety of the protection to the host kernel, although I was a bit
> confused by his suggestion to always assume Skylake.
>
> In other words, is it safe enough to rely on the host kernel countermeasure
> to protect guest kernels and their applications? In which case having
> the guest believe it runs on Broadwell would not be that problematic.
>
> Aren’t there enough vmexits on the guest kernel context switch
> to enforce protection on its behalf? Even if it’s
>
> a) some old kernel without mitigation code
>
> or
>
> b) some new kernel that thinks it runs on an old CPU and disabled mitigation
>

I think it is not safe to just protect the host. A CPU-bound workload in the
guest will switch a lot between guest user and guest kernel without
triggering an exit.
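
The baselining and pre-migration check described in the quoted text can be
sketched roughly as below. This is only an illustration of the idea, not how
QEMU or any management stack implements it: the host names, the tiny feature
sets, and the functions baseline() and can_migrate() are all assumptions made
up for this example.

```python
from functools import reduce

# Hypothetical, heavily simplified feature sets. Real CPU models carry
# many more flags; these names are only for illustration.
HOSTS = {
    "host_without_avx512": frozenset({"sse4_2", "avx", "avx2"}),
    "host_with_avx512": frozenset({"sse4_2", "avx", "avx2", "avx512f"}),
}


def baseline(host_feature_sets):
    """Return the features common to all hosts.

    This is the "best possible" guest CPU that can still run on every
    host in the set. Raises TypeError if no hosts are given.
    """
    return reduce(frozenset.intersection, host_feature_sets)


def can_migrate(guest_features, target_features):
    """A guest may move only to a host offering every feature it was told about."""
    return guest_features <= target_features


guest_cpu = baseline(HOSTS.values())

for name, features in HOSTS.items():
    print(name, "accepts baseline guest:", can_migrate(guest_cpu, features))

# A guest started with the full feature set of the AVX512 host would be
# rejected by the other host, which is why the hypervisor has to hide AVX512.
print(can_migrate(HOSTS["host_with_avx512"], HOSTS["host_without_avx512"]))
```

The same set intersection is what makes the Broadwell/Skylake case awkward:
the baseline hides the fact that the guest may currently be running on the
newer CPU.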