Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
From: Christian Borntraeger
Date: Tue, 30 Jan 2018 16:33:30 +0100
To: Christophe de Dinechin
Cc: Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost,
    KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen,
    Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick,
    Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman,
    H. Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
    Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini,
    Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen,
    Tom Lendacky, KVM list, the arch/x86 maintainers,
    Dr. David Alan Gilbert
Message-Id: <6bd12fe9-9f95-f995-bc21-f292246a59c6@de.ibm.com>
In-Reply-To: <1C632B01-E0BC-4853-8CF3-4F4EDE800F8A@dinechin.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/30/2018 03:56 PM, Christophe de Dinechin wrote:
>
>> On 30 Jan 2018, at 15:52, Christian Borntraeger wrote:
>>
>> On 01/30/2018 03:46 PM, Christophe de Dinechin wrote:
>>>
>>>> On 30 Jan 2018, at 13:11, Christian Borntraeger wrote:
>>>>
>>>> On 01/30/2018 01:23 AM, Linus Torvalds wrote:
>>>> [...]
>>>>>
>>>>> So I actually have a _different_ question to the virtualization
>>>>> people.
>>>>> This includes the vmware people, but it also obviously
>>>>> includes the Amazon AWS kind of usage.
>>>>>
>>>>> When you're a hypervisor (whether vmware or Amazon), why do you even
>>>>> end up caring about these things so much? You're protected from
>>>>> meltdown thanks to the virtual environment already having separate
>>>>> page tables. And the "big hammer" approach to spectre would seem to
>>>>> be to just make sure the BTB and RSB are flushed at vmexit time - and
>>>>> even then you might decide that you really want to just move it to
>>>>> vmenter time, and only do it if the VM has changed since last time
>>>>> (per CPU).
>>>>>
>>>>> Why do you even _care_ about the guest, and how it acts wrt Skylake?
>>>>> What you should care about is not so much the guests (which do their
>>>>> own thing) but protecting guests from each other, no?
>>>>>
>>>>> So I'm a bit mystified by some of this discussion within the context
>>>>> of virtual machines. I think that is separate from any measures that
>>>>> the guest machine may then decide to partake in.
>>>>>
>>>>> If you are ever going to migrate to Skylake, I think you should just
>>>>> always tell the guests that you're running on Skylake. That way the
>>>>> guests will always assume the worst case situation wrt Spectre.
>>>>>
>>>>> Maybe that mystification comes from me missing something.
>>>>
>>>> I can only speak for KVM, but I think the hypervisor issues come from
>>>> the fact that for migration purposes the hypervisor "lies" to the guest
>>>> about what kind of CPU is running (it has to lie, see below).
>>>>
>>>> This is to avoid random guest crashes: features that might not be
>>>> available after a migration are not announced. For example, if you want
>>>> to migrate back and forth between a system that has AVX512 and one that
>>>> does not, you must tell the guest that AVX512 is not available - even
>>>> when it runs on the capable system.
>>>>
>>>> To protect against new features, the hypervisor only announces features
>>>> that it understands.
>>>> So you essentially start a VM in QEMU with a given CPU type that is
>>>> constructed from a base CPU type plus extra features. Before migration,
>>>> it is checked whether the target system can run a guest of the given
>>>> type - otherwise migration is rejected.
>>>>
>>>> The management stack also knows things like baselining - basically
>>>> creating the best possible guest CPU given a set of hosts.
>>>>
>>>> The problem now is: if you have, let's say, Broadwell and Skylake hosts,
>>>> what kind of CPU type are you telling your guest? If you claim
>>>> Broadwell but run on Skylake, then you prevent the guest from
>>>> protecting itself, because the guest does not know that it should do
>>>> something special. If you say Skylake, the guest might start using
>>>> features that Broadwell does not support.
>>>
>>> I believe that Linus’ question was whether it makes sense to defer
>>> the entirety of the protection to the host kernel, although I was a bit
>>> confused by his suggestion to always assume Skylake.
>>>
>>> In other words, is it safe enough to rely on the host kernel countermeasures
>>> to protect guest kernels and their applications? In that case, having
>>> the guest believe it runs on Broadwell would not be that problematic.
>>>
>>> Aren’t there enough vmexits on the guest kernel context switch
>>> to enforce protection on its behalf? Even if it’s
>>>
>>> a) some old kernel without mitigation code
>>>
>>> or
>>>
>>> b) some new kernel that thinks it runs on an old CPU and disabled mitigation
>>>
>> I think it is not safe to just protect the host. A CPU-bound workload in the guest
>> will switch a lot between guest user and guest kernel without triggering an
>> exit.
>
> But that’s only if the guest does not take any page faults. Is it possible to run any
> of the known approaches to Spectre and Meltdown without ever faulting?

Sure. After you have faulted in everything, you can still flush the cache
without refaulting. And if you need a fault, it will be a GUEST fault, with no
hypervisor involvement. Everything else would be too slow and is pre-NPT
(shadow paging).

> If the workload is not faulting, then it’s reading only stuff it’s allowed to, isn’t it?

The point is: the hypervisor will not try to isolate guest userspace from
guest kernel space or from other guest userspaces. This is clearly the task
of the guest operating system (you are also not asking the hypervisor to build
a guest KPTI if the guest is too old). The hypervisor's task is to isolate
guests against other guests and against the host. At the same time, the
hypervisor will try to _enable_ the guest to also protect itself.