From: Christophe de Dinechin
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
Date: Tue, 30 Jan 2018 15:56:17 +0100
To: Christian Borntraeger
Cc: Christophe de Dinechin, Linus Torvalds, David Woodhouse,
 Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed,
 Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli,
 Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov,
 Dan Williams, Dave Hansen, Greg Kroah-Hartman, "H . Peter Anvin",
 Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima,
 Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
 Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
 the arch/x86 maintainers, "Dr. David Alan Gilbert"
Message-Id: <1C632B01-E0BC-4853-8CF3-4F4EDE800F8A@dinechin.org>
In-Reply-To: <6a2713b1-74e7-53db-527d-d77cc4394f61@de.ibm.com>
References: <1516476182-5153-6-git-send-email-karahmed@amazon.de>
 <20180129201404.GA1588@localhost.localdomain>
 <1517257022.18619.30.camel@infradead.org>
 <20180129204256.GV25150@localhost.localdomain>
 <31415b7f-9c76-c102-86cd-6bf4e23e3aee@linux.intel.com>
 <1517259759.18619.38.camel@infradead.org>
 <56a33b36-5568-5d6e-a858-3b22ea335bcb@de.ibm.com>
 <6a2713b1-74e7-53db-527d-d77cc4394f61@de.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On 30 Jan 2018, at 15:52, Christian Borntraeger wrote:
>
> On 01/30/2018 03:46 PM, Christophe de Dinechin wrote:
>>
>>> On 30 Jan 2018, at 13:11, Christian Borntraeger wrote:
>>>
>>> On 01/30/2018 01:23 AM, Linus Torvalds wrote:
>>> [...]
>>>>
>>>> So I actually have a _different_ question to the virtualization
>>>> people. This includes the vmware people, but it also obviously
>>>> includes the Amazon AWS kind of usage.
>>>>
>>>> When you're a hypervisor (whether vmware or Amazon), why do you even
>>>> end up caring about these things so much? You're protected from
>>>> meltdown thanks to the virtual environment already having separate
>>>> page tables. And the "big hammer" approach to spectre would seem to
>>>> be to just make sure the BTB and RSB are flushed at vmexit time - and
>>>> even then you might decide that you really want to just move it to
>>>> vmenter time, and only do it if the VM has changed since last time
>>>> (per CPU).
>>>>
>>>> Why do you even _care_ about the guest, and how it acts wrt Skylake?
>>>> What you should care about is not so much the guests (which do their
>>>> own thing) but protect guests from each other, no?
>>>>
>>>> So I'm a bit mystified by some of this discussion within the context
>>>> of virtual machines. I think that is separate from any measures that
>>>> the guest machine may then decide to partake in.
>>>>
>>>> If you are ever going to migrate to Skylake, I think you should just
>>>> always tell the guests that you're running on Skylake. That way the
>>>> guests will always assume the worst case situation wrt Spectre.
>>>>
>>>> Maybe that mystification comes from me missing something.
>>>
>>> I can only speak for KVM, but I think the hypervisor issues come from
>>> the fact that for migration purposes the hypervisor "lies" to the guest
>>> in regard to what kind of CPU is running. (it has to lie, see below).
>>>
>>> This is to avoid random guest crashes by not announcing features. For
>>> example if you want to migrate forth and back between a system that
>>> has AVX512 and another one that has not you must tell the guest that
>>> AVX512 is not available - even if it runs on the capable system.
>>>
>>> To protect against new features the hypervisor only announces features
>>> that it understands.
>>> So you essentially start a VM in QEMU of a given CPU type that is
>>> constructed of a base cpu type plus extra features. Before migration,
>>> it is checked if the target system can run a guest of given type -
>>> otherwise migration is rejected.
>>>
>>> The management stack also knows things like baselining - basically
>>> creating the best possible guest CPU given a set of hosts.
>>>
>>> The problem now is: If you have let's say Broadwell and Skylakes.
>>> What kind of CPU type are you telling your guest? If you claim
>>> broadwell but run on skylake then you prevent that the guest can
>>> protect itself, because the guest does not know that it should do
>>> something special. If you say skylake the guest might start using
>>> features that broadwell does not understand.
>>
>> I believe that Linus’ question was whether it makes sense to defer
>> the entirety of the protection to the host kernel, although I was a bit
>> confused by his suggestion to always assume Skylake.
>>
>> In other words, is it safe enough to rely on the host kernel countermeasure
>> to protect guest kernels and their applications? In which case having
>> the guest believe it runs on Broadwell would not be that problematic.
>>
>> Aren’t there enough vmexits on the guest kernel context switch
>> to enforce protection on its behalf? Even if it’s
>>
>> a) some old kernel without mitigation code
>>
>> or
>>
>> b) some new kernel that thinks it runs on an old CPU and disabled mitigation
>>
> I think it is not safe to just protect the host. CPU bound workload in the guest
> will switch a lot between guest user and guest kernel without triggering an
> exit.

But that’s only if the guest does not take any page faults. Is it possible
to run any of the known approaches to spectre and meltdown without ever
faulting?

If the workload is not faulting, then it’s reading only stuff it’s allowed to,
isn’t it?

Christophe
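
For readers following the quoted explanation of baselining and the pre-migration check, here is a minimal sketch of the idea, not how QEMU or any management stack actually implements it. The host names, the feature sets and the helpers `baseline` and `can_migrate` are all hypothetical and chosen only for illustration: the guest CPU is modelled as the intersection of what every host offers, and migration is allowed only when the target offers everything the guest was told about.

```python
# Illustrative only: host names and feature lists are made up for this
# sketch; real CPU models carry far more flags than shown here.
HOST_FEATURES = {
    "host-a": {"sse4_2", "avx", "avx2"},
    "host-b": {"sse4_2", "avx", "avx2", "avx512f"},
}


def baseline(hosts):
    """Return the features that every host in `hosts` supports."""
    feature_sets = list(hosts.values())
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


def can_migrate(guest_features, target_features):
    """A guest can move to a target only if the target offers every
    feature that was announced to the guest."""
    return guest_features <= target_features


# The "best possible" guest CPU for this pool of hosts.
guest = baseline(HOST_FEATURES)

# A guest built from the baseline can run on either host.
print(sorted(guest))
print(can_migrate(guest, HOST_FEATURES["host-a"]))

# A guest that was told about everything on host-b cannot move to host-a.
print(can_migrate(HOST_FEATURES["host-b"], HOST_FEATURES["host-a"]))
```

The same intersection is what makes the Broadwell/Skylake case awkward: anything present on only one host, including knowledge that a mitigation is needed, drops out of the baseline.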