Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp5621007yba; Mon, 13 May 2019 14:11:32 -0700 (PDT) X-Google-Smtp-Source: APXvYqyvRZAWi2mWhokKRgun8vZ7eWcab2Syo7Oa0QrKoFrT+1AunGmT6cRcOjMrjkN5L6saRBYT X-Received: by 2002:a63:7413:: with SMTP id p19mr31823579pgc.259.1557781891927; Mon, 13 May 2019 14:11:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1557781891; cv=none; d=google.com; s=arc-20160816; b=Jp6S6aop+tDE5eRRKTHaBy0Pkc1MMPznFbvwcDvQqm2QItb7jS3lSNxwnK8k9TA2Ue R/+XANs6+5cLt/whWpT4sTekC9qjv2W137EylhN7MKkG/9xqpLnK1lZp6GeOlvFSecg6 HpYZ2bHLviay+tFDwnxIquBtdCVxpOigaA3dFcTMH6Cz6ny5GlDM850PVsiYJ4ypuvSs UC5yLYQQ5IjCRzWKHu8lDQa05PHtaHIa3XrGQnBE0SbcbAgn5oh1qs5uGfoPwvQtpzRn UF7Lk8v4+f24dMsbHPyK9cePfAcRRN3vUZ9DXol9Ot25tYh+BhkqZuYqg58fMnPRi0R/ sBxg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version:dkim-signature; bh=etxC2fIO+k7WqJJbgmpaQKX6DDfUoMBtn3qTvkFV+tE=; b=PnaOtARcqaEwpNJOgy5//b5zHfwR4Xn2/SmEbBqqtrSNDVudz1cc9F7N1Leg9fJV2p BF6bPCH7JmSIwt0MpVkiFnVQ4N8GwoykpwmQoZ8EuugP75ELkfH6u5XGKh9NUDhHhr1F bNJvWN9CrZFtvMxVVhQy+elCHrLksAPGMvnC9gg29Zy7NsWZ699s4ibVV3nSWs30RyBE LP0Hdytzk6qMkUaUPwwiH4U/YlyIWHomYMoGbJg703gQdkXhzruJsOF4DlTvMRou+dc9 JxXDtFsdGGbMuBYbfsSlnTyx3ETGCXS8RciS8lcuPOfZk/Lt4MZgrl8eAQXdkIjWQ8S0 ZpSQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2018-07-02 header.b=FSfBXyoh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id go17si18016171plb.357.2019.05.13.14.11.15; Mon, 13 May 2019 14:11:31 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2018-07-02 header.b=FSfBXyoh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726403AbfEMVJd (ORCPT + 99 others); Mon, 13 May 2019 17:09:33 -0400 Received: from aserp2130.oracle.com ([141.146.126.79]:50704 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726272AbfEMVJc (ORCPT ); Mon, 13 May 2019 17:09:32 -0400 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x4DL3koI146771; Mon, 13 May 2019 21:08:42 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=content-type : mime-version : subject : from : in-reply-to : date : cc : content-transfer-encoding : message-id : references : to; s=corp-2018-07-02; bh=etxC2fIO+k7WqJJbgmpaQKX6DDfUoMBtn3qTvkFV+tE=; b=FSfBXyohrOkcDcJUQjU9IbWERTb6TScchNOdnSNATyPYzF7SZ0/RAhT0ZrdjyrPMnhQj h4RPVcu565duvAFebok/LJcd3c+xOgFjoqRauIXFjbEJdhkRIGze08wA7ECcSvhk9i93 E6X7epEf32cPFEeU2/FPfvjsJ9PLMW1Sy5Pn/KT6EkP/xN5fE+ebLUMdOpjI3KJ6CHpK TmS0Hmd3gx43vcCfYcH8WTla1Bup+aUHzerBHUIbKaD8IgaIATMHWplVrL8bftDDajWD cNf3OemKPk7HniAOw7L0LkBrGsprAuU21bairPt9pBdhGlGDr+TYe4lgM+ZDHyPjmbeR nA== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by aserp2130.oracle.com with ESMTP id 2sdkwdj031-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 13 May 2019 21:08:42 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x4DL7aVk089303; Mon, 13 May 2019 21:08:42 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserp3020.oracle.com with ESMTP id 2se0tvskr8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 13 May 2019 21:08:42 +0000 Received: from abhmp0004.oracle.com (abhmp0004.oracle.com [141.146.116.10]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x4DL8TlR001785; Mon, 13 May 2019 21:08:36 GMT Received: from [192.168.14.112] (/79.180.238.224) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 13 May 2019 14:08:29 -0700 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 11.1 \(3445.4.7\)) Subject: Re: [RFC KVM 00/27] KVM Address Space Isolation From: Liran Alon In-Reply-To: Date: Tue, 14 May 2019 00:08:23 +0300 Cc: Alexandre Chartre , Paolo Bonzini , Radim Krcmar , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Dave Hansen , Peter Zijlstra , kvm list , X86 ML , Linux-MM , LKML , Konrad Rzeszutek Wilk , jan.setjeeilers@oracle.com, Jonathan Adams Content-Transfer-Encoding: quoted-printable Message-Id: References: <1557758315-12667-1-git-send-email-alexandre.chartre@oracle.com> To: Andy Lutomirski X-Mailer: Apple Mail (2.3445.4.7) X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9256 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1905130141 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9256 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1905130140 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On 13 May 2019, at 21:17, Andy Lutomirski wrote: >=20 >> I expect that the KVM address space can eventually be expanded to = include >> the ioctl syscall entries. By doing so, and also adding the KVM page = table >> to the process userland page table (which should be safe to do = because the >> KVM address space doesn't have any secret), we could potentially = handle the >> KVM ioctl without having to switch to the kernel pagetable (thus = effectively >> eliminating KPTI for KVM). Then the only overhead would be if a = VM-Exit has >> to be handled using the full kernel address space. >>=20 >=20 > In the hopefully common case where a VM exits and then gets re-entered > without needing to load full page tables, what code actually runs? > I'm trying to understand when the optimization of not switching is > actually useful. >=20 > Allowing ioctl() without switching to kernel tables sounds... > extremely complicated. It also makes the dubious assumption that user > memory contains no secrets. Let me attempt to clarify what we were thinking when creating this patch = series: 1) It is never safe to execute one hyperthread inside guest while it=E2=80= =99s sibling hyperthread runs in a virtual address space which contains = secrets of host or other guests. This is because we assume that using some speculative gadget (such as = half-Spectrev2 gadget), it will be possible to populate *some* CPU core = resource which could then be *somehow* leaked by the hyperthread running = inside guest. In case of L1TF, this would be data populated to the L1D = cache. 2) Because of (1), every time a hyperthread runs inside host kernel, we = must make sure it=E2=80=99s sibling is not running inside guest. i.e. We = must kick the sibling hyperthread outside of guest using IPI. 3) =46rom (2), we should have theoretically deduced that for every = #VMExit, there is a need to kick the sibling hyperthread also outside of = guest until the #VMExit is completed. Such a patch series was = implemented at some point but it had (obviously) significant performance = hit. 4) The main goal of this patch series is to preserve (2), but to avoid = the overhead specified in (3). The way this patch series achieves (4) is by observing that during the = run of a VM, most #VMExits can be handled rather quickly and locally = inside KVM and doesn=E2=80=99t need to reference any data that is not = relevant to this VM or KVM code. Therefore, if we will run these = #VMExits in an isolated virtual address space (i.e. KVM isolated address = space), there is no need to kick the sibling hyperthread from guest = while these #VMExits handlers run. The hope is that the very vast majority of #VMExit handlers will be able = to completely run without requiring to switch to full address space. = Therefore, avoiding the performance hit of (2). However, for the very few #VMExits that does require to run in full = kernel address space, we must first kick the sibling hyperthread outside = of guest and only then switch to full kernel address space and only once = all hyperthreads return to KVM address space, then allow then to enter = into guest. =46rom this reason, I think the above paragraph (that was added to my = original cover letter) is incorrect. I believe that we should by design treat all exits to userspace VMM = (e.g. QEMU) as slow-path that should not be optimised and therefore ok = to switch address space (and therefore also kick sibling hyperthread). = Similarly, all IOCTLs handlers are also slow-path and therefore it = should be ok for them to also not run in KVM isolated address space. -Liran