Date: Wed, 24 Jan 2018 12:50:31 +0100
From: Radim Krčmář
To: Martin Schwidefsky
Cc: Christian Borntraeger, kvm@vger.kernel.org, Paolo Bonzini,
    linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    Heiko Carstens, Cornelia Huck, David Hildenbrand,
    Greg Kroah-Hartman, Jon Masters, Marcus Meissner, Jiri Kosina
Subject: Re: [PATCH 4/5] s390: define ISOLATE_BP to run tasks with modified branch prediction
Message-ID: <20180124115030.GB655@flask>
References: <1516712825-2917-1-git-send-email-schwidefsky@de.ibm.com>
    <1516712825-2917-5-git-send-email-schwidefsky@de.ibm.com>
    <20180123203223.GA648@flask>
    <20180124073605.494aceb8@mschwideX1>
In-Reply-To: <20180124073605.494aceb8@mschwideX1>
X-Mailing-List: linux-kernel@vger.kernel.org

2018-01-24 07:36+0100, Martin Schwidefsky:
> On Tue, 23 Jan 2018 21:32:24 +0100
> Radim Krčmář wrote:
>
> > 2018-01-23 15:21+0100, Christian Borntraeger:
> > > Paolo, Radim,
> > >
> > > this patch not only allows to isolate a userspace process, it also allows us
> > > to add a new interface for KVM that would allow us to isolate a KVM guest CPU
> > > to no longer being able to inject branches in any host or other guests. (while
> > > at the same time QEMU and host kernel can run with full power).
> > > We just have to set the TIF bit TIF_ISOLATE_BP_GUEST for the thread that runs a
> > > given CPU. This would certainly be an addon patch on top of this patch at a later
> > > point in time.
> >
> > I think that the default should be secure, so userspace will be
> > breaking the isolation instead of setting it up and having just one
> > place to screw up would be better -- the prctl could decide which
> > isolation mode to pick.
>
> The prctl is one direction only. Once a task is "secured" there is no way back.

Good point, I was thinking of reversing the direction and having
TIF_NOT_ISOLATE_BP_GUEST prctl, but allowing tasks to subvert security
would be even worse.

> If we start with a default of secure then *all* tasks will run with limited
> branch prediction.

Right, because all of them are untrusted.  What is the performance
impact of BP isolation?

This design seems very fragile to me -- we're forcing userspace to care
about some arcane hardware implementation and isolation in the system is
broken if a task running malicious code doesn't do that for any reason.

> > Maybe we can change the conditions and break logical connection between
> > TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST, to make a separate KVM
> > interface useful.
>
> The thinking here is that you use TIF_ISOLATE_BP to make user space secure,
> but you need to close the loophole that you can use a KVM guest to get out of
> the secured mode. That is why you need to run the guest with isolated BP if
> TIF_ISOLATE_BP is set. But if you want to run qemu as always and only the
> KVM guest with isolated BP you need a second bit, thus TIF_ISOLATE_GUEST_BP.

I understand, I was following the misguided idea where we have reversed
logic and then use just TIF_NOT_ISOLATE_GUEST_BP for sie switches.

> > > Do you think something similar would be useful for other architectures as well?
> >
> > It goes against my idea of virtualization, but there probably are users
> > that don't care about isolation and still use virtual machines ...
> > I expect most architectures to have a fairly similar resolution of
> > branch prediction leaks, so the idea should be easily abstractable on
> > all levels.  (At least x86 is.)
>
> Yes.
>
> > > In that case we should try to come up with a cross-architecture interface to enable
> > > that.
> >
> > Makes me think of a generic VM control "prefer performance over
> > security", which would also take care of future problems and let arches
> > decide what is worth the code.
>
> VM as in virtual machine or VM as in virtual memory?

Virtual machine.  (But could be anywhere really, especially the
kernel/user split slowed applications down for too long already. :])

> > A main drawback is that this will introduce dynamic branches to the
> > code, which are going to slow down the common case to speed up a niche.
>
> Where would you place these additional branches? I don't quite get the idea.

The BP* macros contain a branch in them -- avoidable if we only had
isolated virtual machines.

Thanks.
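
For anyone following the thread, the two-bit semantics discussed above can be modelled in a few lines. This is only a toy user-space sketch in Python, not kernel code: the `Tif` and `Task` names and the bit values are made up for illustration. It assumes, as described above, that the bits can only be set and that a guest runs isolated if either bit is set.

```python
from enum import IntFlag


class Tif(IntFlag):
    """Hypothetical stand-ins for the two thread-info bits named in the thread."""
    ISOLATE_BP = 1        # isolate the task itself, and therefore its guest too
    ISOLATE_BP_GUEST = 2  # isolate only the KVM guest run by this thread


class Task:
    """Toy model of a task; there is deliberately no way to clear a bit."""

    def __init__(self):
        self.flags = Tif(0)

    def isolate(self, bit):
        # One direction only: bits are OR-ed in and never removed.
        self.flags |= bit

    def user_isolated(self):
        # User space runs with isolated branch prediction only if ISOLATE_BP is set.
        return bool(self.flags & Tif.ISOLATE_BP)

    def guest_isolated(self):
        # The guest is isolated if either bit is set, which closes the
        # "escape through a KVM guest" loophole for ISOLATE_BP tasks.
        return bool(self.flags & (Tif.ISOLATE_BP | Tif.ISOLATE_BP_GUEST))


vcpu_thread = Task()
vcpu_thread.isolate(Tif.ISOLATE_BP_GUEST)
print(vcpu_thread.user_isolated(), vcpu_thread.guest_isolated())
```

In this model a thread with only the guest bit set keeps full branch prediction for QEMU itself while its guest runs isolated, which is the case the second bit is meant to cover.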