Date: Tue, 15 Aug 2017 10:37:17 -0600
From: Alex Williamson
To: Robin Murphy
Cc: Alexey Kardashevskiy, linuxppc-dev@lists.ozlabs.org, David Gibson,
 kvm-ppc@vger.kernel.org, kvm@vger.kernel.org, Yongji Xie, Eric Auger,
 Kyle Mahlkuch, Jike Song, Bjorn Helgaas, Joerg Roedel, Arvind Yadav,
 Benjamin Herrenschmidt, David Woodhouse, Kirti Wankhede,
 Mauricio Faria de Oliveira, Neo Jia, Paul Mackerras, Vlad Tsyrklevich,
 iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table
Message-ID: <20170815103717.3b64e10c@w520.home>
In-Reply-To:
References: <20170807072548.3023-1-aik@ozlabs.ru>
 <8f5f7b82-3c10-7f39-b587-db4c4424f04c@ozlabs.ru>

On Mon, 14 Aug 2017 14:12:33 +0100
Robin Murphy wrote:

> On 14/08/17 10:45, Alexey Kardashevskiy wrote:
> > Folks,
> >
> > Is there anything to change besides those compiler errors and David's
> > comment in 5/5? Or is the whole patchset too bad? Thanks.
>
> While I now understand it's not the low-level thing I first thought it
> was, so my reasoning has changed, personally I don't like this approach
> any more than the previous one - it still smells of abusing external
> APIs to pass information from one part of VFIO to another (and it has
> the same conceptual problem of attributing something to interrupt
> sources that is actually a property of the interrupt target).
>
> Taking a step back, though, why does vfio-pci perform this check in the
> first place? If a malicious guest already has control of a device, any
> kind of interrupt spoofing it could do by fiddling with the MSI-X
> message address/data it could simply do with a DMA write anyway, so the
> security argument doesn't stand up in general (sure, not all PCIe
> devices may be capable of arbitrary DMA, but that seems like more of a
> tenuous security-by-obscurity angle to me). Besides, with Type1 IOMMU
> the fact that we've let a device be assigned at all means that this is
> already a non-issue (because either the hardware provides isolation or
> the user has explicitly accepted the consequences of an unsafe
> configuration) - from patch #4 that's apparently the same for SPAPR TCE,
> in which case it seems this flag doesn't even need to be propagated and
> could simply be assumed always.
>
> On the other hand, if the check is not so much to mitigate malicious
> guests attacking the system as to prevent dumb guests breaking
> themselves (e.g. if some or all of the MSI-X capability is actually
> emulated), then allowing things to sometimes go wrong on the grounds of
> an irrelevant hardware feature doesn't seem correct :/

While the theoretical security provided by preventing direct access to
the MSI-X vector table may be mostly a matter of obfuscation, in
practice I think it changes the problem of creating arbitrary DMA
writes from a generic, trivial, PCI-spec-based exercise into a more
device-specific challenge.  I do, however, have evidence that there are
consumers of the vfio API who would have attempted to program device
interrupts by directly manipulating the vector table had they not been
prevented from doing so; being blocked is what prompted them to contact
me and learn about the SET_IRQ ioctl.  Therefore I think the behavior
also contributes to making the overall API more difficult to use
incorrectly.

Of course I don't think either of those is worth imposing a performance
penalty where we don't otherwise need one.  However, if we look at a VM
scenario where the guest is following the PCI standard for programming
MSI-X interrupts (i.e. not POWER), we need some mechanism to intercept
those MMIO writes to the vector table and configure the host interrupt
domain of the device rather than allowing the guest direct access.
This is simply part of virtualizing the device to the guest.  So even
if the kernel allows mmap'ing the vector table, the hypervisor needs to
trap it, so the mmap isn't required or used anyway.  It's only when you
define a non-PCI standard for your guest to program interrupts, as
POWER has done, and can therefore trust that the hypervisor does not
need to trap on the vector table, that having an mmap'able vector table
becomes fully useful.

AIUI, ARM supports 64k pages too... does ARM have any strategy that
would actually make it possible to make use of an mmap covering the
vector table?  Thanks,

Alex
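
P.S. In case it helps illustrate the SET_IRQ path above, a rough sketch
of what programming a single MSI-X vector through VFIO_DEVICE_SET_IRQS
looks like from userspace, rather than poking the vector table.  The
function name, the single-vector count, and the pre-opened device_fd
are placeholders for the example; error handling is abbreviated:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/eventfd.h>
#include <linux/vfio.h>

/*
 * Enable MSI-X vector 0 on an already-opened VFIO device fd.  The
 * kernel programs the physical vector table and routes the interrupt
 * to the eventfd; userspace never touches the table itself.
 */
static int vfio_msix_enable_vector0(int device_fd)
{
	struct vfio_irq_set *irq_set;
	size_t argsz = sizeof(*irq_set) + sizeof(int32_t);
	int32_t efd;
	int ret;

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	irq_set = calloc(1, argsz);
	if (!irq_set) {
		close(efd);
		return -1;
	}

	irq_set->argsz = argsz;
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
			 VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
	irq_set->start = 0;	/* first vector */
	irq_set->count = 1;	/* just one vector for the example */
	memcpy(irq_set->data, &efd, sizeof(efd));

	ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
	free(irq_set);
	return ret;	/* on success, poll()/read() efd for interrupts */
}

The eventfd is the only thing userspace ever holds -- the consumer
poll()s it, or hands it to KVM as an irqfd -- while the address/data
pair that actually lands in the vector table stays entirely under
kernel control.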