Date: Fri, 9 Feb 2018 14:45:41 -0700
From: Alex Williamson
To: Peter Xu
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC PATCH] vfio/pci: Add ioeventfd support
Message-ID: <20180209144541.059554bf@w520.home>
In-Reply-To: <20180209070511.GD2783@xz-mi>
References: <20180207000731.32764.95992.stgit@gimli.home> <20180209070511.GD2783@xz-mi>

On Fri, 9 Feb 2018 15:05:11 +0800
Peter Xu wrote:

> On Tue, Feb 06, 2018 at 05:08:14PM -0700, Alex Williamson wrote:
> 
> [...]
> 
> > +long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
> > +			uint64_t data, int count, int fd)
> > +{
> > +	struct pci_dev *pdev = vdev->pdev;
> > +	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
> > +	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
> > +	struct vfio_pci_ioeventfd *ioeventfd;
> > +	int (*handler)(void *, void *);
> > +	unsigned long val;
> > +
> > +	/* Only support ioeventfds into BARs */
> > +	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
> > +		return -EINVAL;
> > +
> > +	if (pos + count > pci_resource_len(pdev, bar))
> > +		return -EINVAL;
> > +
> > +	/* Disallow ioeventfds working around MSI-X table writes */
> > +	if (bar == vdev->msix_bar &&
> > +	    !(pos + count <= vdev->msix_offset ||
> > +	      pos >= vdev->msix_offset + vdev->msix_size))
> > +		return -EINVAL;
> > +
> > +	switch (count) {
> > +	case 1:
> > +		handler = &vfio_pci_ioeventfd_handler8;
> > +		val = data;
> > +		break;
> > +	case 2:
> > +		handler = &vfio_pci_ioeventfd_handler16;
> > +		val = le16_to_cpu(data);
> > +		break;
> > +	case 4:
> > +		handler = &vfio_pci_ioeventfd_handler32;
> > +		val = le32_to_cpu(data);
> > +		break;
> > +#ifdef iowrite64
> > +	case 8:
> > +		handler = &vfio_pci_ioeventfd_handler64;
> > +		val = le64_to_cpu(data);
> > +		break;
> > +#endif
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	ret = vfio_pci_setup_barmap(vdev, bar);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mutex_lock(&vdev->ioeventfds_lock);
> > +
> > +	list_for_each_entry(ioeventfd, &vdev->ioeventfds_list, next) {
> > +		if (ioeventfd->pos == pos && ioeventfd->bar == bar &&
> > +		    ioeventfd->data == data && ioeventfd->count == count) {
> > +			if (fd == -1) {
> > +				vfio_virqfd_disable(&ioeventfd->virqfd);
> > +				list_del(&ioeventfd->next);
> > +				kfree(ioeventfd);
> > +				ret = 0;
> > +			} else
> > +				ret = -EEXIST;
> > +
> > +			goto out_unlock;
> > +		}
> > +	}
> > +
> > +	if (fd < 0) {
> > +		ret = -ENODEV;
> > +		goto out_unlock;
> > +	}
> > +
> > +	ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL);
> > +	if (!ioeventfd) {
> > +		ret = -ENOMEM;
> > +		goto out_unlock;
> > +	}
> > +
> > +	ioeventfd->pos = pos;
> > +	ioeventfd->bar = bar;
> > +	ioeventfd->data = data;
> > +	ioeventfd->count = count;
> > +
> > +	ret = vfio_virqfd_enable(vdev->barmap[ioeventfd->bar] + ioeventfd->pos,
> > +				 handler, NULL, (void *)val,
> > +				 &ioeventfd->virqfd, fd);
> > +	if (ret) {
> > +		kfree(ioeventfd);
> > +		goto out_unlock;
> > +	}
> > +
> > +	list_add(&ioeventfd->next, &vdev->ioeventfds_list);
> 
> Is there a limit on how many ioeventfds that can be created?
> 
> IIUC we'll create this eventfd "automatically" if a MMIO addr/data
> triggered continuously for N=10 times, then would it be safer we have
> a limitation on maximum eventfds?  Or not sure whether a malicious
> guest can consume the host memory by sending:
> 
> - addr1/data1, 10 times
> - addr2/data2, 10 times
> - ...
> 
> To create unlimited ioeventfds?
> 
> Thanks,

Good question.  It is somewhat exploitable in the guest the way it's
written; however, a user process does have an open file limit and each
eventfd consumes a file handle, so unless someone is running QEMU with
unlimited file handles, there is a built-in limit.  Two problems remain
though:

First, is it still a bad idea that a user within a guest can target
this device page to consume all of the QEMU process' open file handles,
even if ultimately they're only harming themselves?  What would be a
reasonable cap on file descriptors for this purpose?  How would we know
which are actively used and which could be expired?  Currently only two
are registered, the MSI-ACK address and some unknown secondary one
that's low frequency, but frequent enough to trigger the algorithm here
(and it doesn't seem harmful to let it get enabled).  We could
therefore arbitrarily pick 5 as an upper limit here, maybe with a
warning if the code hits that limit.

Second, is there still an exploit in the proposed vfio interface where
a user could re-use a single file descriptor for multiple vfio
ioeventfds?  I don't know.  I thought about checking whether a file
descriptor is re-used, but then I wondered if that might actually be a
neat and potentially useful interface, letting a single eventfd trigger
multiple device writes.

It looks like KVM arbitrarily sets a limit of 1000 iobus devices, where
each ioeventfd would be such a device.  Perhaps we take the same
approach and pick an arbitrarily high upper limit per vfio device.
What's your opinion?  Thanks,

Alex
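
For illustration, a minimal sketch of the arbitrary per-device cap
being discussed, assuming a hypothetical VFIO_PCI_IOEVENTFD_MAX
constant and an ioeventfds_nr counter in struct vfio_pci_device
(neither exists in the posted patch), with the counter protected by
the existing ioeventfds_lock:

/* Hypothetical cap, in the spirit of KVM's fixed iobus device limit. */
#define VFIO_PCI_IOEVENTFD_MAX	1000

	/*
	 * Inside vfio_pci_ioeventfd(), after the existing list walk and
	 * fd < 0 check: refuse to register new entries once the assumed
	 * per-device counter reaches the cap.
	 */
	if (vdev->ioeventfds_nr >= VFIO_PCI_IOEVENTFD_MAX) {
		ret = -ENOSPC;
		goto out_unlock;
	}

	ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL);
	if (!ioeventfd) {
		ret = -ENOMEM;
		goto out_unlock;
	}
	/* ... existing setup and vfio_virqfd_enable() as in the patch ... */
	vdev->ioeventfds_nr++;
	list_add(&ioeventfd->next, &vdev->ioeventfds_list);

	/* And in the teardown branch (the fd == -1 match above): */
	vfio_virqfd_disable(&ioeventfd->virqfd);
	list_del(&ioeventfd->next);
	vdev->ioeventfds_nr--;
	kfree(ioeventfd);

Whether hitting the cap should also emit a one-time warning, and what
the limit should actually be, is exactly the open question raised above.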