Date: Sun, 11 Feb 2018 11:09:36 +0800
From: Peter Xu <peterx@redhat.com>
To: Alex Williamson
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC PATCH] vfio/pci: Add ioeventfd support
Message-ID: <20180211030936.GH2783@xz-mi>
References: <20180207000731.32764.95992.stgit@gimli.home>
 <20180209070511.GD2783@xz-mi>
 <20180209144541.059554bf@w520.home>
In-Reply-To: <20180209144541.059554bf@w520.home>

On Fri, Feb 09, 2018 at 02:45:41PM -0700, Alex Williamson wrote:
> On Fri, 9 Feb 2018 15:05:11 +0800
> Peter Xu wrote:
> 
> > On Tue, Feb 06, 2018 at 05:08:14PM -0700, Alex Williamson wrote:
> > 
> > [...]
> > 
> > > +long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
> > > +			uint64_t data, int count, int fd)
> > > +{
> > > +	struct pci_dev *pdev = vdev->pdev;
> > > +	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
> > > +	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
> > > +	struct vfio_pci_ioeventfd *ioeventfd;
> > > +	int (*handler)(void *, void *);
> > > +	unsigned long val;
> > > +
> > > +	/* Only support ioeventfds into BARs */
> > > +	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
> > > +		return -EINVAL;
> > > +
> > > +	if (pos + count > pci_resource_len(pdev, bar))
> > > +		return -EINVAL;
> > > +
> > > +	/* Disallow ioeventfds working around MSI-X table writes */
> > > +	if (bar == vdev->msix_bar &&
> > > +	    !(pos + count <= vdev->msix_offset ||
> > > +	      pos >= vdev->msix_offset + vdev->msix_size))
> > > +		return -EINVAL;
> > > +
> > > +	switch (count) {
> > > +	case 1:
> > > +		handler = &vfio_pci_ioeventfd_handler8;
> > > +		val = data;
> > > +		break;
> > > +	case 2:
> > > +		handler = &vfio_pci_ioeventfd_handler16;
> > > +		val = le16_to_cpu(data);
> > > +		break;
> > > +	case 4:
> > > +		handler = &vfio_pci_ioeventfd_handler32;
> > > +		val = le32_to_cpu(data);
> > > +		break;
> > > +#ifdef iowrite64
> > > +	case 8:
> > > +		handler = &vfio_pci_ioeventfd_handler64;
> > > +		val = le64_to_cpu(data);
> > > +		break;
> > > +#endif
> > > +	default:
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	ret = vfio_pci_setup_barmap(vdev, bar);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	mutex_lock(&vdev->ioeventfds_lock);
> > > +
> > > +	list_for_each_entry(ioeventfd, &vdev->ioeventfds_list, next) {
> > > +		if (ioeventfd->pos == pos && ioeventfd->bar == bar &&
> > > +		    ioeventfd->data == data && ioeventfd->count == count) {
> > > +			if (fd == -1) {
> > > +				vfio_virqfd_disable(&ioeventfd->virqfd);
> > > +				list_del(&ioeventfd->next);
> > > +				kfree(ioeventfd);
> > > +				ret = 0;
> > > +			} else
> > > +				ret = -EEXIST;
> > > +
> > > +			goto out_unlock;
> > > +		}
> > > +	}
> > > +
> > > +	if (fd < 0) {
> > > +		ret = -ENODEV;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL);
> > > +	if (!ioeventfd) {
> > > +		ret = -ENOMEM;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	ioeventfd->pos = pos;
> > > +	ioeventfd->bar = bar;
> > > +	ioeventfd->data = data;
> > > +	ioeventfd->count = count;
> > > +
> > > +	ret = vfio_virqfd_enable(vdev->barmap[ioeventfd->bar] + ioeventfd->pos,
> > > +				 handler, NULL, (void *)val,
> > > +				 &ioeventfd->virqfd, fd);
> > > +	if (ret) {
> > > +		kfree(ioeventfd);
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	list_add(&ioeventfd->next,
> > > +		 &vdev->ioeventfds_list);
> > 
> > Is there a limit on how many ioeventfds can be created?
> > 
> > IIUC we'll create this eventfd "automatically" if an MMIO addr/data
> > pair is triggered continuously for N=10 times, so would it be safer
> > to have a limit on the maximum number of eventfds?  Otherwise I'm
> > not sure whether a malicious guest could consume host memory by
> > sending:
> > 
> >   - addr1/data1, 10 times
> >   - addr2/data2, 10 times
> >   - ...
> > 
> > to create unlimited ioeventfds?  Thanks,
> 
> Good question, it is somewhat exploitable in the guest the way it's
> written; however, a user process does have an open file limit and each
> eventfd consumes a file handle, so unless someone is running QEMU with
> unlimited file handles, there is a built-in limit.  Two problems remain
> though:
> 
> First, is it still a bad idea that a user within a guest can target
> this device page to consume all of the QEMU process' open file handles,
> even if ultimately they're only harming themselves?  What would a
> reasonable cap of file descriptors for this purpose be?  How would we
> know which are actively used and which could be expired?  Currently
> only 2 are registered, the MSI-ACK address and some unknown secondary
> one that's low frequency, but enough to trigger the algorithm here (and
> doesn't seem harmful to let it get enabled).  We could therefore
> arbitrarily pick 5 as an upper limit here, maybe with a warning if the
> code hits that limit.
> 
> Second, is there still an exploit in the proposed vfio interface in
> that a user could re-use a single file descriptor for multiple vfio
> ioeventfds?  I don't know.  I thought about looking to see whether a
> file descriptor is re-used, but then I wondered if that might actually
> be kind of a neat and potentially useful interface that a single
> eventfd could trigger multiple device writes.

Do you know of any possible scenario for this?  It sounds like an
interesting idea.

> It looks like KVM
> arbitrarily sets a limit of 1000 iobus devices, where each ioeventfd
> would be such a device.  Perhaps we take the same approach and pick an
> arbitrary high upper limit per vfio device.  What's your opinion?

I did a "ulimit -n" and got 1024 as the default.  So even if someone
granted the QEMU process more (or even unlimited) file handles, we
would still have a hard limit of ~1000 vfio ioeventfds, which sounds
safe.

But I agree that we can still have an even smaller limit for vfio,
especially since we know explicitly how many we may need in most
cases.  I'll follow your choice of number, since I have no better idea
myself.

Meanwhile, if we are going to have a limit, I'm wondering whether it
would be good to have a WARN_ONCE when that limit is reached.  Thanks,

-- 
Peter Xu
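
As context for the kernel function quoted above, the rest of the RFC
proposes a VFIO_DEVICE_IOEVENTFD ioctl that userspace (e.g. QEMU) would
use to reach this path.  Below is a rough userspace sketch of how such a
registration might look; the struct layout, the size flag, and the
40-bit region-index encoding are assumptions inferred from the kernel
code above and the series description, not definitions quoted in this
mail.

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Assumed: the BAR index is encoded in the upper bits of the offset,
 * mirroring VFIO_PCI_OFFSET_TO_INDEX() in the kernel code above.  The
 * 40-bit shift is an assumption made for illustration only. */
#define HYP_VFIO_PCI_INDEX_TO_OFFSET(index)	((uint64_t)(index) << 40)

/* Register a kernel-side ioeventfd: a guest write of `data` (4 bytes)
 * at `bar0_offset` inside BAR0 is then replayed by the kernel instead
 * of bouncing out to userspace.  Returns the eventfd on success. */
static int register_ioeventfd(int device_fd, uint64_t bar0_offset, uint64_t data)
{
	struct vfio_device_ioeventfd ioeventfd;	/* assumed uapi struct */
	int efd = eventfd(0, EFD_CLOEXEC);

	if (efd < 0)
		return -1;

	memset(&ioeventfd, 0, sizeof(ioeventfd));
	ioeventfd.argsz = sizeof(ioeventfd);
	ioeventfd.flags = VFIO_DEVICE_IOEVENTFD_32;	/* assumed size flag */
	ioeventfd.offset = HYP_VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX) +
			   bar0_offset;
	ioeventfd.data = data;
	ioeventfd.fd = efd;		/* fd == -1 would tear the entry down */

	if (ioctl(device_fd, VFIO_DEVICE_IOEVENTFD, &ioeventfd)) {
		close(efd);
		return -1;
	}

	/* The same efd can then be handed to KVM as a kvm ioeventfd so the
	 * whole write path is handled without exiting to userspace. */
	return efd;
}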
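
On the limit question itself, a minimal sketch of what a per-device cap
plus a one-time warning might look like inside vfio_pci_ioeventfd()
follows.  The VFIO_PCI_IOEVENTFD_MAX constant, the ioeventfds_nr
counter in struct vfio_pci_device, and the -ENOSPC return are
illustrative choices, not something quoted from the RFC.

/* Hypothetical cap, in the spirit of KVM's arbitrary 1000 iobus devices. */
#define VFIO_PCI_IOEVENTFD_MAX		1000

	/* Inside vfio_pci_ioeventfd(), after the duplicate/teardown scan
	 * and with vdev->ioeventfds_lock held, before allocating a new
	 * entry: */
	if (vdev->ioeventfds_nr >= VFIO_PCI_IOEVENTFD_MAX) {
		WARN_ONCE(1, "vfio-pci: per-device ioeventfd limit (%d) reached\n",
			  VFIO_PCI_IOEVENTFD_MAX);
		ret = -ENOSPC;
		goto out_unlock;
	}

	/* ...plus bookkeeping on the success and teardown paths: */
	vdev->ioeventfds_nr++;	/* after vfio_virqfd_enable() + list_add() */
	vdev->ioeventfds_nr--;	/* in the fd == -1 removal branch */

Whether the warning should be a WARN_ONCE or something quieter is
exactly the open question in the thread above.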