Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp2988073imm; Thu, 17 May 2018 01:10:10 -0700 (PDT) X-Google-Smtp-Source: AB8JxZoIZUgXdPVdMXfF7UfmSkusrOuHzUDY1L0WBG0iUF8PDxM13ncjg2l3U1vND5ldl+G22na+ X-Received: by 2002:a65:5003:: with SMTP id f3-v6mr3317601pgo.433.1526544610885; Thu, 17 May 2018 01:10:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526544610; cv=none; d=google.com; s=arc-20160816; b=THjse/bvzXUL/EcD7OHPai0VNGMZUfYexmyd9F0rLAmFpAJd8rXHl+cFYVkz4OSGkn z/lhPLjZpKPlBbF79iZDypZGFAilOsFsCD+DRqXfpncJUi+GXRLQkjFbJnVB4Dul+d4d ce79kGARKFfEllCa7P+cdbf+ef0+IsHHfZ78BXkbS3C1HWW1FAMqZWFEwvenOa7Ru97u PAAjXhLt7EcC+mInyr6l01xpq9vNKF8+xbF0WZB5zpbrlcGHigCW3ABxkzAnhwUw58vS tHq1kz2pDShN573ZqS2/kZ6H5Pkik1Gt1KHjzNuqF3862Pv4KYtSMTJYBdnVhhwrMsGR mQ0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date:arc-authentication-results; bh=E76h64+nyGtMU3P+AH5SxI07CcA3uELcplWGw99mx+4=; b=xEn1Lz8602SsGTtj5Gbv4FvQwMA/PxytqeXp6Xp3r7xGFdvqtqGq7BVHcOY88vIYTW toF69zvvgtjgMQ9cmFM50pZ7KM0NxKS2S60Mc4z0XfJTyET7ELulkTf78UOT8xtauE0w 69gKqTHSISf1aDKWnJlYbGMxY/ktQcymPNYo4ciZX525jg92/gHMHblfrF7OUyAw/7Z8 i6Mb+flDDf8fcimi7LFbGY9ajm4e16n3oRl+wz6Lt1MVzfppxUasKh+0b/UUz6Yp5O1Q GxIu66MUzg1yb4HVrUuucsopn5oy/yX3lIoEfwG4NVFlQEeFd5pF2dLx31yIGGFBZZfh dSZA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w14-v6si4486655plp.31.2018.05.17.01.09.56; Thu, 17 May 2018 01:10:10 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752176AbeEQIJi (ORCPT + 99 others); Thu, 17 May 2018 04:09:38 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:55446 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750731AbeEQIJf (ORCPT ); Thu, 17 May 2018 04:09:35 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 657BFFA47A; Thu, 17 May 2018 08:09:34 +0000 (UTC) Received: from gondolin (ovpn-117-100.ams2.redhat.com [10.36.117.100]) by smtp.corp.redhat.com (Postfix) with ESMTP id 52EEB8579A; Thu, 17 May 2018 08:09:30 +0000 (UTC) Date: Thu, 17 May 2018 10:09:28 +0200 From: Cornelia Huck To: Alex Williamson Cc: kwankhede@nvidia.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Dong Jia Shi , Halil Pasic Subject: Re: [PATCH v3] vfio/mdev: Check globally for duplicate devices Message-ID: <20180517100928.5949b11e.cohuck@redhat.com> In-Reply-To: <20180517031746.32242.17768.stgit@gimli.home> References: <20180517031746.32242.17768.stgit@gimli.home> Organization: Red Hat GmbH MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Thu, 17 May 2018 08:09:34 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Thu, 17 May 2018 08:09:34 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'cohuck@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 16 May 2018 21:30:19 -0600 Alex Williamson wrote: > When we create an mdev device, we check for duplicates against the > parent device and return -EEXIST if found, but the mdev device > namespace is global since we'll link all devices from the bus. We do > catch this later in sysfs_do_create_link_sd() to return -EEXIST, but > with it comes a kernel warning and stack trace for trying to create > duplicate sysfs links, which makes it an undesirable response. > > Therefore we should really be looking for duplicates across all mdev > parent devices, or as implemented here, against our mdev device list. > Using mdev_list to prevent duplicates means that we can remove > mdev_parent.lock, but in order not to serialize mdev device creation > and removal globally, we add mdev_device.active which allows UUIDs to > be reserved such that we can drop the mdev_list_lock before the mdev > device is fully in place. > > NB. there was never intended to be any serialization guarantee > provided by the mdev core with respect to creation and removal of mdev > devices, mdev_parent.lock provided this only as a side-effect of the > implementation for locking the namespace per parent. That > serialization is now removed. This is probably fine; but I noted that documentation on the locking conventions and serialization guarantees for mdev is a bit sparse, and this topic also came up during the vfio-ap review. We probably want to add some more concrete documentation; would the kernel doc for the _ops or vfio-mediated-device.txt be a better place for that? [Dong Jia, Halil: Can you please take a look whether vfio-ccw is really ok? I don't think we open up any new races, but I'd appreciate a second or third opinion.] > > Signed-off-by: Alex Williamson > --- > > v3: Rework locking and add a field to mdev_device so we can track > completed instances vs those added to reserve the namespace. > > drivers/vfio/mdev/mdev_core.c | 94 +++++++++++++------------------------- > drivers/vfio/mdev/mdev_private.h | 2 - > 2 files changed, 34 insertions(+), 62 deletions(-) > > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c > index 126991046eb7..55ea9d34ec69 100644 > --- a/drivers/vfio/mdev/mdev_core.c > +++ b/drivers/vfio/mdev/mdev_core.c > @@ -66,34 +66,6 @@ uuid_le mdev_uuid(struct mdev_device *mdev) > } > EXPORT_SYMBOL(mdev_uuid); > > -static int _find_mdev_device(struct device *dev, void *data) > -{ > - struct mdev_device *mdev; > - > - if (!dev_is_mdev(dev)) > - return 0; > - > - mdev = to_mdev_device(dev); > - > - if (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0) > - return 1; > - > - return 0; > -} > - > -static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid) > -{ > - struct device *dev; > - > - dev = device_find_child(parent->dev, &uuid, _find_mdev_device); > - if (dev) { > - put_device(dev); > - return true; > - } > - > - return false; > -} > - > /* Should be called holding parent_list_lock */ > static struct mdev_parent *__find_parent_device(struct device *dev) > { > @@ -221,7 +193,6 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops) > } > > kref_init(&parent->ref); > - mutex_init(&parent->lock); > > parent->dev = dev; > parent->ops = ops; > @@ -304,7 +275,7 @@ static void mdev_device_release(struct device *dev) > int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) > { > int ret; > - struct mdev_device *mdev; > + struct mdev_device *mdev, *tmp; > struct mdev_parent *parent; > struct mdev_type *type = to_mdev_type(kobj); > > @@ -312,21 +283,26 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) > if (!parent) > return -EINVAL; > > - mutex_lock(&parent->lock); > + mutex_lock(&mdev_list_lock); > > /* Check for duplicate */ > - if (mdev_device_exist(parent, uuid)) { > - ret = -EEXIST; > - goto create_err; > + list_for_each_entry(tmp, &mdev_list, next) { > + if (!uuid_le_cmp(tmp->uuid, uuid)) { > + mutex_unlock(&mdev_list_lock); > + return -EEXIST; > + } > } > > mdev = kzalloc(sizeof(*mdev), GFP_KERNEL); > if (!mdev) { > - ret = -ENOMEM; > - goto create_err; > + mutex_unlock(&mdev_list_lock); > + return -ENOMEM; > } > > memcpy(&mdev->uuid, &uuid, sizeof(uuid_le)); > + list_add(&mdev->next, &mdev_list); > + mutex_unlock(&mdev_list_lock); > + > mdev->parent = parent; > kref_init(&mdev->ref); > > @@ -352,21 +328,18 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) > } > > mdev->type_kobj = kobj; > + mdev->active = true; > dev_dbg(&mdev->dev, "MDEV: created\n"); > > - mutex_unlock(&parent->lock); > - > - mutex_lock(&mdev_list_lock); > - list_add(&mdev->next, &mdev_list); > - mutex_unlock(&mdev_list_lock); > - > - return ret; > + return 0; > > create_failed: > device_unregister(&mdev->dev); > > create_err: > - mutex_unlock(&parent->lock); > + mutex_lock(&mdev_list_lock); > + list_del(&mdev->next); > + mutex_unlock(&mdev_list_lock); > mdev_put_parent(parent); > return ret; > } > @@ -377,44 +350,43 @@ int mdev_device_remove(struct device *dev, bool force_remove) > struct mdev_parent *parent; > struct mdev_type *type; > int ret; > - bool found = false; > > mdev = to_mdev_device(dev); > > mutex_lock(&mdev_list_lock); > list_for_each_entry(tmp, &mdev_list, next) { > - if (tmp == mdev) { > - found = true; > + if (tmp == mdev) > break; > - } > } > > - if (found) > - list_del(&mdev->next); > + if (tmp != mdev) { > + mutex_unlock(&mdev_list_lock); > + return -ENODEV; > + } > > - mutex_unlock(&mdev_list_lock); > + if (!mdev->active) { > + mutex_unlock(&mdev_list_lock); > + return -EAGAIN; > + } I'm not sure whether this is 100% watertight. Consider: - device gets registered, we have added it to the list, made it visible in sysfs and have added the remove attribute, but not yet the symlinks - userspace can access the remove attribute and trigger removal - we do an early exit here because not yet active - ??? (If there's any problem, it's a very pathological case, and I don't think anything really bad can happen. I just want to make sure we don't miss anything.) > > - if (!found) > - return -ENODEV; > + mdev->active = false; > + mutex_unlock(&mdev_list_lock); > > type = to_mdev_type(mdev->type_kobj); > parent = mdev->parent; > - mutex_lock(&parent->lock); > > ret = mdev_device_remove_ops(mdev, force_remove); > if (ret) { > - mutex_unlock(&parent->lock); > - > - mutex_lock(&mdev_list_lock); > - list_add(&mdev->next, &mdev_list); > - mutex_unlock(&mdev_list_lock); > - > + mdev->active = true; > return ret; > } > > + mutex_lock(&mdev_list_lock); > + list_del(&mdev->next); > + mutex_unlock(&mdev_list_lock); > + > mdev_remove_sysfs_files(dev, type); > device_unregister(dev); > - mutex_unlock(&parent->lock); > mdev_put_parent(parent); > > return 0; > diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h > index a9cefd70a705..b5819b7d7ef7 100644 > --- a/drivers/vfio/mdev/mdev_private.h > +++ b/drivers/vfio/mdev/mdev_private.h > @@ -20,7 +20,6 @@ struct mdev_parent { > struct device *dev; > const struct mdev_parent_ops *ops; > struct kref ref; > - struct mutex lock; > struct list_head next; > struct kset *mdev_types_kset; > struct list_head type_list; > @@ -34,6 +33,7 @@ struct mdev_device { > struct kref ref; > struct list_head next; > struct kobject *type_kobj; > + bool active; > }; > > #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev) >