Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp4901068imm; Fri, 18 May 2018 12:38:49 -0700 (PDT) X-Google-Smtp-Source: AB8JxZoy2WSAOwm9xp/BsU+Z0U1b5nJ71ZqZmKoBLOPWDWYMjOmR7cIZqu3cHfYI8LvVPqd2VzUl X-Received: by 2002:a62:3dc9:: with SMTP id x70-v6mr10441219pfj.85.1526672329420; Fri, 18 May 2018 12:38:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526672329; cv=none; d=google.com; s=arc-20160816; b=sAHBxSxT7Q0e+uffk2bkuqMY0tI9blxJqLjTiUMllKMseOd4s8CYlIbtmewGULDjkg kGnPnwgVGNM6QzFvy1+XEhi/IiYfeiZ8ViAXnUOsBZaUdwsizCDNz+pS6q8+CDZQrUkJ RfmKaNI+f4DMgfpritWRkV+kGIgrLxDsYRRlbJPuUh1D3njKC58UOB+26tSRv3t/vifN GpfqoTt2UKQyfpRTNnl0ztguLi5dxI45FIb1wo6KTxvppNnCLSSCmYf0obM96+6It1Tk 7RtYjscDsR1l0Ma4xEHcTPFwAPg8/9JgYwKsOgTH0xZ7M/TW5wKcGg9cGx/H2Omi2pNE F7RQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:date:message-id:from :references:cc:to:subject:arc-authentication-results; bh=P36cYbRqyJ/JCDjvnmbrW/EcvHr+EYmqBYha/wt1wP0=; b=gQQ17uCzCLpqcbgDfOGE2URhr3pDozMbdCFOg42H3ZxJpBVAkPTeurIHXeX92HAmKl 1/d8muXA82EZ7YlswgomkSyVa1ywX5X4dRPrsO4toml354TEN3sbdSUfIcWKK/NhojOJ FtwbRHTthaaujw0dGlzCDpc6LutMos7lUdMv9AGgpocVNlpjrGbF/Ty75v51HkYNrPTe V2dgivYxwTjpJtUfAeZJ9v1pbu3I8YfXYZDJU42zfrbPy5NbdZ6iX4qypxq5pw40X2kn 2ytO1oELhvNlx9kQpYTXqXF9zgNDIif85UEC+Ih1DJVehewpsnFsQDkIN/iXhFdUe/YZ f1wQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=nvidia.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g10-v6si1889327plq.523.2018.05.18.12.38.35; Fri, 18 May 2018 12:38:49 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=nvidia.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752263AbeERTiK (ORCPT + 99 others); Fri, 18 May 2018 15:38:10 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:17987 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752193AbeERTiG (ORCPT ); Fri, 18 May 2018 15:38:06 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1, AES128-SHA) id ; Fri, 18 May 2018 12:38:05 -0700 Received: from HQMAIL103.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Fri, 18 May 2018 12:38:06 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Fri, 18 May 2018 12:38:06 -0700 Received: from BGMAIL102.nvidia.com (10.25.59.11) by HQMAIL103.nvidia.com (172.20.187.11) with Microsoft SMTP Server (TLS) id 15.0.1347.2; Fri, 18 May 2018 19:38:04 +0000 Received: from [10.24.70.199] (10.24.70.199) by bgmail102.nvidia.com (10.25.59.11) with Microsoft SMTP Server (TLS) id 15.0.1347.2; Fri, 18 May 2018 19:37:59 +0000 Subject: Re: [PATCH v4 1/2] vfio/mdev: Check globally for duplicate devices To: Alex Williamson CC: , , , References: <20180518190145.3187.7620.stgit@gimli.home> <20180518191025.3187.29141.stgit@gimli.home> X-Nvconfidentiality: public From: Kirti Wankhede Message-ID: Date: Sat, 19 May 2018 01:07:55 +0530 MIME-Version: 1.0 In-Reply-To: <20180518191025.3187.29141.stgit@gimli.home> X-Originating-IP: [10.24.70.199] X-ClientProxiedBy: DRBGMAIL104.nvidia.com (10.18.16.23) To bgmail102.nvidia.com (10.25.59.11) Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 5/19/2018 12:40 AM, Alex Williamson wrote: > When we create an mdev device, we check for duplicates against the > parent device and return -EEXIST if found, but the mdev device > namespace is global since we'll link all devices from the bus. We do > catch this later in sysfs_do_create_link_sd() to return -EEXIST, but > with it comes a kernel warning and stack trace for trying to create > duplicate sysfs links, which makes it an undesirable response. > > Therefore we should really be looking for duplicates across all mdev > parent devices, or as implemented here, against our mdev device list. > Using mdev_list to prevent duplicates means that we can remove > mdev_parent.lock, but in order not to serialize mdev device creation > and removal globally, we add mdev_device.active which allows UUIDs to > be reserved such that we can drop the mdev_list_lock before the mdev > device is fully in place. > > Two behavioral notes; first, mdev_parent.lock had the side-effect of > serializing mdev create and remove ops per parent device. This was > an implementation detail, not an intentional guarantee provided to > the mdev vendor drivers. Vendor drivers can trivially provide this > serialization internally if necessary. Second, review comments note > the new -EAGAIN behavior when the device, and in particular the remove > attribute, becomes visible in sysfs. If a remove is triggered prior > to completion of mdev_device_create() the user will see a -EAGAIN > error. While the errno is different, receiving an error during this > period is not, the previous implementation returned -ENODEV for the > same condition. Furthermore, the consistency to the user is improved > in the case where mdev_device_remove_ops() returns error. Previously > concurrent calls to mdev_device_remove() could see the device > disappear with -ENODEV and return in the case of error. Now a user > would see -EAGAIN while the device is in this transitory state. > > Signed-off-by: Alex Williamson Looks good to me. Reviewed by: Kirti Wankhede > --- > Documentation/vfio-mediated-device.txt | 5 ++ > drivers/vfio/mdev/mdev_core.c | 102 +++++++++++--------------------- > drivers/vfio/mdev/mdev_private.h | 2 - > 3 files changed, 42 insertions(+), 67 deletions(-) > > diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt > index 1b3950346532..c3f69bcaf96e 100644 > --- a/Documentation/vfio-mediated-device.txt > +++ b/Documentation/vfio-mediated-device.txt > @@ -145,6 +145,11 @@ The functions in the mdev_parent_ops structure are as follows: > * create: allocate basic resources in a driver for a mediated device > * remove: free resources in a driver when a mediated device is destroyed > > +(Note that mdev-core provides no implicit serialization of create/remove > +callbacks per mdev parent device, per mdev type, or any other categorization. > +Vendor drivers are expected to be fully asynchronous in this respect or > +provide their own internal resource protection.) > + > The callbacks in the mdev_parent_ops structure are as follows: > > * open: open callback of mediated device > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c > index 126991046eb7..0212f0ee8aea 100644 > --- a/drivers/vfio/mdev/mdev_core.c > +++ b/drivers/vfio/mdev/mdev_core.c > @@ -66,34 +66,6 @@ uuid_le mdev_uuid(struct mdev_device *mdev) > } > EXPORT_SYMBOL(mdev_uuid); > > -static int _find_mdev_device(struct device *dev, void *data) > -{ > - struct mdev_device *mdev; > - > - if (!dev_is_mdev(dev)) > - return 0; > - > - mdev = to_mdev_device(dev); > - > - if (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0) > - return 1; > - > - return 0; > -} > - > -static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid) > -{ > - struct device *dev; > - > - dev = device_find_child(parent->dev, &uuid, _find_mdev_device); > - if (dev) { > - put_device(dev); > - return true; > - } > - > - return false; > -} > - > /* Should be called holding parent_list_lock */ > static struct mdev_parent *__find_parent_device(struct device *dev) > { > @@ -221,7 +193,6 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops) > } > > kref_init(&parent->ref); > - mutex_init(&parent->lock); > > parent->dev = dev; > parent->ops = ops; > @@ -297,6 +268,10 @@ static void mdev_device_release(struct device *dev) > { > struct mdev_device *mdev = to_mdev_device(dev); > > + mutex_lock(&mdev_list_lock); > + list_del(&mdev->next); > + mutex_unlock(&mdev_list_lock); > + > dev_dbg(&mdev->dev, "MDEV: destroying\n"); > kfree(mdev); > } > @@ -304,7 +279,7 @@ static void mdev_device_release(struct device *dev) > int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) > { > int ret; > - struct mdev_device *mdev; > + struct mdev_device *mdev, *tmp; > struct mdev_parent *parent; > struct mdev_type *type = to_mdev_type(kobj); > > @@ -312,21 +287,28 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) > if (!parent) > return -EINVAL; > > - mutex_lock(&parent->lock); > + mutex_lock(&mdev_list_lock); > > /* Check for duplicate */ > - if (mdev_device_exist(parent, uuid)) { > - ret = -EEXIST; > - goto create_err; > + list_for_each_entry(tmp, &mdev_list, next) { > + if (!uuid_le_cmp(tmp->uuid, uuid)) { > + mutex_unlock(&mdev_list_lock); > + ret = -EEXIST; > + goto mdev_fail; > + } > } > > mdev = kzalloc(sizeof(*mdev), GFP_KERNEL); > if (!mdev) { > + mutex_unlock(&mdev_list_lock); > ret = -ENOMEM; > - goto create_err; > + goto mdev_fail; > } > > memcpy(&mdev->uuid, &uuid, sizeof(uuid_le)); > + list_add(&mdev->next, &mdev_list); > + mutex_unlock(&mdev_list_lock); > + > mdev->parent = parent; > kref_init(&mdev->ref); > > @@ -338,35 +320,28 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid) > ret = device_register(&mdev->dev); > if (ret) { > put_device(&mdev->dev); > - goto create_err; > + goto mdev_fail; > } > > ret = mdev_device_create_ops(kobj, mdev); > if (ret) > - goto create_failed; > + goto create_fail; > > ret = mdev_create_sysfs_files(&mdev->dev, type); > if (ret) { > mdev_device_remove_ops(mdev, true); > - goto create_failed; > + goto create_fail; > } > > mdev->type_kobj = kobj; > + mdev->active = true; > dev_dbg(&mdev->dev, "MDEV: created\n"); > > - mutex_unlock(&parent->lock); > - > - mutex_lock(&mdev_list_lock); > - list_add(&mdev->next, &mdev_list); > - mutex_unlock(&mdev_list_lock); > - > - return ret; > + return 0; > > -create_failed: > +create_fail: > device_unregister(&mdev->dev); > - > -create_err: > - mutex_unlock(&parent->lock); > +mdev_fail: > mdev_put_parent(parent); > return ret; > } > @@ -377,44 +352,39 @@ int mdev_device_remove(struct device *dev, bool force_remove) > struct mdev_parent *parent; > struct mdev_type *type; > int ret; > - bool found = false; > > mdev = to_mdev_device(dev); > > mutex_lock(&mdev_list_lock); > list_for_each_entry(tmp, &mdev_list, next) { > - if (tmp == mdev) { > - found = true; > + if (tmp == mdev) > break; > - } > } > > - if (found) > - list_del(&mdev->next); > + if (tmp != mdev) { > + mutex_unlock(&mdev_list_lock); > + return -ENODEV; > + } > > - mutex_unlock(&mdev_list_lock); > + if (!mdev->active) { > + mutex_unlock(&mdev_list_lock); > + return -EAGAIN; > + } > > - if (!found) > - return -ENODEV; > + mdev->active = false; > + mutex_unlock(&mdev_list_lock); > > type = to_mdev_type(mdev->type_kobj); > parent = mdev->parent; > - mutex_lock(&parent->lock); > > ret = mdev_device_remove_ops(mdev, force_remove); > if (ret) { > - mutex_unlock(&parent->lock); > - > - mutex_lock(&mdev_list_lock); > - list_add(&mdev->next, &mdev_list); > - mutex_unlock(&mdev_list_lock); > - > + mdev->active = true; > return ret; > } > > mdev_remove_sysfs_files(dev, type); > device_unregister(dev); > - mutex_unlock(&parent->lock); > mdev_put_parent(parent); > > return 0; > diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h > index a9cefd70a705..b5819b7d7ef7 100644 > --- a/drivers/vfio/mdev/mdev_private.h > +++ b/drivers/vfio/mdev/mdev_private.h > @@ -20,7 +20,6 @@ struct mdev_parent { > struct device *dev; > const struct mdev_parent_ops *ops; > struct kref ref; > - struct mutex lock; > struct list_head next; > struct kset *mdev_types_kset; > struct list_head type_list; > @@ -34,6 +33,7 @@ struct mdev_device { > struct kref ref; > struct list_head next; > struct kobject *type_kobj; > + bool active; > }; > > #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev) >