Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp3806598imm; Thu, 17 May 2018 15:19:15 -0700 (PDT) X-Google-Smtp-Source: AB8JxZrfhMm+0YfKs47hgDtNLmw4lUuf6Ky0VblUUbmxOZ5rE6ZZXbp184eKNEr0O3q68fWrWtEm X-Received: by 2002:a63:6e8c:: with SMTP id j134-v6mr5339106pgc.218.1526595555050; Thu, 17 May 2018 15:19:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526595555; cv=none; d=google.com; s=arc-20160816; b=nPvlW6yq9NFqX3SzHsBM0yMnWKpaP6YFeo2ge9qCA+QjBIUZbrsD1aRrHHSUPO0j+F Ks7vc9AMfJ+Rg3ohPm0GHjq5qYsvOENiE4BKw09IVDTLMat7PRMJ4gztYeWWrT13fRq/ 29eqllarowO/aUaIfAexH9TsD+0ilj8jKZX2oHTPxHNFTwkrk2XXtm1RHkxs7YVdtRZ5 WbTqHeKUgI+PeW2LtOj56bsJZb4y58AgHXFx+sr6GAn122If9+ZAo6RZIzEu4dFt/wjy DjunDZ4AZ6Q7Vb/bPj7Fw/PcL77uNn51vDyrha/ymBqjjLijxLd6m1/YPUcg+WZsda53 XdZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :arc-authentication-results; bh=3+ts1etqYwleBCdcLmVlT3fo7/Dwo28W72IhjSEE9Lk=; b=WGh6hrImdKMpnW8v6LLXa/q+/DEzZklVzcfDV+9xZwmFfvrlp9S8VQW94bXf0EgL96 LfME1v73/9DGDpoNIL5+IyFjDa3MbUiI92A2Y8Gk3+RkT/MZSdB/Hvkdba/IdmdKx/j+ OELQzC7YhY9zBPuV7U9DnH4jqb8w4E2KbvN1CaEvVZElDpmhr7Pg0CmhnnF7qVFhkd6l dUZGP3ban9pQMTCwz6JHhBoaugS5JLSM98KFZFoEPTwjZ9TjJc8MLgLwd3RTZFDD5hKw jq4+ZYLvn4thOdSSwmeP3i7fgDAyFgAOdvjJlUB1KXTJBPYS80Q86j1IIoV9PhT8+zUB dLhA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k90-v6si6221631pfh.50.2018.05.17.15.19.01; Thu, 17 May 2018 15:19:14 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752236AbeEQWRx (ORCPT + 99 others); Thu, 17 May 2018 18:17:53 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46904 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751543AbeEQWRv (ORCPT ); Thu, 17 May 2018 18:17:51 -0400 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 4831630D07AE; Thu, 17 May 2018 22:17:51 +0000 (UTC) Received: from w520.home (ovpn-116-135.phx2.redhat.com [10.3.116.135]) by smtp.corp.redhat.com (Postfix) with ESMTP id C63CB600C0; Thu, 17 May 2018 22:17:50 +0000 (UTC) Date: Thu, 17 May 2018 16:17:49 -0600 From: Alex Williamson To: Kirti Wankhede Cc: Cornelia Huck , , , Dong Jia Shi , "Halil Pasic" Subject: Re: [PATCH v3] vfio/mdev: Check globally for duplicate devices Message-ID: <20180517161749.6833ff22@w520.home> In-Reply-To: <20180517153737.74028a17@w520.home> References: <20180517031746.32242.17768.stgit@gimli.home> <20180517100928.5949b11e.cohuck@redhat.com> <50191254-4437-8ff3-06db-8d6b2df20b65@nvidia.com> <20180517102000.50656267@w520.home> <3ab0fe44-9290-8c29-5e4c-2599ba0de373@nvidia.com> <20180517153737.74028a17@w520.home> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.42]); Thu, 17 May 2018 22:17:51 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 17 May 2018 15:37:37 -0600 Alex Williamson wrote: > On Fri, 18 May 2018 01:56:50 +0530 > Kirti Wankhede wrote: > > > On 5/17/2018 9:50 PM, Alex Williamson wrote: > > > On Thu, 17 May 2018 21:25:22 +0530 > > > Kirti Wankhede wrote: > > > > > >> On 5/17/2018 1:39 PM, Cornelia Huck wrote: > > >>> On Wed, 16 May 2018 21:30:19 -0600 > > >>> Alex Williamson wrote: > > >>> > > >>>> When we create an mdev device, we check for duplicates against the > > >>>> parent device and return -EEXIST if found, but the mdev device > > >>>> namespace is global since we'll link all devices from the bus. We do > > >>>> catch this later in sysfs_do_create_link_sd() to return -EEXIST, but > > >>>> with it comes a kernel warning and stack trace for trying to create > > >>>> duplicate sysfs links, which makes it an undesirable response. > > >>>> > > >>>> Therefore we should really be looking for duplicates across all mdev > > >>>> parent devices, or as implemented here, against our mdev device list. > > >>>> Using mdev_list to prevent duplicates means that we can remove > > >>>> mdev_parent.lock, but in order not to serialize mdev device creation > > >>>> and removal globally, we add mdev_device.active which allows UUIDs to > > >>>> be reserved such that we can drop the mdev_list_lock before the mdev > > >>>> device is fully in place. > > >>>> > > >>>> NB. there was never intended to be any serialization guarantee > > >>>> provided by the mdev core with respect to creation and removal of mdev > > >>>> devices, mdev_parent.lock provided this only as a side-effect of the > > >>>> implementation for locking the namespace per parent. That > > >>>> serialization is now removed. > > >>> > > >> > > >> mdev_parent.lock is to serialize create and remove of that mdev device, > > >> that handles race condition that Cornelia mentioned below. > > > > > > Previously it was stated: > > > > > > On Thu, 17 May 2018 01:01:40 +0530 > > > Kirti Wankhede wrote: > > >> Here lock is not for create/remove routines of vendor driver, its about > > >> mdev device creation and device registration, which is a common code > > >> path, and so is part of mdev core module. > > > > > > So the race condition was handled previously, but as a side-effect of > > > protecting the namespace, aiui. I'm trying to state above that the > > > serialization of create/remove was never intended as a guarantee > > > provided to mdev vendor drivers. I don't see that there's a need to > > > protect "mdev device creation and device registration" beyond conflicts > > > in the UUID namespace, which is done here. Are there others? > > > > > > > Sorry not being elaborative in my earlier response to > > > > > If we can > > > show that vendor drivers handle the create/remove paths themselves, > > > perhaps we can refine the locking granularity. > > > > mdev_device_create() function does : > > - create mdev device > > - register device > > - call vendor driver->create > > - create sysfs files. > > > > mdev_device_remove() removes sysfs files, unregister device and delete > > device. > > > > There is common code in mdev_device_create() and mdev_device_remove() > > independent of what vendor driver does in its create()/remove() > > callback. Moving this code to each vendor driver to handle create/remove > > themselves doesn't make sense to me. > > I don't see where anyone is suggesting that, I'm not. > > > mdev_parent.lock here does take care of race conditions that could occur > > during mdev device creation and deletion in this common code path. > > Exactly what races in the common code path is mdev_parent.lock > preventing? mdev_device_create() calls: > > device_register() > mdev_device_create_ops() > parent->ops->create() > sysfs_create_groups() > mdev_create_sysfs_files() > sysfs_create_files() > sysfs_create_link() > sysfs_create_link() > > mdev_parent.lock is certainly not serializing all calls across the > entire kernel to device_register and sysfs_create_{groups,files,link} > so what is it protecting other than serializing parent->ops->create()? > Locks protect data, not code. The data we're protecting is the shared > mdev_list, there is no shared data once mdev_device_create() has its > mdev_device with uuid reservation placed into that list. > > > What is the urge to remove mdev_parent.lock if that handles all race > > conditions without bothering user to handle -EAGAIN? > > Can you say why -EAGAIN is undesirable? Note that the user is only > going to see this error if they attempt to remove the device in the > minuscule gap between the sysfs remove file being created and the > completion of the write to the create sysfs file. It seems like you're > asking that I decrease the locking granularity, but not too much > because mdev_parent.lock protects "things". If the -EAGAIN is really > so terrible, we can avoid it by spinning until the mdev_device is > either not found in the list or becomes active, we don't need > mdev_parent.lock to solve that, but I don't think that's the best > solution and there's no concrete statement to back -EAGAIN being a > problem. Thanks, Testing whether -EAGAIN is bad, I race the following scripts: # cat remove-it.sh #!/bin/sh UUID=802d71a3-560d-4316-999a-4c2033b7fbb2 DEV=/sys/devices/virtual/mtty/mtty/$UUID REMOVE=$DEV/remove CNT=0 ERR=0 while true; do if [ -e $REMOVE ]; then echo 1 > $REMOVE RET=$? if [ $RET -ne 0 ]; then echo "Remove returned $RET" ERR=$(( $ERR + 1 )) fi CNT=$(( $CNT + 1 )) if [ $(( $CNT % 10000 )) -eq 0 ]; then echo "$CNT/$ERR" fi fi done # cat create-it.sh #!/bin/sh UUID=802d71a3-560d-4316-999a-4c2033b7fbb2 DEV=/sys/devices/virtual/mtty/mtty/$UUID CREATE=/sys/class/mdev_bus/mtty/mdev_supported_types/mtty-1/create CNT=0 while true; do if [ ! -d $DEV ]; then echo $UUID > $CREATE RET=$? if [ $RET -ne 0 ]; then echo "Create returned $RET" fi CNT=$(( $CNT + 1 )) if [ $(( $CNT % 10000 )) -eq 0 ]; then echo $CNT fi fi done After 3 million iterations, this targeted testing is only able to hit the gap 0.0032% of the time and the sum badness is: ./remove-it.sh: line 11: echo: write error: Resource temporarily unavailable Of course if I have one thread of execution that does create then remove, it's impossible to hit this case, so the only code that can hit this is unsynchronized user code stepping on each other and -EAGAIN seems like a completely reasonable and appropriate error, imo. Thanks, Alex