Received: by 2002:a05:6358:a55:b0:ec:fcf4:3ecf with SMTP id 21csp889232rwb; Thu, 12 Jan 2023 13:50:18 -0800 (PST) X-Google-Smtp-Source: AMrXdXutqTxL8OlPaQNHXBNETaHxFf7yzl7qHem5O4ZuTdfoL4oqrAuriX6Of5kGxro+WMjNfKCI X-Received: by 2002:a17:907:a789:b0:7c0:bbab:22e5 with SMTP id vx9-20020a170907a78900b007c0bbab22e5mr71139529ejc.16.1673560217969; Thu, 12 Jan 2023 13:50:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1673560217; cv=none; d=google.com; s=arc-20160816; b=sJM9CIU3B0Oymd9bK7x07+JnxUBpPlTiTmR6fRUA+1el2AcE55vxy97PNDwzPDoZ/j XEZANNUh6amMASjg7fmwMEzQi7F6DyR1KrzPe2zYr3MpIiv+yKcAYZdfD7DQOGFXypH8 fgoIF9QqBlTXbbIDl6fgJq7rLh8JNBV/7QGzUJFm2Iz3fRCFRJxp950R819Zo5RneHHT gV20P3nsfQD0i7EwBwexJtiiQxgcfZp/Ni/4mzbgRKSKbvdhstUF9IctODExTkwRKK1d NeQlUCW8UOzvyyjOS915ItBMNJEkGwyFKtSLaOA8QkXVKr5m7YR9Dq31vtTc2d5VOMjC irwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=h6f/xkDFcthJcSJTvgCInVgMfaXKqM0yPFFK06gOaJY=; b=CsbFA4FxWvwJErGv6eiRkjwGE5DIfYkhBLzD3GguHh4m7ANR+FkwHWvxarn0zABCkz Gfq7iWRmDZH/II62d92MNNk/49gX933RIDfpE5A3NV+FUPMIWuuDmTI6sYZJtBDFDRgW QQAmnMhK+eVpDXM3k+L8a49CI4JAFIOmWIZrElHB51IlFhwJ4m8LHjALAAdAvUbZCfQz Et64kcoQVLI4NDGpW/6oI4IhqKfYYaSZOc1WXnA657Cp8p8kYqP4vg7TlpdkVxIJvCEk LNGLl1+Elo/p7Eyseu7yRJ3gnoMZdKxEEn7eqk5bIRfq3ZLjfGmexaxXQp8Gv3ugxrrb RUQg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PycuaJQb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ds19-20020a170907725300b00860c51f7de8si4809306ejc.217.2023.01.12.13.50.05; Thu, 12 Jan 2023 13:50:17 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PycuaJQb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240218AbjALV0v (ORCPT + 50 others); Thu, 12 Jan 2023 16:26:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53560 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231748AbjALV0L (ORCPT ); Thu, 12 Jan 2023 16:26:11 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A279682F58 for ; Thu, 12 Jan 2023 13:05:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673557521; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=h6f/xkDFcthJcSJTvgCInVgMfaXKqM0yPFFK06gOaJY=; b=PycuaJQbxERzObE4747hpGlSSt/IhF3CIX2INuu5ZT5AsCEYJhwR8egmVsnpAqdYBChtj3 TVYaTbI6F1/qZRkBbq28YKdvDHPVIVFH7jdYgN0bPP37A7iZ9i2nxfC664j+dQVNjUQfhT Fg2BYoNiIKtfCevRuthmmhz0Y4QCoDU= Received: from mail-il1-f200.google.com (mail-il1-f200.google.com [209.85.166.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-453-xFEl_BoMN7CBLEzUee4WgA-1; Thu, 12 Jan 2023 16:05:20 -0500 X-MC-Unique: xFEl_BoMN7CBLEzUee4WgA-1 Received: by mail-il1-f200.google.com with SMTP id n15-20020a056e021baf00b0030387c2e1d3so14303613ili.5 for ; Thu, 12 Jan 2023 13:05:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=h6f/xkDFcthJcSJTvgCInVgMfaXKqM0yPFFK06gOaJY=; b=o0uXZyWJj3giNhccjVFR8Ae1s54RHmNBhVKKmQ/UOPXpR+nO1Lwu5M+nk7I91gmq+r ITzmfPRAn6VuMBu7efFJMP6XHznXqFCRs1FClGsqyDjdy85iCuV8vbQCq7MezcLpiXJH 8kYnFBzTOHcWHoiYg71/wUuX9bc+0lUQM8zNCn/h3YUfcME4tiBrpoZNsCNPs9zmWFza SfZW856p+6gjaex5nNlmBz7pHanfl/7VzVNtosNlGed46cNY5IeR7hw6lw+AI6i2jNCQ cGjBZ/cxYPIBc82BVbb8HGgmkw3o955/RnOYQ8/VHXGK/xYUA5JA4V0qmzHp564JGwAl RDPQ== X-Gm-Message-State: AFqh2koeFWlwQRSisSkOndedx3FciYBJGYcT4MB/FynnkeK2ceH25Ep8 jrJjQXOHovjcPeC/e5p+fmMSlx2eVodBJor7ht/u57QqIqTomULUiftiCSsSiTk4/nq+KiXKdIP HI6yvAcSZVqv2Cf10kHx3cJlA X-Received: by 2002:a05:6e02:1d88:b0:30d:7cf4:5d5c with SMTP id h8-20020a056e021d8800b0030d7cf45d5cmr31649872ila.31.1673557519888; Thu, 12 Jan 2023 13:05:19 -0800 (PST) X-Received: by 2002:a05:6e02:1d88:b0:30d:7cf4:5d5c with SMTP id h8-20020a056e021d8800b0030d7cf45d5cmr31649859ila.31.1673557519587; Thu, 12 Jan 2023 13:05:19 -0800 (PST) Received: from redhat.com ([38.15.36.239]) by smtp.gmail.com with ESMTPSA id a17-20020a92d591000000b0030be6c79645sm5428334iln.68.2023.01.12.13.05.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 12 Jan 2023 13:05:19 -0800 (PST) Date: Thu, 12 Jan 2023 14:05:17 -0700 From: Alex Williamson To: Matthew Rosato Cc: pbonzini@redhat.com, jgg@nvidia.com, cohuck@redhat.com, farman@linux.ibm.com, pmorel@linux.ibm.com, borntraeger@linux.ibm.com, frankja@linux.ibm.com, imbrenda@linux.ibm.com, david@redhat.com, akrowiak@linux.ibm.com, jjherne@linux.ibm.com, pasic@linux.ibm.com, zhenyuw@linux.intel.com, zhi.a.wang@intel.com, seanjc@google.com, linux-s390@vger.kernel.org, kvm@vger.kernel.org, intel-gvt-dev@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v2] vfio: fix potential deadlock on vfio group lock Message-ID: <20230112140517.6db5b346.alex.williamson@redhat.com> In-Reply-To: <20230112203844.41179-1-mjrosato@linux.ibm.com> References: <20230112203844.41179-1-mjrosato@linux.ibm.com> X-Mailer: Claws Mail 4.1.1 (GTK 3.24.35; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 12 Jan 2023 15:38:44 -0500 Matthew Rosato wrote: > Currently it is possible that the final put of a KVM reference comes from > vfio during its device close operation. This occurs while the vfio group > lock is held; however, if the vfio device is still in the kvm device list, > then the following call chain could result in a deadlock: > > kvm_put_kvm > -> kvm_destroy_vm > -> kvm_destroy_devices > -> kvm_vfio_destroy > -> kvm_vfio_file_set_kvm > -> vfio_file_set_kvm > -> group->group_lock/group_rwsem > > Avoid this scenario by having vfio core code acquire a KVM reference > the first time a device is opened and hold that reference until the > device fd is closed, at a point after the group lock has been released. > > Fixes: 421cfe6596f6 ("vfio: remove VFIO_GROUP_NOTIFY_SET_KVM") > Reported-by: Alex Williamson > Signed-off-by: Matthew Rosato > --- > Changes from v1: > * Re-write using symbol get logic to get kvm ref during first device > open, release the ref during device fd close after group lock is > released > * Drop kvm get/put changes to drivers; now that vfio core holds a > kvm ref until sometime after the device_close op is called, it > should be fine for drivers to get and put their own references to it. > --- > drivers/vfio/group.c | 6 ++--- > drivers/vfio/vfio_main.c | 48 +++++++++++++++++++++++++++++++++++++--- > include/linux/vfio.h | 1 - > 3 files changed, 48 insertions(+), 7 deletions(-) > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c > index bb24b2f0271e..2b0da82f82f4 100644 > --- a/drivers/vfio/group.c > +++ b/drivers/vfio/group.c > @@ -165,9 +165,9 @@ static int vfio_device_group_open(struct vfio_device *device) > } > > /* > - * Here we pass the KVM pointer with the group under the lock. If the > - * device driver will use it, it must obtain a reference and release it > - * during close_device. > + * Here we pass the KVM pointer with the group under the lock. A > + * reference will be obtained the first time the device is opened and > + * will be held until the device fd is closed. > */ > ret = vfio_device_open(device, device->group->iommufd, > device->group->kvm); > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index 5177bb061b17..c969e2a0ecd3 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -344,6 +345,35 @@ static bool vfio_assert_device_open(struct vfio_device *device) > return !WARN_ON_ONCE(!READ_ONCE(device->open_count)); > } > > +static bool vfio_kvm_get_kvm_safe(struct kvm *kvm) > +{ > + bool (*fn)(struct kvm *kvm); > + bool ret; > + > + fn = symbol_get(kvm_get_kvm_safe); > + if (WARN_ON(!fn)) > + return false; > + > + ret = fn(kvm); > + > + symbol_put(kvm_get_kvm_safe); > + > + return ret; > +} > + > +static void vfio_kvm_put_kvm(struct kvm *kvm) > +{ > + void (*fn)(struct kvm *kvm); > + > + fn = symbol_get(kvm_put_kvm); > + if (WARN_ON(!fn)) > + return; > + > + fn(kvm); > + > + symbol_put(kvm_put_kvm); > +} > + > static int vfio_device_first_open(struct vfio_device *device, > struct iommufd_ctx *iommufd, struct kvm *kvm) > { > @@ -361,16 +391,24 @@ static int vfio_device_first_open(struct vfio_device *device, > if (ret) > goto err_module_put; > > + if (kvm && !vfio_kvm_get_kvm_safe(kvm)) { > + ret = -ENOENT; > + goto err_unuse_iommu; > + } > device->kvm = kvm; This could just as easily be: if (kvm && vfio_kvm_get_kvm_safe(kvm)) device->kvm = kvm; Right? The error path would test device->kvm and we already use device->kvm in the release path. Otherwise, in the off chance userspace hits this error, what's the value in generating a failure here for a device that may or may not have a kvm dependency. A driver with a dependency should fail if device->kvm is NULL. > if (device->ops->open_device) { > ret = device->ops->open_device(device); > if (ret) > - goto err_unuse_iommu; > + goto err_put_kvm; > } > return 0; > > +err_put_kvm: > + if (kvm) { > + vfio_kvm_put_kvm(kvm); > + device->kvm = NULL; > + } > err_unuse_iommu: > - device->kvm = NULL; > if (iommufd) > vfio_iommufd_unbind(device); > else > @@ -387,7 +425,6 @@ static void vfio_device_last_close(struct vfio_device *device, > > if (device->ops->close_device) > device->ops->close_device(device); > - device->kvm = NULL; > if (iommufd) > vfio_iommufd_unbind(device); > else > @@ -465,6 +502,11 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep) > > vfio_device_group_close(device); > > + if (device->open_count == 0 && device->kvm) { > + vfio_kvm_put_kvm(device->kvm); > + device->kvm = NULL; > + } IIUC, device->open_count is protected by device->dev_set->lock. Thanks, Alex > + > vfio_device_put_registration(device); > > return 0; > diff --git a/include/linux/vfio.h b/include/linux/vfio.h > index 35be78e9ae57..3ff7e9302cc1 100644 > --- a/include/linux/vfio.h > +++ b/include/linux/vfio.h > @@ -46,7 +46,6 @@ struct vfio_device { > struct vfio_device_set *dev_set; > struct list_head dev_set_list; > unsigned int migration_flags; > - /* Driver must reference the kvm during open_device or never touch it */ > struct kvm *kvm; > > /* Members below here are private, not for driver use */