Received: by 2002:a05:7412:2a91:b0:fc:a2b0:25d7 with SMTP id u17csp9503rdh; Tue, 13 Feb 2024 08:00:06 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCV9TTNbSC2yup/Z0LGDMHgSzJieYr2YtVukOQH6D0nNVdLqwoqao9rThtufewiVjIJRld6MHcDG2AJ9JpEIXHJsUDuT9LgFgVpBD1jp7Q== X-Google-Smtp-Source: AGHT+IFeKxj+DmdwWB2xnuZV6kICujRA1362FYds3tYyKxmzz6Ip9JqUjoxU/irH92dSFA21vIyJ X-Received: by 2002:a1f:4ac4:0:b0:4c0:230c:1143 with SMTP id x187-20020a1f4ac4000000b004c0230c1143mr6352525vka.10.1707840005796; Tue, 13 Feb 2024 08:00:05 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707840005; cv=pass; d=google.com; s=arc-20160816; b=wKIjVJ4BHSNoluBXTcZXxvcy/vrvWRwTqi061TGgy/40N6+Gg5CR9AEl0T5G0gUy3M m20KtJm7pExv7wBChN5Vx9BjF/Qh0hVJFTuKGxs9c7Vb5gigcuROcWFyfkFgUnhlDgUU Iy/Ss94BD6WmcGiMhYGJdXJScVR4zb1bNIfDPSA1N+ECT9kv4AXnB20YGFJLin26QmIa i8wqf0SR0hCOtKBDfdWewTHNqgaF7+7pcXN0N2/CbMXri3zpGSCWJ7bMka65w4J5ix1G i8foqPziQpexIR85RbZNyt+Z5oWJ/3sCqAVHtZf850T461ytj614rqS/BWzIOEZQeWAF 9uSw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:subject:cc:to:from :date:dkim-signature; bh=1IjbId99VjF28rCFpYcRKrbAa9mxZbL3TXD2tkVtMbU=; fh=agkJvOomUr+pa5mQHySXjGYHHWOYViytpdeZTKkkhmk=; b=sRPzeKqixSVHZgsoxVi5B75cCclUlB+KSPL1muJmDIN3s/66r3QC+4hXoD70/qWPcI K+lefaQpeyDcjxgxf/ISvMd4Httqm5Ng6WweQnqm0Z6eiTqb7JLPH6StaXZt7FzN2ATM 8CytiXoVmtR0Xk0xo+thiHz0aaujsr+KGLDv0G01l3xqsMCqLCZgse/iSh1W5Q9lS/ln kOgRy7bbWcpiLOqxEJlrvpm963u4qr8VPGkHhBZCCg6IF+9J0H/hu/JBYxJBVW2YJcV8 L2yfLJ6hqHUM2tIUQahGVZzrGQspSEFNEzQcTWeBMRU5S8ghuXJE2IQBZVUYyi3ETXzu oQJg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=vENYxF3d; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-63851-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-63851-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org X-Forwarded-Encrypted: i=2; AJvYcCVhegdh6PpMeeo0LyDRv6XAunE9wCkh8FievNW+IeBr0oCu9tO6WlyANomKGBPdjL/zvT4JfI1xdH2sZDyQHtZ80TcMI3DECMO5HLiItQ== Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id iy4-20020a0562140f6400b0068cb0a77fb2si3199459qvb.342.2024.02.13.08.00.05 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Feb 2024 08:00:05 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-63851-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=vENYxF3d; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-63851-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-63851-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 452291C23356 for ; Tue, 13 Feb 2024 16:00:04 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id E23775F55B; Tue, 13 Feb 2024 15:59:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vENYxF3d" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0BF9B5F840; Tue, 13 Feb 2024 15:59:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707839997; cv=none; b=YmbX18FDnub7k3mBz8zWGoVCSHsL9m8xtkXkaJRDUNuCCrO+okc9AyIccX3LD7nfleQEWl6ARFeeEopK+aBVZLwqoCwL8wxWFaYAzLUz+XL2nETN4uP9HItT0sWSj/7ifvfqQNShIve6je8seVEUk7yy9v0mfcPXtcNUdy3OBl8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707839997; c=relaxed/simple; bh=IQN/0szS2m1FsfIyJtqFxZ+lzN0l5trxoCuVY6IdmG8=; h=Date:From:To:Cc:Subject:Message-ID:MIME-Version:Content-Type: Content-Disposition:In-Reply-To; b=WXksbVTY7LEJXrzDuEugjUKZceatQWRHuj8OmUomQr7vrz2RPTZdR0s5+XV347vWuQSb164WKZNx/g59MXs0PU2btfVrdmTLp35mmaSNoKQMaf/YEIhkpWaLrdW04gKzn6Gbxd1XSSQphGRtiRAdHEhYMcjNKbbRqsSNciU7A6M= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=vENYxF3d; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5B5A3C433C7; Tue, 13 Feb 2024 15:59:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707839996; bh=IQN/0szS2m1FsfIyJtqFxZ+lzN0l5trxoCuVY6IdmG8=; h=Date:From:To:Cc:Subject:In-Reply-To:From; b=vENYxF3dKmqGYVRNgRCZ2GNoFRkAzqdVDE5ytZFSSuLzz4PdksLfNMiVfJqAg5xit 1doqBoc6kJQjOZR3DVc75ArD5max8Ox/xTRqwDINi88CjGMjY2J7JFmppxiE7Yl4on o7jXfpNhfr+0xtrwq09J5xX5tOfnzTMtGNH4851/i240HkNP051v4TIVPZSpkRlzeg x6Sv62ZCGzsvRxQuOiF/3kDLP8mxagLfnNY1IGSAu47dmN5XZTHpsrKfP3X7YnqffW 8q0Zk45o7NXwz3uNvuS8LEuzMCcKSWBYn0n1jIHLg3kBD9Gnk+td+GghOpl5qH7ZlX UrG0wbbYYb3+A== Date: Tue, 13 Feb 2024 09:59:54 -0600 From: Bjorn Helgaas To: Leon Romanovsky Cc: Kuppuswamy Sathyanarayanan , Jim Harris , Bjorn Helgaas , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Jason Gunthorpe , Alex Williamson , Pierre =?utf-8?Q?Cr=C3=A9gut?= Subject: Re: [PATCH v2 1/2] PCI/IOV: Revert "PCI/IOV: Serialize sysfs sriov_numvfs reads vs writes" Message-ID: <20240213155954.GA1210633@bhelgaas> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240213073450.GA52640@unreal> On Tue, Feb 13, 2024 at 09:34:50AM +0200, Leon Romanovsky wrote: > On Mon, Feb 12, 2024 at 02:27:14PM -0600, Bjorn Helgaas wrote: > > On Sun, Feb 11, 2024 at 10:48:44AM +0200, Leon Romanovsky wrote: > > > On Fri, Feb 09, 2024 at 07:20:28PM -0800, Kuppuswamy Sathyanarayanan wrote: > > > > On 2/9/24 3:52 PM, Jim Harris wrote: > > > > > If an SR-IOV enabled device is held by vfio, and the device > > > > > is removed, vfio will hold device lock and notify userspace > > > > > of the removal. If userspace reads the sriov_numvfs sysfs > > > > > entry, that thread will be blocked since sriov_numvfs_show() > > > > > also tries to acquire the device lock. If that same thread > > > > > is responsible for releasing the device to vfio, it results > > > > > in a deadlock. > > > > > > > > > > The proper way to detect a change to the num_VFs value is to > > > > > listen for a sysfs event, not to add a device_lock() on the > > > > > attribute _show() in the kernel. > > > > The lock was not about detecting a change; Pierre did this: > > > > ip monitor dev ${DEVICE} | grep --line-buffered "^${id}:" | while read line; do \ > > cat ${path}/device/sriov_numvfs; \ > > > > which I assume works by listening for sysfs events. > > It is not, "ip monitor ..." listens to netlink events emitted by > netdev core and not sysfs events. Sysfs events are not involved in > this case. Thanks for correcting my hasty assumption! > > The problem was that after the event occurred, the sriov_numvfs > > read got a stale value (see https://bugzilla.kernel.org/show_bug.cgi?id=202991). > > Yes, and it is outcome of such cross-subsytem involvement, which > is racy by definition. Someone can come with even simpler example of why > locking sysfs read and write is not a good idea. > > For example, let's consider the following scenario with two CPUs and > locks on sysfs read and write: > > CPU1 CPU2 > echo 1 > ${path}/device/sriov_numvfs > context_switch -> > cat ${path}/device/sriov_numvfs > lock > return 0 > unlock > context_switch <- > lock > set 1 > unlock > > CPU1 CPU2 > echo 1 > ${path}/device/sriov_numvfs > lock > set 1 > unlock > context_switch -> > cat ${path}/device/sriov_numvfs > lock > return 1 > unlock > > So same scenario will return different values if user doesn't protect > such case with external to the kernel lock. > > But if we return back to Pierre report and if you want to provide > completely bullet proof solution to solve cross-subsystem interaction, > you will need to prohibit device probe till sriov_numvfs update is completed. > However, it is overkill for something that is not a real issue. Pierre wanted to detect the configuration change and learn the new num_vfs, which seems like a reasonable thing to do. Is there a way to do both via netlink or some other mechanism? > > So I would drop this sentence because I don't think it accurately > > reflects the reason for 35ff867b7657. > > > > > > Since you are reverting a commit that synchronizes SysFS read > > > > /write, please add some comments about why it is not an > > > > issue anymore. > > > > > > It was never an issue, the idea that sysfs read and write should be > > > serialized by kernel is not correct by definition. > > > > I think it *was* an issue. The behavior Pierre observed at was > > clearly wrong, > > I disagree with this sentence. > > > and we added 35ff867b7657 ("PCI/IOV: Serialize sysfs > > sriov_numvfs reads vs writes") to resolve it. > > > > We should try to avoid reintroducing the problem, so I think we should > > probably squash these two patches and describe it as a deadlock fix > > instead of dismissing 35ff867b7657 as being based on false premises. > > > > It would be awesome if you had time to verify that these patches also > > resolve the problem you saw, Pierre. > > They won't resolve his problem, because he is not listening to sysfs > events, but rely on something from netdev side. I guess that means that if we apply this revert, the problem Pierre reported will return. Obviously the deadlock is more important than the inconsistency Pierre observed, but from the user's point of view this will look like a regression. Maybe listening to netlink and then looking at sysfs isn't the "correct" way to do this, but I don't want to just casually break existing user code. If we do contemplate doing the revert, at the very least we should include specific details about what the user code *should* do instead, at the level of the actual commands to use instead of "ip monitor dev; cat ${path}/device/sriov_numvfs". Bjorn