Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()
From: Nuno Sá
To: Saravana Kannan, Herve Codina
Cc: Greg Kroah-Hartman, "Rafael J.
 Wysocki", Rob Herring, Frank Rowand, Lizhi Hou, Max Zhen, Sonal Santan,
 Stefano Stabellini, Jonathan Cameron, linux-kernel@vger.kernel.org,
 devicetree@vger.kernel.org, Allan Nielsen, Horatiu Vultur,
 Steen Hegelund, Thomas Petazzoni, Android Kernel Team
Date: Wed, 21 Feb 2024 07:56:35 +0100
References: <20231130174126.688486-1-herve.codina@bootlin.com>
 <20231130174126.688486-2-herve.codina@bootlin.com>
User-Agent: Evolution 3.50.3 (3.50.3-1.fc39)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2024-02-20 at 16:31 -0800, Saravana Kannan wrote:
> On Thu, Nov 30, 2023 at 9:41 AM Herve Codina wrote:
> >
> > The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > introduces a workqueue to release the consumer and supplier devices used
> > in the devlink.
> > In the job queued, devices are release and in turn, when all the
> > references to these devices are dropped, the release function of the
> > device itself is called.
> >
> > Nothing is present to provide some synchronisation with this workqueue
> > in order to ensure that all ongoing releasing operations are done and
> > so, some other operations can be started safely.
> >
> > For instance, in the following sequence:
> >   1) of_platform_depopulate()
> >   2) of_overlay_remove()
> >
> > During the step 1, devices are released and related devlinks are removed
> > (jobs pushed in the workqueue).
> > During the step 2, OF nodes are destroyed but, without any
> > synchronisation with devlink removal jobs, of_overlay_remove() can raise
> > warnings related to missing of_node_put():
> >   ERROR: memory leak, expected refcount 1 instead of 2
> >
> > Indeed, the missing of_node_put() call is going to be done, too late,
> > from the workqueue job execution.
> >
> > Introduce device_link_wait_removal() to offer a way to synchronize
> > operations waiting for the end of devlink removals (i.e. end of
> > workqueue jobs).
> > Also, as a flushing operation is done on the workqueue, the workqueue
> > used is moved from a system-wide workqueue to a local one.
>
> Thanks for the bug report and fix. Sorry again about the delay in
> reviewing the changes.
>
> Please add Fixes tag for 80dd33cf72d1.
>
> > Signed-off-by: Herve Codina
> > ---
> >  drivers/base/core.c    | 26 +++++++++++++++++++++++---
> >  include/linux/device.h |  1 +
> >  2 files changed, 24 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index ac026187ac6a..2e102a77758c 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
> >  static void __fw_devlink_link_to_consumers(struct device *dev);
> >  static bool fw_devlink_drv_reg_done;
> >  static bool fw_devlink_best_effort;
> > +static struct workqueue_struct *fw_devlink_wq;
> >
> >  /**
> >   * __fwnode_link_add - Create a link between two fwnode_handles.
> > @@ -530,12 +531,26 @@ static void devlink_dev_release(struct device *dev)
> >         /*
> >          * It may take a while to complete this work because of the SRCU
> >          * synchronization in device_link_release_fn() and if the consumer or
> > -        * supplier devices get deleted when it runs, so put it into the "long"
> > -        * workqueue.
> > +        * supplier devices get deleted when it runs, so put it into the
> > +        * dedicated workqueue.
> >          */
> > -       queue_work(system_long_wq, &link->rm_work);
> > +       queue_work(fw_devlink_wq, &link->rm_work);
>
> This has nothing to do with fw_devlink. fw_devlink is just triggering
> the issue in device links. You can hit this bug without fw_devlink too.
> So call this device_link_wq since it's consistent with device_link_* APIs.
>

I'm not sure if I got this right in my series. I do call
devlink_release_queue() to my queue. But on the Overlay side I use
fwnode_links_flush_queue() because it looked more sensible from an OF
point of view. And including (in OF code) linux/fwnode.h instead of
linux/device.h makes more sense to me.

> >  }
> >
> > +/**
> > + * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
> > + */
> > +void device_link_wait_removal(void)
> > +{
> > +       /*
> > +        * devlink removal jobs are queued in the dedicated work queue.
> > +        * To be sure that all removal jobs are terminated, ensure that any
> > +        * scheduled work has run to completion.
> > +        */
> > +       drain_workqueue(fw_devlink_wq);
>
> Is there a reason this needs to be drain_workqueu() instead of
> flush_workqueue(). Drain is a stronger guarantee than we need in this
> case. All we are trying to make sure is that all the device link
> remove work queued so far have completed.
>

Yeah, I'm also using flush_workqueue().
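
For readers following the drain-versus-flush point, here is a rough
user-space toy model of the difference. It is not kernel code and is not
taken from either series: every toy_* name is hypothetical, and the model
is single-threaded. It only mirrors the documented behaviour that a flush
waits for the work queued so far, while a drain keeps flushing until the
queue is empty, including work queued by running items.

```c
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not kernel symbols. */
#define TOY_WQ_MAX 16

struct toy_wq;
typedef void (*toy_work_fn)(struct toy_wq *wq);

struct toy_wq {
	toy_work_fn items[TOY_WQ_MAX];
	size_t head;	/* next item to run */
	size_t tail;	/* next free slot */
	int completed;	/* number of items that have run */
};

/* Append a job; returns -1 when the toy queue is full. */
static int toy_queue_work(struct toy_wq *wq, toy_work_fn fn)
{
	if (wq->tail == TOY_WQ_MAX)
		return -1;
	wq->items[wq->tail++] = fn;
	return 0;
}

/* Flush: run only the jobs that were already queued when called. */
static void toy_flush(struct toy_wq *wq)
{
	size_t stop = wq->tail;

	while (wq->head < stop) {
		toy_work_fn fn = wq->items[wq->head++];

		fn(wq);
		wq->completed++;
	}
}

/* Drain: flush repeatedly until the queue is empty. */
static void toy_drain(struct toy_wq *wq)
{
	while (wq->head < wq->tail)
		toy_flush(wq);
}

/* A removal job that queues nothing further. */
static void toy_remove_link(struct toy_wq *wq)
{
	(void)wq;
}

/* A job that chains one more removal job while it runs. */
static void toy_chain(struct toy_wq *wq)
{
	toy_queue_work(wq, toy_remove_link);
}
```

In this model, flushing after queueing toy_remove_link and toy_chain
leaves the chained job pending, and only a drain runs it. If removal jobs
never queue further jobs, as the review comment assumes, the flush-style
wait is enough.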
> > +}
> > +EXPORT_SYMBOL_GPL(device_link_wait_removal);
> > +
> >  static struct class devlink_class = {
> >         .name = "devlink",
> >         .dev_groups = devlink_groups,
> > @@ -4085,9 +4100,14 @@ int __init devices_init(void)
> >         sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
> >         if (!sysfs_dev_char_kobj)
> >                 goto char_kobj_err;
> > +       fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
> > +       if (!fw_devlink_wq)
>
> Fix the name appropriately here too please.

Hi Saravana,

Oh, I was not aware of this series... Please look at my first patch. It
already has a review tag by Rafael. I think the creation of the queue
makes more sense to be done in devlink_class_init(). Moreover, Rafael
complained in my first version that erroring out because we failed to
create the queue is too harsh since devlinks can still work. So, what we
do is to schedule the work if we have a queue or to call
device_link_release_fn() synchronously if we don't have the queue (note
that failing to allocate the queue is very unlikely anyways).

- Nuno Sá
>
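
The fallback described above (queue the release when a queue exists,
otherwise release synchronously) can be sketched in user space as below.
This is only an illustration under stated assumptions, not the code from
the series: all toy_* names are hypothetical stand-ins, and the fixed-size
array replaces a real workqueue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not kernel symbols. */
#define TOY_PENDING_MAX 8

struct toy_link {
	bool released;	/* set once the release function has run */
	bool queued;	/* set when the release was deferred */
};

struct toy_queue {
	struct toy_link *pending[TOY_PENDING_MAX];
	size_t count;
};

/* Stand-in for the real release work. */
static void toy_link_release_fn(struct toy_link *link)
{
	link->released = true;
}

/*
 * Defer the release when a queue is available; with no queue (allocation
 * failed) release right away so that links keep working. The "full" check
 * exists only because the toy queue has a fixed size.
 */
static void toy_link_release(struct toy_queue *q, struct toy_link *link)
{
	if (q && q->count < TOY_PENDING_MAX) {
		q->pending[q->count++] = link;
		link->queued = true;
		return;
	}
	toy_link_release_fn(link);
}

/* Stand-in for the flush: run everything queued so far; no-op without a queue. */
static void toy_wait_removal(struct toy_queue *q)
{
	if (!q)
		return;
	for (size_t i = 0; i < q->count; i++)
		toy_link_release_fn(q->pending[i]);
	q->count = 0;
}
```

With a NULL queue the link is released before toy_link_release() returns,
so a later wait has nothing to do. With a queue, the release only becomes
visible after toy_wait_removal(), which is the ordering that
of_overlay_remove() needs.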