Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp3708602pxb; Sat, 13 Feb 2021 06:41:20 -0800 (PST) X-Google-Smtp-Source: ABdhPJxYB3pAWBXjF7C3dBq1KvhlvSoHlChiHHF0nG8AjBmdeFKqIiJSmwNxJNiLUs4liFj/jf8N X-Received: by 2002:a50:ef0f:: with SMTP id m15mr8275464eds.175.1613227280464; Sat, 13 Feb 2021 06:41:20 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613227280; cv=none; d=google.com; s=arc-20160816; b=C6R1c5HLTFjUFpu5VnmTvUT8iWmelpM9INlIdql2YB3lksC7UIoHa9SYmDvt51YxDx XPPpaWzkbon1HVK5nVkxjyQAKyAiAfO3DK7upAU5fEP25K2Mg4YDs8MCK2X7feu9eDlH mdZZG5GSadz4iTDy0ENPyvCO2J75wgDTxIpYuqVmluDtoFzlRPmBAHmJP/nJEemT/fA8 qCDulumprcOE0kxb67H86Mw6psfK+UFx7TUXs3aTbS1spmLjuA3EN+EpmJsJVvGtaSgF STfiVBdhdKneq1xvFrrLaI9H+zP75QruZn7nqFhYEJDEcwTGrRpU7w61H6pChArkUglm sQUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=i1A8oLUG2I/8/kIRhBL2YlcQBqsattd0/QxapRqeMBk=; b=sMBMsWNXyWBp2pm/zSKfxAuwCX6eXGZjmBc2XI+AHAD+yXQ8Emwrx4eDeWB8pTHYkv /YPYdmJsVz7iY42sAmZyi5aapBg71ikRHd5MLClkRIJMxVyCkxcozcwGRAoxupQRCz/Z SEjSMLI/aZoX/MNgEvIuG0gnL9EVx//XY02VUp2/Ka39NK1zJzjqwzVzSaM7F0QkDs1g lPB4knRtnW4ALTCt3IVgBdhCq87xMMt0Y2QhymeQUlCAxjcHA/oid9j04Ll9Sv+PVxF0 ZaWBfjEdi8owDVWSfcOZr2G1rlrlZIIukZDZmyVjoV01GD5M8UPUO0VhqIcmXD1SZnRQ qwbg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=exMxGdbZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y1si5927666ejm.229.2021.02.13.06.40.52; Sat, 13 Feb 2021 06:41:20 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=exMxGdbZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229703AbhBMOkP (ORCPT + 99 others); Sat, 13 Feb 2021 09:40:15 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:59154 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229649AbhBMOkL (ORCPT ); Sat, 13 Feb 2021 09:40:11 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613227123; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i1A8oLUG2I/8/kIRhBL2YlcQBqsattd0/QxapRqeMBk=; b=exMxGdbZ2u3+yG/5iZRzdUMEESN9y40EmmzSE7j0SLd3VD+XVFbY+MsJhYzYA/tEaBS0tz geUdtuO149ua3EuxuxlHIyCH1vYOkriVV42yfGxfhCHiu53JgKvs15jz8J7hcA8Mp9z5Ul Xxx1m4cLtSx9jErUL4DPXuti87yqzXI= Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-228-Hsi894rOPYWkEFvMXHfZ_g-1; Sat, 13 Feb 2021 09:38:42 -0500 X-MC-Unique: Hsi894rOPYWkEFvMXHfZ_g-1 Received: by mail-ej1-f71.google.com with SMTP id ia14so2306634ejc.8 for ; Sat, 13 Feb 2021 06:38:42 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=i1A8oLUG2I/8/kIRhBL2YlcQBqsattd0/QxapRqeMBk=; b=boJDcb0RpnR6Y0HvyUyALCxeEiRs23vRfIKyoLTqcmpmrQZym3MTh3UHrpUyZ0/8Z2 87QAvZheF0Kwieic1gNVJ3TvmdjlD/1xWWUWZewzOmXCbPQ6A6kg6ZRIi3wDL0SXU8gG kSGHs9S36ZdokdhsnZHOZAoVbl9NoTUaWNDO42RnhqMkmg9zT8tmqYSWbgFhmrunIWrP kLXj7d0/3CXG8Xd9lRaKe0oKMpQmYK5Xl2/hM8eUv23RvSctKtpmal1uVpmpjEL3PAZU cTXuW8NmvgnKYoum2lPzbBuldEVAOmEym2K8GMd5GwzSeWR2kM5Ku9LHPZnr7rPH+P57 kKqQ== X-Gm-Message-State: AOAM533f72xgctLUahEblOTslFioP/7vbGRr5jmplBqZpWVCjVW/be4X AMzpZKJIdpE9AcTn+QdSEryNW7XcYk3RWDe6BuvwnOuu3d5Y4iQZ02EN1Ss2hwo5GDMU715f1Gk cwvczJQRajy7r+eY2buY3nAJY X-Received: by 2002:a05:6402:20e:: with SMTP id t14mr7983430edv.178.1613227120848; Sat, 13 Feb 2021 06:38:40 -0800 (PST) X-Received: by 2002:a05:6402:20e:: with SMTP id t14mr7983339edv.178.1613227119462; Sat, 13 Feb 2021 06:38:39 -0800 (PST) Received: from x1.localdomain (2001-1c00-0c1e-bf00-1054-9d19-e0f0-8214.cable.dynamic.v6.ziggo.nl. [2001:1c00:c1e:bf00:1054:9d19:e0f0:8214]) by smtp.gmail.com with ESMTPSA id s16sm2157689edr.14.2021.02.13.06.38.38 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 13 Feb 2021 06:38:38 -0800 (PST) Subject: Re: [RFC PATCH 1/7] drivers: base: Add resource managed version of delayed work init To: Greg Kroah-Hartman Cc: Matti Vaittinen , mazziesaccount@gmail.com, "Rafael J. Wysocki" , MyungJoo Ham , Chanwoo Choi , Andy Gross , Bjorn Andersson , Jean Delvare , Guenter Roeck , Mark Gross , Sebastian Reichel , Chen-Yu Tsai , Liam Girdwood , Mark Brown , Wim Van Sebroeck , Saravana Kannan , Heikki Krogerus , Andy Shevchenko , Joerg Roedel , Dan Williams , Bartosz Golaszewski , linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-hwmon@vger.kernel.org, platform-driver-x86@vger.kernel.org, linux-pm@vger.kernel.org, linux-watchdog@vger.kernel.org References: <1230b0d2ba99ad546d72ab079e76cb1b3df32afb.1613216412.git.matti.vaittinen@fi.rohmeurope.com> <284d4a13-5cc8-e23c-7e99-c03db5415bf1@redhat.com> From: Hans de Goede Message-ID: Date: Sat, 13 Feb 2021 15:38:37 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On 2/13/21 2:33 PM, Greg Kroah-Hartman wrote: > On Sat, Feb 13, 2021 at 02:18:06PM +0100, Hans de Goede wrote: >> Hi, >> >> On 2/13/21 1:16 PM, Greg Kroah-Hartman wrote: >>> On Sat, Feb 13, 2021 at 01:58:44PM +0200, Matti Vaittinen wrote: >>>> A few drivers which need a delayed work-queue must cancel work at exit. >>>> Some of those implement remove solely for this purpose. Help drivers >>>> to avoid unnecessary remove and error-branch implementation by adding >>>> managed verision of delayed work initialization >>>> >>>> Signed-off-by: Matti Vaittinen >>> >>> That's not a good idea. As this would kick in when the device is >>> removed from the system, not when it is unbound from the driver, right? >> >> Erm, no devm managed resources get released when the driver is detached: >> drivers/base/dd.c: __device_release_driver() calls devres_release_all(dev); > > Then why do you have to manually call devm_free_irq() in release > callbacks? That is only necessary when there are none devm managed resources / resource-ish things which get manually released in the remove function (because they are not devm managed), while the IRQ handler depends on these. > I thought that was the primary problem with those things. If all resources / resource-ish things used by the IRQ handler are devm managed then there is no need to have a devm_free_irq() call in a drivers release (which is called remove) callback. > I can understand devm_ calls handling resources, but callbacks and > workqueues feels like a big stretch. Things get hairy wrt devm resource handling when some of the resources are not managed by devm. Having devm managed sync-cancel of work items is IMHO actually good to have because it helps avoiding mixing of devm managed resources with non devm managed resources. Some drivers are actually already doing devm managed workqueue cancellation by using the generic devm_add_action() helper which allows calling a driver supplied cleanup function when devres_release_all() runs. This is also useful to make sure the workqueue is cancelled at the right time during tear-down on error-exits from probe(), which also runs devres_release_all(). See drivers/power/supply/axp288_charger.c for an example of this. Note that no-one is going to force people to use this new devm_delayed_work_autocancel() functionality. IMHO it is a useful tool to have in our toolbox. > >>> There is two different lifespans here (well 3). Code and data*2. Don't >>> confuse them as that will just cause lots of problems. >>> >>> The move toward more and more "devm" functions is not the way to go as >>> they just more and more make things easier to get wrong. >>> >>> APIs should be impossible to get wrong, this one is going to be almost >>> impossible to get right. >> >> I have to disagree here devm generally makes it easier to get things right, >> it is when some devm functions are missing and devm and non devm resources >> are mixed that things get tricky. >> >> Lets look for example at the drivers/extcon/extcon-intel-int3496.c code >> from patch 2/7 from this set. The removed driver-remove function looks like >> this: >> >> -static int int3496_remove(struct platform_device *pdev) >> -{ >> - struct int3496_data *data = platform_get_drvdata(pdev); >> - >> - devm_free_irq(&pdev->dev, data->usb_id_irq, data); >> - cancel_delayed_work_sync(&data->work); >> - >> - return 0; >> -} >> - >> >> This is a good example where the mix of devm and non devm (the workqueue) >> resources makes things tricky. The IRQ must be freed first to avoid the >> work potentially getting re-queued after the sync cancel. >> >> In this case using devm for the IRQ may cause the driver author to forget >> about this, leaving a race. >> >> Bit with the new proposed devm_delayed_work_autocancel() function things >> will just work. >> >> This work gets queued by the IRQ handler, so the work must be initialized (1) >> *before* devm_request_irq() gets called. Any different order would be a >> bug in the probe function since then the IRQ might run before the work >> is initialized. > > How are we now going to audit the order of these calls to ensure that > this is done correctly? That still feels like it is ripe for bugs in a > much easier way than without these functions. We already need to audit probe() functions to make sure that all resources which the IRQ handlers need are allocated before the IRQ gets requested. This is why requesting the IRQ is usually one of the last things which driver's probe functions do. Using devm (and only devm without mixing and matching) avoids to have to _also_ audit the probe-error-exit teardown and driver-release paths. Those will automatically be done in the right order since devm will release everything in reverse order of allocation. So this reduces the amount of auditing work we need to do. In my experience the probe-error-exit teardown and driver-release paths are full of bugs, bugs which are often never caught because most drivers bind once and then stay bound until reboot / shutdown so this paths are often not exercised during testing. So automating this teardown is a good thing to do, it removes a whole class of bugs. >> Since devm unrolls / releases resources in reverse order, this means that >> it will automatically free the IRQ (which was requested later) before >> cancelling the work. >> >> So by switching to the new devm_delayed_work_autocancel() function we avoid >> a case where a driver author can cause a race on driver detach because it is >> relying on devm to free the IRQ, which may cause it to requeue a just >> cancelled work. >> >> IOW introducing this function (and using it where appropriate) actually >> removes a possible class of bugs. >> >> patch 2/7 actually has a nice example of this, drivers/extcon/extcon-gpio.c >> also uses a delayed work queued by an interrupt, together with devm managing >> the interrupt, yet the removed driver_remove callback: >> >> -static int gpio_extcon_remove(struct platform_device *pdev) >> -{ >> - struct gpio_extcon_data *data = platform_get_drvdata(pdev); >> - >> - cancel_delayed_work_sync(&data->work); >> - >> - return 0; >> -} >> - >> >> Is missing the explicit free on the IRQ which is necessary to avoid >> the race. One the one hand this illustrates your (Greg's) argument that >> devm managed IRQs may be a bad idea. > > I still think it is :) I understand and I'm not under the illusion that I'm going to change your mind about this :) >> OTOH it shows that if we have devm managed IRQs anyways that then also >> having devm managed autocancel works is a good idea, since this RFC patch-set >> not only results in some cleanup, but is actually fixing at least 1 driver >> detach race condition. > > Fixing bugs is good, but the abstraction away from resource management > that the devm_ calls cause is worrying as the "magic" behind them can be > wrong, as seen here. Here being the gpio_extcon_remove() example ? Yes that is bad, but again that is due to mixing manual and devm managed resource management. Having this new devm_delayed_work_autocancel() helper will allow a bunch of drivers to move away from mixing the 2, which is a good thing in my book. As I said above I believe that having devm_delayed_work_autocancel() (1) in our toolbox will be a good thing to have. Driver authors can then choose to use it; or they can choose to not use it if they don't like it. I know that the reason why I did not use it in the drivers/extcon/extcon-intel-int3496.c driver is because it was not available if it had been available then I would definitely have used it, as it avoids the mixing of resource-management styles which that driver is currently doing. And I think that that is what this is ultimately about, there are 2 styles of resource-management: 1. manual 2. devm based And they both have their pros and cons, problems mostly arise when mixing them and adding new devm helpers for commonly used cleanup patterns is a good thing as it helps to get rid of mixing these 2 styles in a single driver. Regards, Hans