Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752619AbbGIFOf (ORCPT ); Thu, 9 Jul 2015 01:14:35 -0400 Received: from mail-wi0-f178.google.com ([209.85.212.178]:34639 "EHLO mail-wi0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751039AbbGIFO0 (ORCPT ); Thu, 9 Jul 2015 01:14:26 -0400 MIME-Version: 1.0 In-Reply-To: <20150709044406.GA15814@dtor-ws> References: <20150706232315.GK7021@wotan.suse.de> <20150706234032.GG32140@dtor-ws> <20150709004903.GD379@dtor-ws> <20150709044406.GA15814@dtor-ws> Date: Wed, 8 Jul 2015 22:14:25 -0700 Message-ID: Subject: Re: [PATCH 2/8] driver-core: add asynchronous probing support for drivers From: Dan Williams To: Dmitry Torokhov Cc: "Luis R. Rodriguez" , Tom Gundersen , Greg Kroah-Hartman , Tejun Heo , Linux Kernel Mailing List , Arjan van de Ven , Rusty Russell , Olof Johansson , Tetsuo Handa Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6196 Lines: 113 On Wed, Jul 8, 2015 at 9:44 PM, Dmitry Torokhov wrote: > On Wed, Jul 08, 2015 at 06:00:41PM -0700, Dan Williams wrote: >> On Wed, Jul 8, 2015 at 5:49 PM, Dmitry Torokhov >> wrote: >> > On Wed, Jul 08, 2015 at 05:36:04PM -0700, Dan Williams wrote: >> >> On Mon, Jul 6, 2015 at 4:40 PM, Dmitry Torokhov >> >> wrote: >> >> > On Tue, Jul 07, 2015 at 01:23:15AM +0200, Luis R. Rodriguez wrote: >> >> >> On Sat, Jul 04, 2015 at 07:09:19AM -0700, Dan Williams wrote: >> >> >> > On Fri, Jul 3, 2015 at 11:30 AM, Luis R. Rodriguez wrote: >> >> >> > > On Sat, Jun 27, 2015 at 04:45:25PM -0700, Dan Williams wrote: >> >> >> > >> On Mon, Mar 30, 2015 at 4:20 PM, Dmitry Torokhov >> >> >> > >> wrote: >> >> >> > >> > Some devices take a long time when initializing, and not all drivers are >> >> >> > >> > suited to initialize their devices when they are open. For example, >> >> >> > >> > input drivers need to interrogate their devices in order to publish >> >> >> > >> > device's capabilities before userspace will open them. When such drivers >> >> >> > >> > are compiled into kernel they may stall entire kernel initialization. >> >> >> > >> > >> >> >> > >> > This change allows drivers request for their probe functions to be >> >> >> > >> > called asynchronously during driver and device registration (manual >> >> >> > >> > binding is still synchronous). Because async_schedule is used to perform >> >> >> > >> > asynchronous calls module loading will still wait for the probing to >> >> >> > >> > complete. >> >> >> > >> > >> >> >> > >> > Note that the end goal is to make the probing asynchronous by default, >> >> >> > >> > so annotating drivers with PROBE_PREFER_ASYNCHRONOUS is a temporary >> >> >> > >> > measure that allows us to speed up boot process while we validating and >> >> >> > >> > fixing the rest of the drivers and preparing userspace. >> >> >> > >> > >> >> >> > >> > This change is based on earlier patch by "Luis R. Rodriguez" >> >> >> > >> > >> >> >> > >> > >> >> >> > >> > Signed-off-by: Dmitry Torokhov >> >> >> > >> > --- >> >> >> > >> > drivers/base/base.h | 1 + >> >> >> > >> > drivers/base/bus.c | 31 +++++++--- >> >> >> > >> > drivers/base/dd.c | 149 ++++++++++++++++++++++++++++++++++++++++++------- >> >> >> > >> > include/linux/device.h | 28 ++++++++++ >> >> >> > >> > 4 files changed, 182 insertions(+), 27 deletions(-) >> >> >> > >> >> >> >> > >> Just noticed this patch. It caught my eye because I had a hard time >> >> >> > >> getting an open coded implementation of asynchronous probing to work >> >> >> > >> in the new libnvdimm subsystem. Especially the messy races of tearing >> >> >> > >> things down while probing is still in flight. I ended up implementing >> >> >> > >> asynchronous device registration which eliminated a lot of complexity >> >> >> > >> and of course the bugs. In general I tend to think that async >> >> >> > >> registration is less risky than async probe since it keeps wider >> >> >> > >> portions of the traditional device model synchronous >> >> >> > > >> >> >> > > but its not see -DEFER_PROBE even before async probe. >> >> >> > >> >> >> > Except in that case you know probe has been seen by the driver at >> >> >> > least once. So I see that as less of a surprise, but point taken. >> >> >> > >> >> >> > >> and leverages the >> >> >> > >> fact that the device model is already well prepared for asynchronous >> >> >> > >> arrival of devices due to hotplug. >> >> >> > > >> >> >> > > I think this sounds reasonable, do you have your code upstream or posted? >> >> >> > >> >> >> > Yes, see nd_device_register() in drivers/nvdimm/bus.c >> >> >> >> >> >> It should be I think rather easy for Dmitry to see if he can convert this input >> >> >> driver (not yet upstream) to this API and see if the same issues are fixed. >> >> > >> >> > No, I would rather not as it means we lose error handling on device >> >> > registration. >> >> > >> >> >> >> I think this is a red herring as I don't see how async probing is any >> >> better at handling device registration errors. The error is logged >> >> and "handled" by the fact that a device fails to appear, what other >> >> action would you take? In fact libnvdimm does detect registration >> >> failures and reports that in a parent device attribute (at least for a >> >> region device and their namespace child devices). >> > >> > What is libnvdimm behavior if you try to unload a module that tries to >> > register a device but it failed? Memory leak or crash, right? >> >> No, in the case of the "region" driver it is part of the core >> libnvdimm and it is pinned while any region device is active. > > No, not quite. Let's take a look for example at nd_btt_probe(). It calls > __nd_btt_create() which in turn calls __nd_device_register() which > returns void and asynchronously schedules device registration. Now > consider the device registration fails. The async code will drop 2 > references to the device, effectively freeing it. In the mean time > nd_btt_probe() stores the device pointer which may or may no longer be > valid and goes on it's merry way using it. nd_btt_probe() is the driver probe for the btt device. If registration fails then the device is never probed. > The similar thing in nvdimm_create which returns a pointer that may no > longer be valid Exactly, which is why we fail the entire bus probe if any of the nvdimm devices failed to register. See nvdimm_bus_check_dimm_count(). > I have not traced enough through the code to make sure > if it can blow up, but this kind of situation is not desirable, > especially if the async registration pattern is applied generally > throughout the kernel. I do appreciate the review, but I don't think this signals the death knell for async registration just yet. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/