From: "Rafael J. Wysocki"
Date: Thu, 13 Dec 2018 17:18:29 +0100
Subject: Re: [PATCH] drivers/base: use a worker for sysfs unbind
To: Daniel Vetter
Cc: "Rafael J. Wysocki", Linux Kernel Mailing List, dri-devel, ramalingam.c@intel.com, Greg Kroah-Hartman, Daniel Vetter
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 13, 2018 at 1:36 PM Daniel Vetter wrote:
>
> On Thu, Dec 13, 2018 at 11:23 AM Rafael J. Wysocki wrote:
> >
> > On Thu, Dec 13, 2018 at 10:58 AM Daniel Vetter wrote:
> > >
> > > On Thu, Dec 13, 2018 at 10:38:14AM +0100, Rafael J. Wysocki wrote:
> > > > On Mon, Dec 10, 2018 at 9:47 AM Daniel Vetter wrote:
> > > > >
> > > > > Drivers might want to remove some sysfs files, which needs the same
> > > > > locks and ends up angering lockdep. Relevant snippet of the stack
> > > > > trace:
> > > > >
> > > > >   kernfs_remove_by_name_ns+0x3b/0x80
> > > > >   bus_remove_driver+0x92/0xa0
> > > > >   acpi_video_unregister+0x24/0x40
> > > > >   i915_driver_unload+0x42/0x130 [i915]
> > > > >   i915_pci_remove+0x19/0x30 [i915]
> > > > >   pci_device_remove+0x36/0xb0
> > > > >   device_release_driver_internal+0x185/0x250
> > > > >   unbind_store+0xaf/0x180
> > > > >   kernfs_fop_write+0x104/0x190
> > > >
> > > > Is the acpi_bus_unregister_driver() in acpi_video_unregister() the
> > > > source of the lockdep unhappiness?
> > >
> > > Yeah I guess I cut out too much of the lockdep splat. It complains about
> > > kernfs_fop_write and kernfs_remove_by_name_ns acquiring the same lock
> > > class. It's ofc not the same lock, so no real deadlock. Getting the
> > > device_release_driver outside of the callchain under kernfs_fop_write,
> > > which this patch does, "fixes" it. For "fixes" = shut up lockdep.
> >
> > OK, so the problem really is that the operation is started via sysfs
> > which means that this code is running under a lock already.
> >
> > Which lock does lockdep complain about, exactly?
>
> mutex_lock(&of->mutex);

OK (I thought so)

> > > Other options:
> > > - Annotate the recursion with the usual lockdep annotations. Potentially
> > >   results in lockdep not catching real deadlocks (you can still have other
> > >   loops closing the deadlock, maybe through some subsystem/bus lock).
> > >
> > > - Rewrite kernfs_fop_write to drop the lock (optionally, for callbacks
> > >   that know what they're doing), which should be fine if we refcount
> > >   everything properly (bus, driver & device).
> > >
> > > - Also note that probably the same bug exists on the bind sysfs interface,
> > >   but we don't use that, so I don't care :-)
> > >
> > > - Most of these issues are never visible in normal usage, since normally
> > >   driver bind/unbind is done from a kthread or module load/unload, neither
> > >   of which is running in the context of that kernfs mutex kernfs_fop_write
> > >   holds. That's why I think the task work is the best solution, since it
> > >   changes the locking context of the unbind sysfs to match the locking
> > >   context of module unload and hotunplug.
> >
> > I think that using a task work here makes sense. There is a drawback,
> > which is that the original sysfs write will not wait for the driver to
> > actually be released before returning to user space AFAICS, but that
> > probably isn't a big deal.
>
> This would happen with a normal work_struct, which runs on some other
> thread eventually. That added asynchronous execution uncovered lots
> of bugs in our CI (fbcon isn't solid, let's put it that way). Hence
> the task work, which will be run before the syscall returns to
> userspace, but outside of anything else. Was originally created to
> avoid locking inversion on the final fput, where the same "must
> complete before returning to userspace, but outside of any other
> locking context" issue was causing trouble.

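For illustration only, the "runs before the syscall returns, but outside of
any other locking context" behaviour described above can be modelled in plain
userspace C. This is a rough sketch, not the patch and not kernel code: every
name in it (task_model, queue_deferred, write_syscall_model, and so on) is
hypothetical, and a bool stands in for of->mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct task_model;

struct deferred_work {
	void (*func)(struct task_model *task);
	struct deferred_work *next;
};

struct task_model {
	struct deferred_work *pending;	/* work to run before "returning to userspace" */
	bool file_mutex_held;		/* stands in for of->mutex */
	int releases_done;		/* how many driver releases ran */
	int releases_under_mutex;	/* how many of them ran with the mutex held */
};

/* Push one work item onto the head of the task's pending list. */
static void queue_deferred(struct task_model *task, struct deferred_work *work,
			   void (*func)(struct task_model *task))
{
	work->func = func;
	work->next = task->pending;
	task->pending = work;
}

/* Drain the pending list; only legal once the file mutex has been dropped. */
static void run_deferred(struct task_model *task)
{
	assert(!task->file_mutex_held);
	while (task->pending) {
		struct deferred_work *work = task->pending;

		task->pending = work->next;
		work->func(task);
	}
}

/* Stands in for the driver release; records the locking context it ran in. */
static void release_driver(struct task_model *task)
{
	task->releases_done++;
	if (task->file_mutex_held)
		task->releases_under_mutex++;
}

/* Models the write path: lock, call the handler, unlock, run deferred work. */
static int write_syscall_model(struct task_model *task, bool defer)
{
	struct deferred_work work;

	task->file_mutex_held = true;
	if (defer)
		queue_deferred(task, &work, release_driver);	/* deferred variant */
	else
		release_driver(task);				/* direct call under the mutex */
	task->file_mutex_held = false;

	run_deferred(task);	/* completes before we "return to userspace" */
	return task->releases_done;
}
```

In both variants the release has finished by the time write_syscall_model()
returns; only the deferred variant runs it with the mutex dropped, which is
the property the quoted explanation relies on.
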
I didn't realize that it would run completely before returning to user
space, thanks for pointing this out. This isn't an issue then.

> > Also please note that the patch changes the code flow slightly,
> > because passing a non-NULL parent pointer to
> > device_release_driver_internal() potentially has side effects, but
> > that should not be a big deal either.
>
> I can do the old code exactly, but afaict the non-NULL parent just
> takes care of the parent bus locking for us, instead of hand-rolling
> it in the caller. But if I missed something, I can easily undo that
> part.

It is different if device links are present, but I'm not worried about
that case honestly. :-)

> > > Unfortunately that trick doesn't work for the bind sysfs file, since
> > > that way we can't thread the errno value back to userspace.
> >
> > Right. That is unless we wait for the operation to complete and check
> > the error left behind by it. That should be doable, but somewhat
> > complicated.
>
> For real deadlocks this doesn't fix anything, it just hides it from
> lockdep. cross-release lockdep would still complain. If we want to fix
> the bind side _and_ keep reporting the errno from the driver's bind
> function, then we need to rework kernfs and add a callback which
> doesn't hold the mutex. Should be doable, just a pile more work.

It should be possible to store the error in a variable and export that
via a separate attribute for user space to inspect. That would be a
significant I/F change, however.

Cheers,
Rafael
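
As a rough illustration of the "store the error in a variable and export it
via a separate attribute" idea, here is a userspace C sketch. It is only a
model under stated assumptions: driver_model, probe_model, deferred_bind and
last_bind_error_show are hypothetical names, not existing driver-core
interfaces, and the "attribute" is just a function that formats the stored
value into a buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct driver_model {
	int last_bind_error;	/* 0 on success, negative errno otherwise */
};

/* Stands in for the driver's bind/probe callback. */
static int probe_model(int should_fail)
{
	return should_fail ? -ENODEV : 0;
}

/*
 * Deferred bind: the write that triggered it has already returned, so the
 * result cannot travel back through that write. It is stored instead.
 */
static void deferred_bind(struct driver_model *drv, int should_fail)
{
	drv->last_bind_error = probe_model(should_fail);
}

/* Models a separate read-only attribute exposing the stored result. */
static int last_bind_error_show(const struct driver_model *drv,
				char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%d\n", drv->last_bind_error);
}
```

User space would then have to read the extra attribute after writing to bind
in order to learn the outcome, which is the interface change mentioned above.
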