Date: Wed, 5 Oct 2016 02:00:08 +0200
From: "Luis R. Rodriguez"
To: Dmitry Torokhov
Cc: "Herbert, Marc", Linus Torvalds, "open list:DOCUMENTATION",
	Jacek Anaszewski, David Woodhouse, Christian Lamparter,
	Julia Lawall, Andrew Morton, linuxppc-dev, Mimi Zohar,
	Andy Lutomirski, Richard Purdie, Wu Fengguang, Johannes Berg,
	"Luis R. Rodriguez", Michal Marek, Hauke Mehrtens, Mark Brown,
	Jiri Slaby, Ming Lei, Daniel Vetter, Bjorn Andersson,
	Felix Fietkau, Roman Pen, Greg KH, Linux Kernel Mailing List,
	Vikram Mulukutla, Stephen Boyd, Takashi Iwai, Jeff Mahoney,
	Hariprasad S, Benjamin Poirier, Josh Triplett
Subject: Re: [RFC] fs: add userspace critical mounts event support
Message-ID: <20161005000008.GY3296@wotan.suse.de>
References: <20160902235916.GO3296@wotan.suse.de>
	<20160903002014.GP3296@wotan.suse.de>
	<20160903174939.GB32345@dtor-ws>
	<2deae6da-dd43-7bff-e1fd-ffd26946b928@intel.com>

On Sat, Sep 24, 2016 at 10:41:46AM -0700, Dmitry Torokhov wrote:
> On Fri, Sep 23, 2016 at 6:37 PM, Herbert, Marc wrote:
> > On 03/09/2016 11:10, Dmitry Torokhov wrote:
> >> I was thinking if we kernel could post
> >> "conditions" (maybe simple stings) that it waits for, and userspace
> >> could unlock these "conditions". One of them might be "firmware
> >> available".
> >
> > On idea offered by Josh Triplett that seems to overlap with this one
> > is to have something similar to the (deprecated) userhelper with
> > *per-blob* requests and notifications except for one major difference:
> > userspace would not anymore be in charge of *providing* the blob but
> > would instead only *signal* when a given blob becomes available and is
> > either found or found missing. Then the kernel loads the blob _by
> > itself_; unlike the userhelper. No new “critical filesystem” concept
> > and a *per-blob basis*, allowing any variation of blob locations
> > across any number of initramfs and filesystems.
> >
>
> Really, I do not quite understand why people have issues with usermode
> helper/uevents.

One reason is you'd have to implement your own cache for suspend/resume.

> It used to work reasonably well (if you were using
> request_firmware_nowait()), as the kernel would post the request and
> then, when userspace was ready[^Hier], uevents would be processed and
> firmware would be loaded. We had a timeout of 60(?) seconds by
> default, but that would be adjusted as systems needed.

The issue with the timeout was that kernel developers *assumed* module
init and probe were detached, and saying 'thou shall not load firmware
on probe' seems actually like a more radical change than just saying
'thou shall load firmware on init'. I'll note that as it stands it's the
right thing to complain about these users only because we lack the
semantics to ensure correctness if used on init or probe. The timeout
incurred huge latencies for optional firmware, and while a new API was
added to avoid the wait for optional firmware, that obviously still left
the races possible.
We now have async probe, which *does* make some of the original
assumptions by kernel developers hold, but by now other issues have also
been found with the usermode helper. The cache was one; another was a
recent discussion over the use of the UMH lock, with the assumption that
it provided a sort of safeguard for early boot use -- it does not, for
exactly the same reasons that a UMH lock does not suffice to avoid all
possible rootfs races. For this latter issue refer to the recent
discussion in the review of Daniel Wagner's patches.

> Unfortunately it all broke when udev started insisting [1] on
> servicing some uevents in strict sequence, which resulted in boot
> stalls.

That was not the only issue... another implicit issue was that the
timeout limits the number of devices Linux can support per module: the
limit depends on the combined time it takes to both init and probe.
Some drivers are super complex, and even if you *don't* have firmware
requirements and, say, burn the firmware onto the device, we found that
*probe* alone was taking a very long time on some device drivers --
check out the cxgb4 driver, where one device actually ends up loading
about 4 subdevices underneath it. Yes, that's a mess, and the driver
needs a major rewrite to address this in a clean way, but that takes
time. It's no trivial pursuit. The umh timeout would then no longer be
implicated, *but* since systemd implemented the timeout in general for
kmod loading, it did mean systemd was limiting Linux drivers, and how
many devices a driver can support, depending on this timeout value. At
SUSE we solved this by lifting this timeout for kmod workers for now. A
long term goal here, which could help, is also to just detach init and
probe, so we give systemd what it originally assumed.
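The limit described above can be put as simple arithmetic. What follows is a minimal sketch, assuming a simplified model in which a module load runs init and then probes each bound device serially inside the same worker, so that all of it has to fit within the worker's timeout. The function name and the numbers in the usage note are hypothetical; they are not systemd or kernel defaults.

```python
# Simplified model (an assumption, not measured behaviour): with synchronous
# probe, one worker pays for module init plus one probe per device, and the
# whole thing must complete before the worker's timeout fires.

def max_devices(timeout_ms: int, init_ms: int, probe_ms: int) -> int:
    """Return the largest n such that init_ms + n * probe_ms <= timeout_ms."""
    if probe_ms <= 0:
        raise ValueError("probe_ms must be positive")
    budget_ms = timeout_ms - init_ms
    if budget_ms < 0:
        return 0  # init alone already exceeds the timeout
    return budget_ms // probe_ms
```

With a made-up 30000 ms timeout, 1000 ms of init and 4000 ms per probe, `max_devices(30000, 1000, 4000)` gives 7: an eighth device would push the load past the timeout even though nothing is actually wrong with the driver. Detaching probe from init removes the `n * probe_ms` term from what the worker has to wait for.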
A summary of all this is here:

http://www.do-not-panic.com/2015/12/linux-asynchronous-probe.html

I have some code that starts to enable some of this on systemd/kmod, but
it still needs some more testing before I post.

> Maybe the ultimate answer is to write a firmware loading
> daemon that would also listen to netlink events and do properly what
> udev refused to be doing?

Meh, in the wireless subsystem we devised our own file loader, check
CRDA. That worked for us since we needed to optionally enable checking
of RSA digitally signed files, but long term our experience is that
this is pointless. So we're going to phase that out in favor of using
the firmware API to load this file, and then support digital signatures
on the firmware.

I am not sure how/why a firmware loading daemon would be a better idea
now. What Marc describes that Josh proposed, with signals from
userspace, seems more aligned with what we likely need -- but note that
since we now use a shared common API for kernel reads from a path via
kernel_read_file_from_path(), we'd probably want something like a
notifier for any kernel_read_file_from_path() user. The ability for the
kernel to register a generic userspace notifier seems worthy of
consideration, but I'd be surprised if we don't already have something
quite like this.

> The distribution would know when it is ready
> to service firmware requests (and thus when to start this daemon), and
> we would have the freedom of having drivers both built-in and as
> modules and bulding firmware into kernel, intiramfs or keep on a
> "real" fs available at later time.

What difference would there be if we just used notifications to
guarantee to the kernel the file in question is now available?

  Luis
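To make the notification idea concrete, here is a rough userspace model of it, purely illustrative: the notifying side only *signals* that a given path is ready to be looked up, and the waiting side then reads the file by itself, treating "signalled but not there" as found missing. The class and method names are hypothetical stand-ins; this is not a kernel interface and says nothing about how a notifier for kernel_read_file_from_path() users would actually be wired up.

```python
import threading
from pathlib import Path
from typing import Optional


class FileAvailability:
    """Hypothetical per-path "this file can be looked up now" notifications."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events = {}  # path -> threading.Event

    def _event(self, path: str) -> threading.Event:
        # One event per path, created on first use by either side.
        with self._lock:
            return self._events.setdefault(path, threading.Event())

    def notify_available(self, path: str) -> None:
        """Notifier side: signal only, never hand over the file contents."""
        self._event(path).set()

    def read_when_available(self, path: str,
                            timeout: Optional[float] = None) -> Optional[bytes]:
        """Waiter side: block until signalled, then read the file itself.

        Returns None if no signal arrives within the timeout, or if the
        path was signalled but the file turns out to be missing.
        """
        if not self._event(path).wait(timeout):
            return None
        try:
            return Path(path).read_bytes()
        except FileNotFoundError:
            return None
```

A waiter calling `read_when_available("/some/path/fw.bin")` blocks until something calls `notify_available()` for that same path, so the ordering between "filesystem is ready" and "driver wants the file" no longer depends on a fixed timeout. The per-path keying mirrors the per-blob basis in the quoted proposal; a single global "ready" event would be the coarser alternative.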