Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752715AbcKIJNa (ORCPT ); Wed, 9 Nov 2016 04:13:30 -0500 Received: from cit-hm8-mail01.bmw-carit.de ([212.118.206.84]:31821 "EHLO cit-hm8-gw01.bmw-carit.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751520AbcKIJNY (ORCPT ); Wed, 9 Nov 2016 04:13:24 -0500 X-CTCH-RefID: str=0001.0A0C0206.5822E8A8.017F,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0 Subject: Re: [RFC] fs: add userspace critical mounts event support To: "Luis R. Rodriguez" , Linus Torvalds , Jiri Kosina , Johannes Berg , Jouni Malinen , Seth Forshee , Tom Gundersen , Kay Sievers , Bjorn Andersson , Daniel Wagner , "Rafael J. Wysocki" References: <20160824203901.GT3296@wotan.suse.de> <20160825194133.GC3296@wotan.suse.de> <20160902235916.GO3296@wotan.suse.de> <20160903002014.GP3296@wotan.suse.de> <6332a54e-74c6-eafd-368e-71e87a3fa34e@landley.net> <20161005180017.GD3296@wotan.suse.de> <20161005194633.GE3296@wotan.suse.de> <20161108224726.GD13978@wotan.suse.de> CC: "Herbert, Marc" , Daniel Vetter , Rob Landley , Mimi Zohar , Felix Fietkau , David Woodhouse , Roman Pen , Ming Lei , Andrew Morton , Michal Marek , Greg KH , Linux Kernel Mailing List , Vikram Mulukutla , Stephen Boyd , Mark Brown , Takashi Iwai , Christian Lamparter , Hauke Mehrtens , Josh Boyer , Dmitry Torokhov , Jiri Slaby , Andy Lutomirski , Wu Fengguang , Richard Purdie , Jeff Mahoney , Jacek Anaszewski , , Julia Lawall , , , David Howells , Alessandro Rubini , Kevin Cernekee , Kees Cook , Jonathan Corbet , Thierry Martinez , linux-serial , "open list:DOCUMENTATION" , linuxppc-dev , Josh Triplett , Harald Hoyer From: Daniel Wagner Message-ID: Date: Wed, 9 Nov 2016 10:13:11 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161108224726.GD13978@wotan.suse.de> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.10.50.50] X-ClientProxiedBy: CIT-HM8-EX01.bmw-carit.intra (10.40.100.13) To CIT-HM8-EX01.bmw-carit.intra (10.40.100.13) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7204 Lines: 132 [CC: added Harald] On 11/08/2016 11:47 PM, Luis R. Rodriguez wrote: > On Wed, Oct 05, 2016 at 09:46:33PM +0200, Luis R. Rodriguez wrote: >> On Wed, Oct 05, 2016 at 11:08:06AM -0700, Linus Torvalds wrote: >>> On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguez wrote: >>>> On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote: >>>> >>>>> I did some shuffling around of those code to make initmpfs work, does >>>>> anybody know why initramfs extraction _before_ we initialize drivers >>>>> would be a bad thing? >>>> >>>> No, but it seems sensible to me, if its done before do_initcalls() >>>> that should resolve the race for initramfs users >>> >>> initramfs should already be set up before drivers are. >> >> Actually you are right, the issue would only be for old initrd, for initramfs >> we populate that via rootfs_initcall(populate_rootfs), so as long as drivers >> in question use an init level beyond rootfs's we're good there. >> >>> Exactly what is it that has trouble right now? >> >> It would seem then that the only current stated race possible should >> then be non-initramfs users. > > Or: > > a) initramfs users that include a driver but do not include the firmware > into initramfs > > b) driver is built-in but firmware is not in initrafms (Johannes reports > this causes driver failure on intel wireless for instance, and I guess > you need to reload) > >> One example if very large firmware for >> remote-proc, whereby an initramfs is just not practical or desirable. > > This issue still stands. At Plumbers Johannes Berg did indicate to me > he had a simple elegant solution in mind. He suggested that since the > usermode helper was available, he had added support to be able to > differentiate async firmware request calls form sync requests, and that > userspace should not return an error *iff* the request made was async and > it can determine we're initramfs. The semantics issue is the same though, > is there a generic way to determine we're initramfs ? What if we move > multiple levels? Anyway -- provided we could figure this out, userspace > would simply yield and wait until the real rootfs is met. Upon pivot_root() > the assumption is all previous udev events pending would be re-triggered > and finally udev could finally confirm/deny if the firmware was present. > This would *also* allow you to stuff your firmware whever, however big > it was. This however relied on the userspace firmware loading support, > it turns out that (I think because of an incorrect negative backlash > back in the day over blaming this over booting issues due to the timeout > whereas the real issue was the kmod timeout was affecting our long > standing serial init()/probe()) the systemd userspace firmware laoding > support was removed from systemd udev in 2014 by Kay via commit > be2ea723b1d023b3d ("udev: remove userspace firmware loading support"). > > Systemd might *still* be able to provide a solution here, however I will > note Johannes was asking for *all* async firmware requests to always > rely on the kernel syfs UMH fallback -- this suggestion is against the > direction we've been taking to eventually compartamentalize the kernel UMH > code, so whatever we decide to do, lets please take a breather and seriously > address this properly *with* systemd folks. > > A side race discussed at Plumbers worth mentioning given its related to the > UMH was inspired by Jiri's talk on the abuse of the freezer for kthreads -- > and his suggestion to use freeze_super(). Currently the UMH lock is used > for the UMH but as I have noted in Daniel Wagner's recent patches to > give some love to this code and further compartamentalize the UMH -- > the UMH lock was originally added to help avoid drivers use the firmware > API on resume, given the races. The firmware cache solution implemented by > Ming Lei years ago helped address this, whereby devm helpers are used > based on the requested firmware and prior to suspend we cache all required > firmware for drivers so that upon resume calls would work without the > effective race present. This mitigated the actual races / issues with > drivers, but they must not use the firmware API on suspend/resume. Since > this solution *kills* all pending UMH caller on suspend obviously this > means on suspend using request_firmware*() API and expecting it to work > is brutally dumb as we will eventually kill any pending requests. This > is a long winded way to say that if you rely on the UMH for firmware > you must figure out your own proactive firmware cache solution and > must definitely not request firmware on suspend. Two things then: > > 1) I've been brainstorming with Daniel how to use freeze_super() to > replace the now invalid UMH lock -- its purpose only helps races > on boot, for the fallback case to the UMH. But note most distributions > disable forcing it always on, so these days we *only* rely on the UMH > as a fallback if the driver explicitly requested it > > 2) Drivers relying on the UMH very likely have a broken cache solution > if they are doing this on suspend > > Whatever the outcome of this discussion is -- Johannes seemed to *want* > to further use the UMH by default on *all* async alls... even if the > driver did not explicitly requested it -- I'm concerned about this given > all the above and the existing flip/flop on systemd for it. Whatever > we try to dream up here, please consider all the above as well. As Harald pointed out over a beer yesterday evening, there is at least one more reason why UMH isn't obsolete. The ordering of the firmware loading might be of important. Say you want to greet the user with a splash screen really early on, the graphic card firmware should be loaded first. Also the automotive world has this fancy requirement that rear camera must be on the screen within 2 seconds. So controlling the firmware loading order is of importance (e.g. also do not overcommit the I/O bandwith not so important firmwares). A user space helper is able to prioritize the request accordingly the use case. >>> The gating issue for initramfs is that technically the filesystem >>> setup needs to be done, which means that it currently ends up being >>> populated _fairly_ late in the initcall series, but certainly before >>> drivers. But since initramfs really only needs very limited filesystem >>> functionality, I assume Rob had few problems with just moving it >>> earlier. >>> >>> Still, what kind of ordering issues did people have? What is it that >>> needs to load files even before driver init? Some crazy subsystem? >> >> No, I think this is just about non-initramfs users now, > > And as Johannes pointed two above to cases. > >> if we disregard >> old initrd users. Bjorn, Marc, correct me if I'm wrong, as I think its >> so far you both who have seemed to run into race issues and have then >> ended up trying to look for hacks to address this race or considered using >> the usermode helper (which we're trying to minimize users for). Daniel >> seems to note a lot of video drivers use firmware on probe as well so >> there's a potential issue for those users if they don't use initramfs. Daniel